Comparative genomics of Leuconostoc lactis strains isolated from human gastrointestinal system and fermented foods microbiomes

Leuconostoc lactis forms a crucial member of the genus Leuconostoc and has been widely used in the fermentation industry to convert raw material into acidified and flavored products in dairy and plant-based food systems. Since the ecological niches that strains of Ln. lactis being isolated from were truly diverse such as the human gut, dairy, and plant environments, comparative genome analysis studies are needed to better understand the strain differences from a metabolic adaptation point of view across diverse sources of origin. We compared eight Ln. lactis strains of 1.2.28, aa_0143, BIOML-A1, CBA3625, LN19, LN24, WIKIM21, and WiKim40 using bioinformatics to elucidate genomic level characteristics of each strain for better utilization of this species in a broad range of applications in food industry. Phylogenomic analysis of twenty-nine Ln. lactis strains resulted in nine clades. Whole-genome sequence analysis was performed on eight Ln. lactis strains representing human gastrointestinal tract and fermented foods microbiomes. The findings of the present study are based on comparative genome analysis against the reference Ln. lactis CBA3625 genome. Overall, a ~ 41% of all CDS were conserved between all strains. When the coding sequences were assigned to a function, mobile genetic elements, mainly insertion sequences were carried by all eight strains. All strains except LN24 and WiKim40 harbor at least one intact putative prophage region, and two of the strains contained CRISPR-Cas system. All strains encoded Lactococcin 972 bacteriocin biosynthesis gene clusters except for CBA3625. The findings in the present study put forth new perspectives on genomics of Ln. lactis via complete genome sequence based comparative analysis and further determination of genomic characteristics. The outcomes of this work could potentially pave the way for developing elements for future strain engineering applications.

Ln. lactis is a gram ( +), catalase (-), cocci, facultative anaerobic, heterofermentative, non-motile, non-spore forming LAB carrying intrinsic vancomycin resistance [5,6]. Certain Ln. lactis strains are able to produce buttery flavor metabolites for example diacetyl and acetoin at low pH. Thus, they could be utilized in fermented dairy foods [7,8]. Moreover, some Leuconostoc strains could convert carbohydrates such as sucrose to dextran exopolysaccharide [9]. Due to the heterofermentative lifestyle of Ln. lactis, it produces equimolar of lactate, ethanol, and carbon dioxide upon fermenting a mole hexose sugar (glucose and galactose), in the absence of an external electron acceptor, through pentose phosphate pathway (PPP) also known as 6-phosphogluconate/phosphoketolase pathway [10]. However, when an external electron acceptor such as acetaldehyde or pyruvate is available in the microenvironment, this organism could convert sugars primarily into lactate, acetate and CO 2 to maintain redox balance by reoxidizing NADH to NAD + which was reduced to NADH from NAD + in the upper half of PPP. This conversion into acetate produces one additional ATP thus it is more productive for the cell compared to ethanol route also called salvage shunt [5,11].
The LAB is reported to show high adaptation to specific microbiological niches and carry smaller genomes as opposed to other bacteria because of reducing genome which is the consequence of their effort to maintain only the required number of crucial genes necessary for micro-niche specific survivability. Even though Ln. lactis genome is proportionally small; it has to maintain the capability of rapid and continuous evolution with its essential ecosystem via horizontal gene transfer of plasmids or transduction by phage infectivity [12][13][14]. Moreover, in order to maintain its viability and grow in a changing and highly specific ecosystem, LAB has to balance the maintenance of a strong immune system against bacteriophages, transmissible DNA elements, exogenous plasmids or transposases [14][15][16].
Several Leuconostoc species such as carnosum and mesenteroides have been evaluated by comparative genomic analysis [2,17]. Although isolation source of Ln. lactis is reportedly diverse, metabolic potentials of Ln. lactis strains have not been subjected to extensive genomic research. Therefore, information on the species population dynamics and genomic diversity in various ecological systems such as kimchi, fermented cucumber brine, human gut or dairy is scarce if available at all [4]. To our knowledge, the present work is the first in-depth comparative study of Ln. lactis genomics and diversity in the human gastrointestinal tract and fermented foods microbiomes.

General genome features
Whole-genome sequence statistics of thirty-three Ln. lactis strains extracted from NCBI Genbank [18] are shown in Table 1.
The Leuconstoc lactis strains studied in the present study were comparatively evaluated using genomic analysis. Twenty-nine strains including the type-strain CBA3625 (Table 1) were chosen for comparative genomic analysis based on phosphoglucomutase gene (Fig. 1). A phylogenetic analysis of 29 strains carrying complete phosphoglucomutase gene was performed based on the nucleotide sequence alignment (Fig. 1). Four strains were eliminated because of truncated or absence of gene of interest. The phylogenetic tree revealed the formation of nine distinct clades. Human gastrointestinal isolates of BIOML-A1 and aa_0143 share the first clade, whereas dairy isolates of LN19, LN24, and kimchi isolates of WIKIM21, WiKim40, and CBA3622 share the second clade. The third clade was composed of green onion and kimchi isolates of SBC001 and CCK940, respectively. NBRC 12455, MSK.22.141, and MSK.22.137 share the same clade however, the isolation sources were not found. Stool isolate of 1001262B_160229_ C9, cucumber fermentation brine isolate of 1.2.28, kimchi isolate of KACC 91922, and UBA6751 (isolation source is not available) are located on the fourth clade to seventh clade, respectively. The isolation source of the last two clade members was not available except CBA3625, which was isolated from kimchi ( Fig. 1). Interestingly, the kimchi isolate of Ln. lactis CBA3625 was part of the eighth clade containing the strains of UBA5570 and UBA5566 isolated from wood and metal, respectively.

Comparative genomics of Ln. lactis
Next, we selected eight strains to conduct whole-genome nucleotide sequence comparisons. The genomes were chosen for further analysis were: BIOML-A1 (fecal sample), aa_0143 (stool), LN24 (dairy), LN19 (dairy), WiKim40 (kimchi), WIKIM21 (kimchi), 1.2.28 (cucumber fermentation brine), and CBA3625 (kimchi). Genomes of these strains were picked as a representative set of phylogenies shown in Fig. 1 and are highlighted in red. These strains were isolated from either fermented foods or the human gastrointestinal tract and range in size from 1.71 Mb to 1.79 Mb. The GC-content of each individual strain ranges between 42.9% and 43.4%. The OrthoANI and 16S rDNA sequence-based phylogenetic trees are shown in Fig. 2. The whole-genome analysis of eight strains was carried out using BRIG (Fig. 3), whole-genome sequence-based phylogenetic tree ( Figure  S1) and progressive Mauve ( Figure S2).
Average OrthoANI nucleotide sequence based phylogenetic tree generated four clades. The member of the clade-one consists of WIKIM21 only. LN24 and LN19 comprised the second clade. WiKim 40, BIOML-A1, and aa_0143 form the third clade. The last clade members were CBA3625 and 1.2.28 ( Fig. 2A). The phylogenetic tree of all strains clustered based on 16S rDNA shows two distinct clusters (Fig. 2B). Dairy originated Ln. lactis LN19 and LN24 form a separate clade from the remaining strains located on the second cluster from bottom to up (Fig. 2B).
Notably, 1.2.28, WiKim40, and BIOML-A1 share the highest sequence identity against the reference genome CBA3625 ( Fig. 2A). Similar results were also seen in the whole genome sequence based phylogenetic tree that the closest neighbors to CBA3625 were 1.2.28, WiKim40, and BIOML-A1 ( Figure S1). Figure 3 shows whole-genome based BLAST comparison of all strains against reference strain CBA3625. BRIG image shows alignment of eight Ln. lactis strains and their GC content and GC skews. Four regions lacking significant coverage were identified as putative prophages. The first two pronounced gaps on the genome alignment identified as putative prophage I and III were between 83.8 Kb -104. 5 Table 2).
For the characterization of genomic conservation between all isolates related to pan-and core genomes, overall coding potential (i.e. pangenome) was determined. It was observed that 40.7% of entire genes are conserved within 95% BLASTP identity (Fig. 4A). Of the 2994 total CDS, 1217 were shared by entire eight strains, which represent the core genome. The accessory genome also called the non-core genome, contained 1777 total CDS, perhaps determining fundamental differences of phenotypic traits across different strains [19].
Interestingly, a considerable number of sequences without function prediction (hypothetical genes) was found across all Ln. lactis strains ranging from 34 to 39% (37% on average). These Ln. lactis genomes are potential candidates for further functional annotation studies.
Four distinct clusters emerged after clustering by gene absence/presence matrix (Fig. 5). Cluster 1 only consists of aa_0143 showing the highest percent identity, with cluster 2 composed of LN19 and LN24. Cluster 3 consists of only WIKIM21 and reveals the highest percent identity with BIOML-A1 and WiKim40. There are two subclusters within the last cluster which contains 1.2.28 and CBA3625, and WiKim40 and BIOML-A1, respectively. Figure 6 shows the upSet plot of the number of shared orthogroups of each strain and the number of shared orthogroups among the strains with bar charts. The number of shared orthogroups in all strains was 1369.    [20] language and ggplot2 [21] package was used to plot the graphics Fig. 5 Phylogenetic tree based on gene absence-presence and gene cluster matrix comparing the similarity between putative coding sequences. R programming language was used to create the heatmap [20] LN19 and LN24 have the highest number of shared orthogroups among all strains tested. Next, aa_0143 and BIOML-A1 share the 46 orthogroups. aa_0143, BIOML-A1, and WiKim40 share 43 orthogroups.

Functional characteristics of Ln. lactis
Heatmap representation of CAZymes revealed five distinct clades. The amount of GH found was similar within the first and fifth clades (from bottom to up). The concentration of the GT was also found to be similar across the first and fifth clades. However, the highest number of GT family CAZymes does exist in clades three and four. All strains carried a similar amount of CE, except CBA3625, which had the largest number of CE family CAZymes (Fig. 7). The aa_0143 and BIOML-A1 are very similar in the number of enzymes they carry (Fig. 7). Interestingly, WIKIM21 shared the same clade with LN19 and LN24. The core and pangenomes were annotated using Prokka and assigned to functional categories in KAAS. As expected, the largest pangenome categories include CDS with functions associated with carbohydrate metabolism, amino acid metabolism, membrane transport, translation, and vitamins and cofactors metabolism. Functional genome groups, including the lowest number of CDS fall into organismal systems. The Fig. 6 The upSet plot shows the number of orthogroups of each strain and the number of shared orthogroups among the strains with bar charts. UpSetR [22]package in R programming language [20] was used to draw the figure  [20] was used to draw the heatmap highest number of genes accumulated in carbohydrate metabolism were amino-and nucleotide sugar, pyruvate, glycolysis, starch and sucrose metabolism. Major functional genes associated with lipid metabolism are pertained to fatty acid biosynthesis, glycerophospholipid, and glycerolipid metabolism. Amino acid metabolism mainly consists of cysteine and methionine, alanine, aspartate and glutamate metabolism, and phenylalanine, tyrosine and tryptophan biosynthesis. The highest standard deviation bars were achieved in histidine metabolism, and phenylalanine, tyrosine and tryptophan biosynthesis. The lowest number of genes in amino acid metabolism relates to tryptophan metabolism and lysine degradation (Fig. 8).

Mobile genetic elements
Spacers and repeats from entire CRISPR loci were identified using CRISPRviz. Among all strains screened, only aa_0143 and BIOML-A1 harbor a single spacer. The spacer alignment revealed no identical spacer sequences across two strains showing a robust confirmation of evolutionary heterology (Fig. 9B). A similar heterology was also seen in repeat sequence alignments of BIOML-A1 and aa_0143 (Fig. 9A). A total of three plasmids were discovered in BIOML-A1, LN19, and LN24, with the former harboring repUS2 and the latter two strain containing rep31 ( Table 3). The size of plasmids ranges from 0.66 kb to 1.15 kb. The minimum percent identity of predicted plasmids is 98.8%.

Bacteriocins
Analysis of Ln. lactis genomes with BAGEL4 showed a single type of bacteriocin "Lactoccoccin 972" exists in all genomes except CBA3625, which indicates potential antimicrobial characteristics of Ln. lactis strains. All results taken from BAGEL4 are checked with NCBI protein BLAST to validate bacteriocins.  [20] and ggplot2 [21] package were used to create the images based on KAAS-KEGG number of functional categories Weblogo results show that amino acid sequences of Lactococcin 972 discovered in the Ln. lactis genomes in the present study were similar. A strong MNKFKKK motif is identified in N-terminus of Lactococcin 972 (Fig. 10).

Discussion
In the present study, we genomically evaluated the Ln. lactis species and focused on eight strains representing the human gastrointestinal tract and fermented foods microbiomes. GC content is typical for low GC LAB. The proportion of hypothetical genes indicates that there is still more to uncover about Leuconostoc lactis. After extracting the genome of Ln. lactis, we conducted a global phylogeny of twenty-nine genomes (Fig. 1). This analysis predicted a remarkable diversity between Ln. lactis strains. Nine distinct clades were determined.
Interestingly, WiKim40 was isolated from kimchi, and its clade members were isolated from human feces ( Figure S1). This perhaps indicates that Ln. lactis enters the gut microbiome through food sources. The genome analysis clearly segregated leuconostocs by species, subspecies, and allowed intra-species and intra-strain differentiations [4]. Generally, dairy isolates of LN19 and LN24, kimchi isolates of CBA3625, WIKIM21, WiKim40, human gastrointestinal isolates of aa_0143 and BIOML-A1, and cucumber fermentation brine isolate of 1.2.28 exist in closely related clades (Fig. 2). Only dairy-associated strains lack the arabinose metabolism genes such as araA, while the rest of the strains harbored that gene, which perhaps relates to the fact that no arabinose sugar exists in dairy environments. Apart from aa_0143, BIOML-A1, LN19, and LN24 remaining four strains were isolated from fermented plant materials where arabinose sugar is part of the composition. From an evolutionary perspective, this shows that repetitive subculturing of LN19 and LN24 in dairy caused the gene loss or gene decay of araA due to lack of this sugar in the milk   (Fig. 11). Similar results were also reported for Lactobacillus (L.) casei supragenome that the strain isolated from milk vs silage material had a different carbohydrate fermentation profile [23]. For example, the dairy strain could utilize lactose but not inulin, whereas silage isolate is lactose negative but inulin positive. We also found that bglF gene encoding glucose/bglucoside family PTS transporter EIICBA did exist in all strains except cucumber fermentation brine isolate of 1.2.28. However, the crr gene encoding glucose/bglucoside family PTS transporter EIIA did exist in all strains except for WIKIM21, WiKim40, and 1.2.28. The celC gene encoding cellobiose-diacetyl chitobiose family PTS transporter EIIA was not found in any strains tested (Fig. 11). In a study comparing 17 Ln. carnosum strains isolated from meat reported the presence of celA, celB, and celC genes [17], which implies intra-species diversity within Leuconostoc genus. Obst et al. (1995) [24] reported that Leuconostoc strains isolated from dairy are reported to acquire plasmid-encoded LacLM through horizontal gene transfer to adapt milk microenvironment. Interestingly, all eight strains tested in the present work were found to be carrying lacS and lacZ genes regardless of their isolation source. Only dairy-associated strains of LN19 and LN24 harbored cit operon composed of citC, citD, citE, citF, and citG (Fig. 11). The lack of citrate uptake and utilization related genes in non-dairy associated Ln. lactis strains show evidence of prolonged degenerative evolution, perhaps due to a long period of proliferation in non-dairy niches [4] where no citrate exists.
Choline transport was predicted to be existing only in LN19 and LN24. These strains perhaps utilize choline, which is available in a wide range of milk products [25], to combat osmotic stress conditions such as high saltin-moisture content of cheese. However, all strains were predicted to carry betaine transport genes that are functional in osmoprotectant activity [26].
The absence of genes encoding antibiotic resistance perhaps relates to the specificity of Ln. lactis for dairy and plant-based fermentation matrices secluded this species under limited selective pressure in such a microenvironment as antibiotic usage in plant, and dairy production is restricted. Safety of Ln. lactis is a critical phenomenon given that a considerable number of bacteria belong to this species which are ingested as foods for example cheese and kimchi. Another critical safety factor is biogenic amine production by decarboxylase genes which were also not found in dairy-associated strains where health concern is more pronounced than plant materials [17].
Ln. lactis shows evidence of prolonged evolutionary degeneration, perhaps due to long and repetitive periods of proliferation in milk and fermented plant materials such as kimchi and cucumber fermentation. Dairy-associated Ln. lactis strains appeared to be evolved alongside L. helveticus and L. sanfranciscensis. IS3 family showing significant sequence alignment in kimchi, fecal material, dairy material, and cucumber fermentation brine isolates indicating this IS element belonged to Weissella cibaria and was likely imported via horizontal gene transfer (Table S1). Dairy isolates perhaps evolved with L. helveticus because it is heavily utilized as adjunct and starter culture in the dairy industry for flavor and acid development in cheese [27]. LN19 and LN24 were predicted to carry the IS30 family proposing this IS element was received from L. helveticus (Table S1).
Interestingly, both WIKIM21 and WiKim40 contained IS elements predicted to be originating from L. helveticus which indicates high adaptability to grow in various micro-niches due to its capability to ferment a broad range of carbohydrates and it was also isolated from plant materials [28].
We did not come across any study describing bacteriocin biosynthesis in Ln. lactis strains. In the present study, Lactococcin 972, a homodimeric bacteriocin that targets lactoccal strains, was found in seven Ln. lactis strains. It was isolated from Lactococcus lactis subsp. lactis IPLA 972 in 1996, and unlike other bacteriocins, Lactococcin 972 does not primarily target cell membrane [29]. The bacteriocin synthesis potential of Ln. lactis is reported for the first time in the present study. Therefore, the screening for unique antimicrobials needs further studies due to diverse microbial ecosystems occupied by Ln. lactis strains and abundance of CDS without any function assigned.
It was reported that Lactococcin 972 shows a narrow and specific antimicrobial spectrum similar to Lactococcin Q, a dipeptide bacteriocin biosynthesized by Lactococcus lactis QU 4, which possesses antagonistic activity only against Lactococcus lactis strains [30]. This might be a competitive inhibition strategy that Ln. lactis developed to perhaps inhibit Lactococcus lactis where they usually coexist together, especially in dairy applications where Lactococcus lactis starter cultures are heavily utilized [31]. This could be explained by competitive exclusion, where two strains competing for the same nutrient cannot stably coexist; thus, a competitive strain always dominates its competitor and forces evolutionary modification and shifts to another niche or extinction [32].
In all strains screened, CRISPR-Cas system was found in aa_0143 and BIOML-A1, which have human fecal material origin. Cheese dead vats (i.e. slow or no milk acidification) cause huge economical loss for dairy industry due to bacteriophage infection of starter cultures such as Lactococcus lactis, Streptococcus thermophilus, and Ln. lactis [33]. The knowledge on Ln. lactis' CRISPR-Cas could be further explored in fermented dairy foods biotechnology to protect and reduce bacteriophage infection of Ln. lactis dairy starter cultures (LN19 and LN24) for preventing economic loss in industry and conferring robust bioprocesses. The CRISPR-Cas in Ln. lactis strains described in the present study should further encounter functional assessment for investigating their utilization in microbial engineering against bacteriophage resistance to confer phage immunity to starter cultures.

Conclusion
This study aimed to boost the available fundamental knowledge on Leuconostoc lactis, a microorganism that plays an important role in industrial food fermentations. Global phylogeny on twenty-nine Ln. lactis strains revealed a great deal of diversity. A comparative wholegenome sequence analysis was performed on eight strains representing the human gastrointestinal tract and fermented foods. Comparative genome analysis showed all strains possess mobile genomic elements, namely insertion sequences (IS). CRISPR-Cas system was discovered in each of aa_0143 and BIOML-A1. All strains except CBA3625, LN 24, and WiKim40 harbor at least one intact putative prophage region. Apart from CBA3625, all strains encode genes functional for putative Lactococcin 972 biosynthesis. Metabolic differences according to strain isolation source were found between dairy and non-dairy (plant material) associated strains. For instance, plant-associated strains could utilize plantbased sugar arabinose, whereas dairy strains could not. However, dairy-associated strains were able to metabolize citrate though plant isolates could not, which perhaps