Identification of candidate genes for milk production traits by RNA sequencing on bovine liver at different lactation stages

Background RNA-sequencing was performed to explore the bovine liver transcriptomes of Holstein cows to detect potential functional genes related to lactation and milk composition traits in dairy cattle. The bovine transcriptomes of the nine liver samples from three Holstein cows during dry period (50-d prepartum), early lactation (10-d postpartum), and peak of lactation (60-d postpartum) were sequenced using the Illumina HiSeq 2500 platform. Results A total of 204, 147 and 81 differentially expressed genes (DEGs, p < 0.05, false discovery rate q < 0.05) were detected in early lactation vs. dry period, peak of lactation vs. dry period, and peak of lactation vs. early lactation comparison groups, respectively. Gene ontology and KEGG pathway analysis showed that these DEGs were significantly enriched in specific biological processes related to metabolic and biosynthetic and signaling pathways of PPAR, AMPK and p53 (p < 0.05). Ten genes were identified as promising candidates affecting milk yield, milk protein and fat traits in dairy cattle by using an integrated analysis of differential gene expression, previously reported quantitative trait loci (QTL), data from genome-wide association studies (GWAS), and biological function information. These genes were APOC2, PPP1R3B, PKLR, ODC1, DUSP1, LMNA, GALE, ANGPTL4, LPIN1 and CDKN1A. Conclusion This study explored the complexity of the liver transcriptome across three lactation periods in dairy cattle by performing RNA sequencing. Integrated analysis of DEGs and reported QTL and GWAS data allowed us to find ten key candidate genes influencing milk production traits.


Background
In dairy cattle, milk yield and compositions are the most important economical traits that are typical quantitative characteristics controlled by multiple QTLs and polygenes simultaneously. Detection of key functional genes or causal variations on milk production traits could provide valuable molecular information for marker-assisted selection or genomic selection in dairy cattle thereby greatly shortening generation interval and increase the rate of genetic gain through pre-selecting young candidate bulls prior to progeny testing. In the past several decades, extensive quantitative trait locus (QTL) mapping, candidate genes analysis, and genome-wide association study (GWAS) have been implemented to identify QTLs, genes and mutations with large effects on the milk traits of dairy cows [1][2][3][4]. As of December 29, 2019, a large number of QTLs and genetic associations have been detected with 5093, 19,782, and 22,418 loci for milk yield, milk protein and milk fat, respectively (http://www.animalgenome.org/cgi-bin/QTLdb/). However, only a few studies, including DGAT1 (Diacylglycerol O-Acyltransferase 1), GHR (Growth Hormone Receptor), and ABCG2 (ATP Binding Cassette Subfamily G Member 2) gene have been confirmed as major genes for milk production traits in dairy cattle [2,5,6].
Beyond marker-QTL linkage analysis (LA) and/or linkage disequilibrium (LD) and GWAS, with the dramatically cost reductions and rapid development of next generation sequencing, RNA-sequencing (RNA-Seq) has become a commonly used approach to identify candidate genes for complex traits in human and domestic animals in recent years. Many bovine studies utilizing RNA-Seq have been conducted [7], including mammary tissue [8], embryos [9], leukocytes [10], different disease conditions [11], or factors with various nutrition traits [12]. Our previous RNA-Seq studies with mammary gland tissues from lactating Holstein cows with extremely high and low milk protein and fat percentages and from cows in non-lactation and peak of lactation identified 17 differentially expressed genes (DEGs) as promising candidates for milk composition traits [8,13]. However, only a limited number of studies on transcriptional profiles in bovine liver have been reported until now [12,14].
Actually, besides mammary gland epithelium, liver also plays critical roles in milk synthesis during lactation as the most important organ where the metabolism of gluconeogenesis, lipid, amino acids and other substances takes place in dairy cows [15][16][17]. Gluconeogenesis in liver is vital to meet glucose requirements for dairy cows in perinatal period. Dorland et al. (2009) presented the liver, as a crucial role in considerable metabolic adaptation, supported pregnancy and lactation through coordination and interconversion of nutrients [17], especially during the period of transition from late gestation to early lactation. A few genes in bovine liver involved in glucose and lipid metabolism had been detected via bovine gene microarrays [18]. Smith et al. (1998) reported that cholesterol content was increased in liver to meet lipoproteins synthesis and secretion, and then provided the mammary gland with cholesterol and triglycerides after parturition [19]. The mRNA expression of pyruvate carboxylase (PC) increased in medium and high liver fat concentration (LFC) groups than that in the low-LFC group, suggesting expressions of more genes related to gluconeogenesis than genes for lipid metabolism in early lactation [20]. Bu et al. (2017) compared the gene expression profiles between liver and mammary tissues during lactation in cows and found that liver expressed a larger number of metabolic genes, especially related to lipid, while mammary gland had more genes with regards to protein synthesis and secretion, proliferation/ differentiation [21]. Our previous study detected some differentially expressed mRNAs, miRNAs and lncRNAs among different lactation stages in liver thereby predicted the competing endogenous RNAs (ceRNA) regulatory networks in Holstein cow [22].
In view of the important roles of liver in metabolism, the objective of this study was to search for key functional genes for milk production traits by performing RNA sequencing of liver samples from dairy cows in non-lactation and lactation. Here, we utilized liver biopsy transcriptome data from three Holstein cows during the dry period, early lactation and peak of lactation obtained using RNA-Seq and identified ten promising candidate genes for milk yield and milk compositions (protein and fat) in Holstein. Our findings could provide new insights into elucidation of the genetic basis for milk traits and potential molecular information for genomic selection in dairy cattle breeding.

Animals and liver tissue sample collection
Three healthy Chinese Holstein cows with similar body weight, milk yield, and milk composition were selected from 1300 Chinese Holstein cows fed in Baoding Hongda Animal Husbandry Limited Company in Hebei Province (Baoding, China) [22]. Liver biopsies were collected on 50-d before parturition (dry period), 10-d after parturition (early lactation) and 60-d after parturition (peak of lactation) from each cow as described in detail in our previous study [22]. As a result, a total of nine samples were obtained. All protocols for collection of the tissues of experimental individuals were reviewed and approved by Animal Welfare Committee of Hebei Agricultural University. All animals care and treatment in compliance with the "Principles of Laboratory Animal Care (NIH Publication No. 86-23, revised 1985)". The cows after study were all alive and healthy and still produced milk.

RNA sequencing, read alignment and identification of differentially expressed genes
The total RNA was isolated from each live tissue sample with TRIzol reagent (Life Technologies, CA, USA) and purified with RNase-free DNase (TIANGEN, Beijing, China). RNA integrity number (RIN) was detected with RNA Nano 6000 Assay Kit on Bioanalyzer 2100 system (Agilent Technologies, CA, USA) and samples with RIN values of higher than 7.0 were used as input material for the RNA sample preparations. Subsequently, sequencing libraries were generated using NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina® (New England Biolabs, MA, USA) and sequenced on Illumina HiSeq 2500 platform. After removing adapters, deleting reads containing poly-N and low quality reads from the raw data, we aligned the paired-end reads of the clean data to the cattle reference genome assembly UMD 3.1.80. Detailed descriptions for sequencing and assembling were shown previously [22].
Gene expression level was calculated as fragments per kilo base pair (kb) of transcript per million mapped fragments (FPKM). DEGs among the different lactation stages were detected and quantified using Cuffdiff (http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/). FPKM value, fold changes (in log2 scale), p-values and q-values (false discovery rate: corrected p-values) of DEGs were reported in the output files from Cuffdiff, and q-value of < 0.05 was set as the threshold for significantly differential gene expression.

Functional enrichment analysis of differentially expressed genes
Ensembl Gene IDs were uploaded to the DAVID Functional Annotation Tool (https://david.ncifcrf.gov/home. jsp) and Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) were performed. P-value of 0.05 was set as the threshold for significantly enriched GO terms and pathways.
Integrated analysis of the differentially expressed genes and previously reported QTL and GWAS data To better identify candidate genes for milk traits, we further compared the physical positions on genome of DEGs with the previously reported QTLs that have been shown to be associated with five milk production traits, namely milk yield, protein yield, fat yield, protein percentage and fat percentage (Cattle QTLdb: http://www. animalgenome.org/cgi-bin/QTLdb/BT/index) and the significant SNPs for five milk traits identified by previous GWAS in dairy cattle [23]. Thereby, DEGs close to the QTL peak positions (less than 5 cM) and near significant SNPs (less than 5 Mb) were selected as candidate genes.

Overview of RNA sequencing
We sequenced the cDNA libraries of nine liver tissue samples from three Holstein cows with three samples from each period, i.e. dry period (50-d prepartum), early lactation (10-d postpartum) and peak of lactation (60-d postpartum) using HiSeq 2500. In total, we acquired 780.38 million clean reads, with 86.71 million for each sample on average (range from 78.62 to 97.78 million). The sequencing quality values of Q20 and Q30 were 93.13 and 87.48% respectively (Additional file 1) indicating a high quality of sequencing data. Alignment of the sequencing reads against the bovine genome UMD3.1.80 yielded 84.49% of uniquely aligned reads across the nine samples (Additional file 2) which were used for further analyses. Pearson correlation between the three samples in each stage was higher than 0.935 that was considered as a high correlation indicating the high similarity of biological replicates (Additional file 3).

Differentially expressed genes across three periods
With Cuffdiff software, a total of 204 (118 up-regulated and 86 down-regulated), 147 (106 up-regulated and 41 down-regulated) and 81 (57 up-regulated and 24 downregulated) differentially expressed genes (DEGs, p < 0.05, false discovery rate q < 0.05), which were ranked in the top half expressed genes, were detected in the early lactation vs. dry period, peak of lactation vs. dry period, and peak of lactation vs. early lactation, respectively (Additional file 4 and Additional file 5). Details of the top 10 DEGs in the three comparisons and their full name as well as q-value and fold change were described in Table 1.

Gene ontology enrichment and pathway analysis
To further know about the functional associations of the differentially expressed genes between lactation periods, we implemented gene ontology (GO) analysis with DAVID software (Additional file 6). Between early lactation and dry period, significantly enriched GO terms (p < 0.05) were mainly focused on metabolism-related functions, especially on carboxylic acid metabolic process, transport of lipid/ cholesterol/ sterol, regulation of cellular ketone metabolic process/ lipid biosynthetic process/ lipoprotein lipase activity, high-density lipoprotein (HDL) particle, lipoprotein particle, very low-density lipoprotein (VLDL) particle and small molecule/ carboxylic acid/ carbohydrate catabolic process. Of these, insulin-like growth factor binding and anion binding were the most significantly enriched GO molecular functions. Of note, the top 10 DEGs were involved in response to metabolic process and transport, cellular component organization localization.
Comparing the peak of lactation with dry period, the most significantly enriched GO categories (p < 0.05) were related to response to biotic stimulus/ other organism/ external biotic stimulus, immune effector process, HDL particle, plasma lipoprotein particle, lipoprotein particle, and molecular functions associated with doublestranded RNA binding, insulin-like growth factor binding, and growth factor binding. The top 10 DEGs were related to response to stimulus, metabolic process, biological regulation, cellular component organization or biogenesis, multi-organism process, and localization.
However, between peak of lactation and early lactation, the main significant GO categories (p < 0.05) were focused on triglyceride/ acylglycerol/ neutral lipid/ fatty acid/ monocarboxylic acid metabolic process, and molecular functions associated with fructose transmembrane transporter activity and double-stranded RNA binding. Among the TOP10 DEGs, cellular and metabolic processes, response to stimulus, biological regulation, cellular component organization or biogenesis, multi-organism process, and localization were enriched.
In addition, KEGG analysis significantly enriched 53, 39 and 46 pathways (Additional file 7), including PPAR and AMPK signaling pathways, cholesterol/ fatty acid metabolism and glycolysis/gluconeogenesis between early lactation vs. dry period, and PPAR, p53 and AMPK signaling pathways between peak of lactation and other two periods. Candidate genes identified by integrated analysis of RNA-Seq, reported QTL and GWAS data To identify candidate functional genes affecting milk production traits, integrated analysis of the RNA-Seq data in this study and the previously reported genetic data including QTL mapping and GWASs was performed. First, we compared the physical position of each DEG with the position of known QTLs that have been shown to be associated with milk yield, milk protein and milk fat traits in dairy cattle from the Cattle QTLdb database (http://www.animalgenome.org/cgi-bin/QTLdb/). Then, each DEG was compared with the significant SNPs for milk traits in Holstein identified in a GWA study by Cole et al. [23]. As a result, ten common differentially expressed genes that were simultaneously close to the known QTLs (less than 5 cM) and significant SNPs (less than 5 Mb) were identified (Tables 2 and 3). Thus, through combination of DEGs, QTL and GWAS data, and biological functions, such ten genes were suggested as promising candidates for milk production traits, i.e. APOC2 (Apolipoprotein C2), PPP1R3B (Protein phosphatase 1 regulatory subunit 3B), PKLR (Pyruvate kinase), ODC1 (Ornithine decarboxylase 1), DUSP1 (Dual specificity phosphatase 1) and LMNA (Lamin), GALE (UDP-galactose-4-epimerase), ANGPTL4 (Angiopoietin like 4), LPIN1 (Lipin 1), and CDKN1A (Cyclin dependent kinase inhibitor 1A).

Discussion
In this study, we obtained the liver transcriptome profiles in different lactation periods of Holstein cows using high-throughput RNA sequencing, and 432 differentially expressed genes (DEGs), which were ranked in the top half expressed genes, were detected among the dry period (−50 day), early lactation (+ 10 day), and peak of lactation (+ 60 days). The DEGs expressed in the bottom half level were eliminated to ensure the detection power. Rapaport et al. (2013) investigated the relationship between detection power of DEGs and sequence depth and number of replicates, and demonstrated that with most methods, over 90% of differently expressed genes at the top expression levels are detected with little as 2 replicates and 5% of the reads [24]. Trapnell et al. (2013) also reported that the detection rate of DEGs was similar for three or more replicates using Cuffdiff [25]. Of noted, the more biological replicates are taken, more detection power are improved.
Through functional enrichment, we found that the differentially expressed genes (DEGs) between milking and non-lactation status (early and peak of lactation vs    before calving) mainly participated in metabolisms of lipid, fatty acid, protein, carbohydrate and energy, and involved in MAPK, p53 and PPAR signaling pathways. Especially, lipid metabolism was significantly enriched such as fat digestion and absorption, fatty acid metabolism, bile secretion, endocytosis, biosynthesis of unsaturated fatty acids, endocytosis, fatty acid metabolism, AMPK, and PPAR signaling pathways. This is mostly likely due to milk lipids, proteins, lactose, saturated and unsaturated fatty acids that need to be synthesized during lactation. These results also suggested that metabolism and synthesis in liver provided nutrient substance support for the milking in mammary gland probably through the milking vein in dairy cows. Previous studies also revealed that the contents of triacylglycerol [26] and blood nonesterified fatty acids (NEFA) [27] were increased greatly during the perinatal period. Based on the integrated analysis of DEGs, QTLs, GWAS data and biological functions, a total of ten promising candidate genes were found. Among them, four genes like GALE, ANGPTL4, LPIN1, and CDKN1A have been reported to play roles in milk production [28]. Six novel candidate genes identified in this study included APOC2, PPP1R3B, PKLR, ODC1, DUSP1, and LMNA.
APOC2, encoded by the APOC2 gene, is the component of chylomicrons (CM), very low density lipoprotein (VLDLs), low density lipoprotein (LDLs) and high density lipoprotein (HDLs) in plasma. It plays an important role in lipoprotein metabolism as an activator of lipoprotein lipase. The deficiency of APOC2 could cause high circulating levels of triglycerides (TGs), while overexpression of APOC2 could inhibit lipoprotein lipase activity [29,30]. In addition, APOC2 was close to two known QTLs for milk protein yield and milk yield with distances of 1.9 to 2.3 cM and also near four significant SNPs for milk fat and protein traits detected by Cole et al.(2011) [23].
PPP1R3B and PKLR are related to carbohydrate metabolism. PPP1R3B activates glycogen synthesis and limits glycogen breakdown in the liver and skeletal muscle [31]. It was not only close to one reported QTL for milk fat percentage with distance of 0.5 cM, also near two significant SNPs for milk fat and protein identified by GWAS [23]. PKLR plays a key role in hepatic glycolysis [32]. It was close to three known QTLs (0.7 to 1.7 cM) for milk fat yield, milk protein percentage and milk fat percentage and also near one significant SNP for milk protein yield [23].
The ODC1 gene encodes ODC1 that is a key enzyme of polyamine biosynthesis and a main regulator of protein synthesis and lactogenesis. The transcription of ODC1 is stimulated by high nutritional levels and is elevated during periods of rapid mammary growth and differentiation [33]. ODC1 was very close to two known QTLs for milk fat yield with 0.1 cM and near 18 significant SNPs for milk fat and protein with distances 74Kb~4.53 Mb [23].
DUSP1 is involved in the epithelial-to-mesenchymal transition, regulation of breast cancer stem cells (CSCs) and signal transduction [34]. DUSP1 was not only close to the two known QTLs for milk fat yield and milk protein yield (2.7 to 4.6 cM) but also near five significant SNPs for milk fat and protein [23]. LMNA plays an important role in nuclear assembly, chromatin organization, nuclear membrane formation and telomere dynamics. The mutations in LMNA caused an imbalance between lipid oxidation and oxidative glucose metabolism in skeletal muscle metabolism [35]. It was near to the peak positions of three QTLs (0.5 to 2.9 cM) for milk fat yield, milk protein percentage and milk fat percentage and near one significant SNP for milk protein yield [23].

Conclusions
This study detected significantly expressed genes (DEGs) in liver among three lactation periods (dry period, early and peak of lactation) by performing RNA sequencing in Holstein cows. Integrated analysis of DEGs, previously reported QTL and GWAS data, and biological functions of genes identified 10 promising candidate genes for milk production traits, including APOC2, PPP1R3B, PKLR, ODC1, DUSP1, LMNA, GALE, ANGPTL4, LPIN1 and CDKN1A. Our findings provided a solid basis for further in-depth studies on how these genes regulate milk synthesis and molecular information for genomic selection in dairy cattle.
Ethics approval and consent to participate All protocols for collection of the tissues of experimental individuals and phenotypic observations were reviewed and approved by the Animal Welfare Committee of Hebei Agricultural University. Tissue samples were collected specifically for this study following standard procedures with the written consent from the Baoding Hongda Animal Husbandry Limited Company who owned the Chinese Holstein cows.

Consent for publication
Not applicable.

Competing interests
Dongxiao Sun is a member of the editorial board (Associate Editor) of this journal. Dongxiao Sun and all the other authors declare that they have no competing interests.
Author details 1