Genome-wide association studies for methane emission and ruminal volatile fatty acids using Holstein cattle sequence data

Background Methane emission by ruminants has contributed considerably to the global warming and understanding the genomic architecture of methane production may help livestock producers to reduce the methane emission from the livestock production system. The goal of our study was to identify genomic regions affecting the predicted methane emission (PME) from volatile fatty acids (VFAs) indicators and VFA traits using imputed whole-genome sequence data in Iranian Holstein cattle. Results Based on the significant-association threshold (p < 5 × 10− 8), 33 single nucleotide polymorphisms (SNPs) were detected for PME per kg milk (n = 2), PME per kg fat (n = 14), and valeric acid (n = 17). Besides, 69 genes were identified for valeric acid (n = 18), PME per kg milk (n = 4) and PME per kg fat (n = 47) that were located within 1 Mb of significant SNPs. Based on the gene ontology (GO) term analysis, six promising candidate genes were significantly clustered in organelle organization (GO:0004984, p = 3.9 × 10− 2) for valeric acid, and 17 candidate genes significantly clustered in olfactory receptors activity (GO:0004984, p = 4 × 10− 10) for PME traits. Annotation results revealed 31 quantitative trait loci (QTLs) for milk yield and its components, body weight, and residual feed intake within 1 Mb of significant SNPs. Conclusions Our results identified 33 SNPs associated with PME and valeric acid traits, as well as 17 olfactory receptors activity genes for PME traits related to feed intake and preference. Identified SNPs were close to 31 QTLs for milk yield and its components, body weight, and residual feed intake traits. In addition, these traits had high correlations with PME trait. Overall, our findings suggest that marker-assisted and genomic selection could be used to improve the difficult and expensive-to-measure phenotypes such as PME. Moreover, prediction of methane emission by VFA indicators could be useful for increasing the size of reference population required in genome-wide association studies and genomic selection.


Background
Methane is the second most abundant global anthropogenic greenhouse gas behind carbon dioxide [1]. The ruminant production system has considerable contribution to climate change by emitting methane, either directly from enteric fermentation or indirectly from feed production activities, deforestation, manure, etc. [2]. The majority of enteric methane (87%) is produced in the rumen [3] by the activity of methanogenic archaea under anaerobic conditions [2]. Traditionally, several strategies such as diet manipulations, chemo-genomics, chemical and biological feed additives and anti-methanogen vaccines, have been designed to mitigate methane emissions besides improving the efficiency of production system. However, a fundamental problem is that the ruminal microbiota is able to adapt rapidly to intervention strategies [4].
The amount of enteric-methane emissions from ruminants is affected by a variety of factors including feed intake and composition, fermentation process e.g. passage rate and rumen volume, physiological state of the animal, and variation among individual animals [4][5][6]. Reducing the methane produced by cattle through genetic selection is a highly desirable approach compared with other strategies (e.g. adding supplements such as nitrates to feed) because genetic improvement is heritable and cumulative across generations. De Haas et al. [7] reported that the estimated heritability of predicted methane emission (PME) per milk production and PME per milk production corrected for fat and protein content were 0.35 and 0.58, respectively. Moreover, genomewide association studies (GWAS) have identified several single nucleotide polymorphisms (SNPs) associated with measured or predicted rate of methane emission in Holstein cattle [7] and Angus cattle [8]. Thus, genetic selection can be employed to reduce PME per unit of milk production. However, routine measurement of methane production is difficult and expensive in farm, and identified SNPs associated with methane production can only explain a small proportion of its genetic variation [9].
Nowadays, via the availability of sequence data that potentially consists of causative mutations, the prediction accuracy of genomic selection and resolution of GWAS can be improved [10]. Therefore, the goal of our study was to identify genomic regions affecting the predicted methane emission (PME) from volatile fatty acids (VFAs) indicators and VFA traits using imputed whole-genome sequence data in Iranian Holstein cattle.

Results
The descriptive statistics for studied traits are presented in Table 1. In the least squares analysis of the studied traits, the contemporary group effect was statistically significant (p < 0.05) for PME, butyric acid and isovaleric acid, and the effect of age was only significant for PME (p < 0.05).

Genome-wide association studies for volatile fatty acids
After removing the genotypes with low imputation accuracy (R 2 < 0.30) and other criteria (e.g. call rate < 95%, MAF < 0.05, and Chi-square < 10 − 6 of Hardy-Weinberg equilibrium), a total of 6,583,595 SNPs on 29 Bos taurus autosomes (BTAs) were remained. The number of quality-control passed imputed variants used for association analyses ranged from 125,756 on BTA25 to 443, 774 on BTA1.
Genome-wide association studies for predicted methane emission The GWAS results for PME, PME per kg milk, and PME per kg fat are presented in Manhattan plots in Fig. 2. According to suggestive regions (p < 10 − 5 ), we found 53 SNPs (BTA8, 10, 15, 18, 23, and 26) for PME; 41 SNPs (BTA15) for PME per kg milk; and 308 SNPs (BTA1, 2, 4, 5, 8, 10, 13, 14, 16, 19, 20, 21, 26, and 28) for PME per kg fat. Although, the association studies did not show any significant SNPs for PME trait, the chromosome-wide Bonferroni correction identified five significant SNPs on BTA15 for PME per kg milk, 20 significant SNPs on BTA4, 13, 19, and 21 for PME per kg fat. PME per animal that were not adjusted per unit of product showed a decreased variation compared to the adjusted PME per kg of milk or fat i.e. two animals with an equal rate of methane emission do not necessarily have an equal efficiency for methane production (methane emission per unit of product). As a result, the power of the association studies to capture quantitative trait loci (QTL) without considering the adjustment of PME per unit of product can be diminished. Using the recommended Bonferroni correction threshold (p < 5 × 10 − 8 ), we identified two SNPs on BTA15 for PME per kg milk, 14 SNPs on BTA4, 13, 19, and 28 for PME per kg fat (Table 2); and two SNPs on BTA4 and 19 passed the genome-wide Bonferroni correction for PME per kg fat ( Table 2).
The results of gene networks analyses for valeric acid and methane emission traits were shown in Figs. 3 and 4, respectively. Furthermore, the summary of significant SNPs (p < 5 × 10 − 8 ) associated with PME and valeric acid traits that are in close distance to reported QTLs is presented in Table 5.

Discussion
The results of GWAS showed some suggestive SNPs associated with VFA traits and some regions (e.g. BTA3) associated with both acetic and propionic acid traits. Alemu et al. [11] also reported a strong correlation (− 0.85) between acetic and propionic acid. Further, we only found two SNPs significantly associated with isovaleric acid (BTA9 and 28) and 29 SNPs with valeric acid (BTA5, 11, and 25) based on the chromosome-wide Bonferroni correction threshold. The isoacids (e.g. isobutyric, 2-methylbutyric, isovaleric acid, and straight-chain valeric acid) produced in the digestion process of ruminants are mainly generated from the degradation products of the amino acids valine, isoleucine, leucine, and proline [12]. Isoacids are either required by, or stimulate the growth of ruminal cellulolytic bacteria. Therefore, they play a critical role in microbial fermentation [12].
According to the previous studies, milk production in dairy cattle could increase by 5 to 10% with the addition of isoacids to the diet [13]. Therefore, the existence of significant association between host genome and both iso-valeric acid, and valeric acid traits can help dairy farmers to improve these acids availability in rumen for ruminal cellulolytic bacteria using genetic selection.
Given that animals in this study were sampled based on highest and lowest percentiles of milk production EBV with an equal number of animals, this strategy can enhance the ability of QTL detection especially in population with small sample size. However, it has been Fig. 2 Manhattan plot of the genome-wide p values of association for PME traits: a PME; b PME per kg milk; c PME per kg fat. The solid line represents p < 1 × 10 − 5 significance threshold reported that Bonferroni correction can increase the number of type II errors, particularly when the sample size is small [14]. Thus, chromosome-wide Bonferroni correction can be considered more accurate than genome-wide Bonferroni correction [15]. Since, the sample size in our GWAS was relatively small, the power of finding SNPs that are associated with PME traits was limited, and thus a few number of SNPs passed the genome-wide Bonferroni correction stringent p-value threshold.
Cattle have a considerable ability to produce highquality proteins from non-edible plant cell wall components for human consumption [16]. However, it is well established that gastrointestinal microbiota contributes to feed digestion and enteric methane emission in ruminants [17][18][19][20], as approximately 17% of global methane emissions generated through ruminants [8]. Methane emission is a difficult and expensive trait to measure and is not routinely measured in dairy cattle. Furthermore, this trait has not been directly considered in the selection index of dairy cattle. However, the detrimental environmental effects of cattle production can be reduced significantly by improving the efficiency of cattle in converting feed to milk and meat, rather than wasting energy as enteric methane. To improve this efficiency in dairy cattle, it is crucial to understand the sources of genetic variation in methane production among individual animals, and the genetic architecture of methane production. van Engelen et al. [21] reported that the heritability of predicted methane yields ranged from 0.12 to 0.44 in Dutch Holstein-Friesian cows, and suggested that PME based on milk fatty acids could be reduced using genetic selection programs. de Haas et al. [7] estimated the heritability of PME and PME per fat-and proteincorrected milk yield to be 0.35 and 0.58, respectively. However, they reported that seven SNPs associated with PME in Holstein cattle could explain only 0.2% of the total genetic variance [7]. In another GWAS, 3304 significant SNPs were identified for PME (p < 0.005) in which 630 of them were associated with weight-at-test and dry-matter intake [8]. It has been estimated that 19 to 23% of methane emissions per kg of milk can be reduced and converted to production when cows are selected based on milk fat plus protein (19%), or based on the average genetic merit of milk fat plus protein yield (23%) [22]. According to Yan et al. [23], an effective method for mitigating methane emissions in dairy cows is to select the animals based on increased milkproduction efficiency and increased energy-utilization performance.
Understanding the genetic variation and underlying genes associated with PME allows reduction of methane production in cattle through marker-assisted or genomic selection. It has been reported that genomic selection can reduce PME up to approximately 5% over 10 years [24]. The results of our study showed a large variation of  PME traits indicating that mitigation of methane emissions in dairy cattle might be possible using the VFA profile of the rumen as an indicator of methane production. Moreover, reduction in methane emissions can also improve feed-conversion efficiency in dairy cattle. In addition, unlike other strategies developed to reduce methane emissions in dairy cattle such as changing the dietary formulations, feed additives, and antimethanogen vaccines, genetic improvements have the benefits of being heritable, cumulative, and permanent.  Fig. 3 Gene networks analysis for valeric acid trait. Dark circles with and without slash represent candidate genes and associated genes, respectively. Arrows in pink, blue, red and bone color represent co-expression, pathway, physical interactions and shared protein domains, respectively Sixty-nine genes were identified for valeric acid (n = 18), PME per kg milk (n = 4) and PME per kg fat (n = 47) that were located within 1 Mb of significant SNPs. For valeric acid two of the most promising candidate genes (ARPC1A and TRRAP) were related to residual feed intake (RFI) [25,26]. It has been observed that low-RFI pigs had a tendency toward lower valeric acid (p = 0.07) and isovaleric acid (p = 0.09) concentrations [27]. Therefore, available valeric acid concentration might be influenced by these genes in cattle. The UniProt Knowledgebase (https://www.uniprot.org/uniprot/) was used to identify the molecular functions of ARPC1A and TRRAP genes. ARPC1A is a component of the Arp2/3 complex, involved in regulation of actin polymerization and together with an activating nucleation-promoting factor mediates the formation of branched actin networks. TRRAP is an adapter protein which can be found in different multiprotein chromatin complexes with histone acetyltransferase activity that gives a specific tag for epigenetic transcription activation. Further, TRRAP is required for mitotic checkpoint and normal cell cycle progression. Furthermore, 17 candidate genes significantly clustered in olfactory receptors activity (GO: 0004984, p = 4 × 10 − 10 ) for PME traits. Olfactory receptors genes influenced feeding behavior such as feed intake and preferences [28]. Furthermore, they are associated with RFI and modulated olfactory transduction pathways in pig [29]. Since there was a positive genetic correlation (ranging from 0.18 to 0.84) between RFI and PME in Holstein-Friesian cows [7], olfactory receptors genes might affect methane mission per animal. In addition, five candidate genes including CYP51A1 on BTA 4, PPP1R16B on BTA 13, and NTHL1, TSC2, and PKD1 on BTA 25 suggested by Pszczola et al. [30], involved in a number of metabolic processes that might be related to methane emission. PKD1 gene was associated with development of the digestive tract [30]. Based on the results of gene networks analyses, there were some contributions among candidate genes by co-expression, pathway, physical interactions and shared protein Fig. 4 Gene networks analysis for PME per kg milk and fat traits. Dark circles with and without slash represent candidate genes and associated genes, respectively. Arrows in pink, blue, red and bone color represent co-expression, pathway, physical interactions and shared protein domains, respectively domains for valeric acid and methane emission traits. Most of these contributions were related to physical interactions between genes that was indicative of proteinprotein interaction and if two genes showed the same protein-protein interaction, their products were linked together. Therefore, identified candidate genes in our study had a significant protein-protein interaction either together or with associated genes.
Thirty-one QTLs were identified for valeric acid (n = 10), PME per kg milk (n = 2) and PME per kg fat (n = 19) that were located within 1 Mb of significant SNPs. Given that methane emission is a complex trait influenced by many genes, and is not measured routinely in dairy cattle (unlike the traits of milk production and body weight), there are very limited number of QTLs associated with methane emission in the QTLdb website. Most of the QTLs that are in close distance to SNPs associated with PME traits have been found to influence milk production and its components, body weight, and RFI, rather than methane production. However, all SNPs associated with valeric acid in our study were close to the reported QTLs for milk production and its components, and RFI. The correlation coefficients between EBV for methane intensity with milk yield, fat yield, Table 5 QTLs located in close distance to the most significant single nucleotide polymorphisms (SNPs) associated with valeric acid and predicted methane emission (PME) per kg milk and fat traits protein yield, and somatic cell score were estimated as − 0.68, − 0.13, − 0.47, and 0.07, respectively [31]. Further, a positive genetic correlation between RFI and PME (ranging from 0.18 to 0.84) was reported in Holstein-Friesian cows [7]. Herd et al. [32] also reported a high correlation between daily methane production and yearling weight (0.85), and dry-matter intake (0.79) in beef cattle. These reports indicated that some common genes and QTLs were influencing traits related to methane emission and milk production that was confirmed by our study. However, as the cows were selected based on milk production EBVs in this study, the identified QTLs and genes around significant SNPs might be increasing milk or fat production and consequently lowering methane emission per milk or fat amount. Therefore, further studies are needed to validate the results of our GWAS and investigate the biological effects of the validated SNPs on milk production and methane emission.

Conclusions
Although 1045 SNPs passed the suggestive-association threshold (p < 10 − 5 ) in our GWAS, only 33 of them reached the significant-association threshold (p < 5 × 10 − 8 ) for PME per kg milk (2 SNPs); PME per kg fat (14 SNPs) and valeric acid (17 SNPs) traits. Furthermore, 17 and 40 candidate genes were discovered around the significant SNPs for valeric acid and PME (per kg milk and fat) traits. According to the GO term analysis, six promising candidate genes were clustered in organelle organization for valeric acid trait, and 17 candidate genes were clustered in olfactory receptors activity for PME trait. Moreover, the SNPs that were found in our GWAS were close to QTLs for milk yield and its components, body weight, and RFI. High correlations of these traits with PME traits and overlap of our identified genomics regions with previously reported QTLs, as well as the existence of olfactory receptors activity genes for PME traits associated with feed intake and preferences could confirm that these SNPs can be good candidates for methane emission. Overall, our findings suggest that marker-assisted and genomic selection could be used to improve the difficult and expensive-to-measure phenotypes such as PME. Moreover, prediction of methane emission by VFA indicators could be useful for increasing the size of the reference population required in GWAS and genomic selection.

Animals and data
Animals were selected using two-tailed strategy proposed by Jiménez-Montero et al. [33] based on the estimated breeding values (EBVs) for milk yield because there was a high negative genetic correlation (− 0.68) between milk yield EBVs and methane intensity e.g. [31], which was almost confirmed by our data set in which the phenotypic correlation between milk yield EBVs and predicted methane emission was estimated to be − 0.47. Hair and rumen digesta samples were collected from 150 Iranian Holstein cattle in a breeding population (Ferdous Pars Dairy Farm, Isfahan, Iran) in May 2016. Animals were not euthanized after the study because of the safe methods used for sampling. Estimation of breeding values were performed by National Animal Breeding Centre of Iran (Karaj, Iran) using lactation model following the Eq. 1 [34]: where y ij is milk yield (adjusted to 305 days and twice a day milking); μ is the population mean; hys i is the effect of herd-year-season group i; a ij is the animal breeding for j th animal and i th herd-year-season group, and e ij is the random residual effects. The mean of the accuracy of estimated breeding values for milk yield was 0.61 [34]. The sampled animals were progeny of 42 sires and 150 dams and were born from 2011 to 2013. All animals were fed the same diet of 57.1% concentrate (barley, corn, soybean meal, lime, fish meal, meat meal, salt, fat meal, bentonite, soybean, biosaf, mineral supplement and magnesium oxide) and 42.9% forage (alfalfa, straw and corn silage) for at least two months before the sample collection. Animals were fed four times per day (morning, noon, evening and night) and had ad libitum access to water. The diet was a total mixed ration with 49% of dry matter (DM), 16.5% crude protein (60% rumen degradable protein and 40% rumen undegradable protein), 29% neutral detergent fiber, 21% acid detergent fiber, 4%lignin, 7% ash and net energy for lactation of 1.77 Mcal per kg of DM. The average amount of dry matter intake was 27.3 kg per cow per day.

Traits studied and data collection
The rumen digesta samples were shipped to the University of Tehran's Animal Nutrition Laboratory (Karaj, Iran) and VFA measurements were performed on these samples using the method proposed by Ottenstein and Bartley [35].
The methane emission was predicted using the Eq. 2 [36]: where CH 4 is the PME, and Ac, Pr, and Bu are the concentration of acetic acid (mM), propionic acid (mM), and butyric acid (mM), respectively.
The traits include PME (ml), PME per kg milk (ml), PME per kg fat (ml), acetic acid (%), propionic acid (%), butyric acid (%), valeric acid (%) and isovaleric acid (%). The ratio of PME per cow per day to the milk or fat products per cow per day was used to estimate PME per kg product. After removing the outlier values (observations more than 3 standard deviations away from the mean), 147 samples for valeric acid and isovaleric acid, and 146 samples for the rest of the traits were remained for GWAS analyses. In our study, acetic acid (%), propionic acid (%) and butyric acid (%) had high correlation with PME (0.85, − 0.89 and − 0.14, respectively) and PME per kg of milk (0.60, − 0.62 and − 0.09, respectively) and low correlation with PME per kg of fat (0.17, − 0.17 and − 0.11, respectively).

Genotypic data
A total of 150 cows were genotyped using the GGP-LD v4 SNP panel consists of 30,108 SNPs following the manufacturer's protocol (GeneSeek, Nebraska, United States). The SNPs were removed from the genotyping data for call rate < 95%, minor allele frequency (MAF) < 0.05, and Chi-square < 10 − 6 of Hardy-Weinberg equilibrium test. A total of 23,835 SNPs were retained in the analysis after the filtering. Given the markers density largely affects the power of QTL detection, imputation from BovineSNP30 panel to sequence data was carried out using stepwise imputation from the BovineSNP30 to the BovineHD beadchip (578,505 SNPs) and to sequence data (12,063,146 SNPs) [37]. The R 2 was between 0.85 and 0.93 for imputation of high density to sequence, and between 0.65 and 0.84 for imputation of 54 k to sequence [38]. Furthermore, Van Binsbergen et al. [37] could improve the mean accuracy of imputation from 0.37 to 0.65 using the stepwise imputation (Bovi-neSNP50 (3132 SNPs) to BovineHD (40,492 SNPs) and then to sequence data (1,737,471 SNPs) using chromosome 1 data). In the present study, stepwise imputation was carried out using Minimac3 software [39].
The 1000 Bull Genomes Project database including 129, 43, 15, and 47 key ancestors from global Holstein-Friesian, Fleckvieh, Jersey, and Angus breeds, respectively was used for imputation [10]. The same quality control was performed on sequence data, and a total of 12,063,146 SNPs were retained for the analysis after filtering. Moreover, Eagle (version 2.3) was used to phase genotypes before using Minimac3 for reference and target populations separately [40]. Finally, the imputed genotypes with accuracy lower than 0.30 were removed [41] and a total of 6,583, 595 SNPs were retained for further analysis.

Analysis of fixed effects
Data were analyzed by the least squares analysis of variance using the general linear model procedure of the SAS 9.0 (SAS, Institute Inc., Cary, NC). For analyzing PME, PME per kg milk, PME per kg fat, acetic acid, propionic acid, butyric acid, valeric acid and isovaleric acid, the contemporary group (11 levels) and age were fitted to the model as fixed effect and covariate factor, respectively.

Genome-wide association studies
Association studies between the imputed genotypes and the phenotypes-PME (ml); PME (ml) per kg milk; PME (ml) per kg fat; and VFAs-were performed using EMMAX [42]. EMMAX simultaneously adjusted the tests for both population stratification and relatedness in the association study. The model used for GWAS is shown in Eq. 3: where y is the vector of phenotypic values; X is the design matrix relating phenotypes to their corresponding fixed effects; β is the vector of fixed effects (contemporary group, age, and one SNP at a time); Z is the design matrix relating the phenotypes to their corresponding random polygenic effects; u is the vector of random polygenic effects; and e is the vector of random residual effects. Further, we assumed that u~N (0, Kσ 2 g Þ and e~N (0, Iσ 2 e Þ, where I is an identity matrix, and K is the kinship matrix constructed from sequence data.

Significance levels
The following methods were used to determine the significance threshold of SNPs: 1) significance threshold of p < 5 × 10 − 8 as proposed by Reed et al. [43]; 2) genomewide Bonferroni correction with significance threshold of p < 7.59 × 10 − 9 = p < 0.05 / total number of SNPs [44]; and 3) chromosome-wide Bonferroni correction with significance threshold of p < 0.05 / total number of SNPs on each chromosome [45]. The latter was used because Bonferroni correction would ignore the linkage between SNPs leading to conservative correction and high falsenegative rate [15].

Post-GWAS bioinformatics analysis
Ensembl annotation of UMD3.1 genome version (http://www.ensembl.org/biomart/martview) was used to identify the candidate genes and then human genes orthologous in Ensemble BioMart were identified using the gene identifiers. The DAVID Bioinformatics Resources version 6.7 (http://david.abcc.ncifcrf.gov) was used to carry out gene ontology (GO) analysis. Finally, the gene networks were drawn by GeneMA-NIA webserver (http://genemania.org/).