Multiple association analysis of loci and candidate genes that regulate body size at three growth stages in Simmental beef cattle

Background Body size traits as one of the main breeding selection criteria was widely used to monitor cattle growth and to evaluate the selection response. In this study, body size was defined as body height (BH), body length (BL), hip height (HH), heart size (HS), abdominal size (AS), and cannon bone size (CS). We performed genome-wide association studies (GWAS) of these traits over the course of three growth stages (6, 12 and 18 months after birth) using three statistical models, single-trait GWAS, multi-trait GWAS and LONG-GWAS. The Illumina Bovine HD 770 K BeadChip was used to identify genomic single nucleotide polymorphisms (SNPs) in 1217 individuals. Results In total, 19, 29, and 10 significant SNPs were identified by the three models, respectively. Among these, 21 genes were promising candidate genes, including SOX2, SNRPD1, RASGEF1B, EFNA5, PTBP1, SNX9, SV2C, PKDCC, SYNDIG1, AKR1E2, and PRIM2 identified by single-trait analysis; SLC37A1, LAP3, PCDH7, MANEA, and LHCGR identified by multi-trait analysis; and P2RY1, MPZL1, LINGO2, CMIP, and WSCD1 identified by LONG-GWAS. Conclusions Multiple association analysis was performed for six growth traits at each growth stage. These findings offer valuable insights for the further investigation of potential genetic mechanism of growth traits in Simmental beef cattle.


Background
In China, the production of beef cattle is a very important agribusiness, and the Simmental breed accounts for more than 70% of beef-producing herds. Beef producers use body size to monitor the growth of each animal throughout the fattening period [1,2], as this trait is an indicator of cattle [3] and longevity [4]. Monitoring the development of each animal can help to increase profits by enhancing the efficiency of feed and management [5][6][7]. Besides in human, additive genetic effect explains 81% of the variation in height [8], and the heritability for both hip height (HH) and height size (HS) is 0.33-0.4 in cattle [9]. Bouwman et al. reported that the lead variants in significant regions explained at most 13.8% of the phenotypic variance in their meta-analysis of 58,265 cattle from 17 populations [10]. In addition, daily body linear measurements, specifically body height (BH) and HH, two highly reliable and accurate indicators for body weight, are easier to obtained than body weight [11]. Furthermore, good depth of HS in cattle is a sign of good feet and leg conformation [12], and dairy cows with higher HH will subsequently have better milk performance [13]. However, there is little information on the molecular mechanisms of body size traits in Chinese Simmental beef cattle.
Genome-wide association studies (GWAS) are robust statistical tools are that broadly identify candidate genes with significant SNPs involved in production traits [14][15][16], growth traits [17,18], carcass quality traits and fertility traits [19,20]. In beef cattle, various SNPs, genes, and haplotype blocks have been found to associate with growth, however the current GWAS-based studies focus mainly on only one growth parameter [21], such as the weaning size [22], yearling weight or stature upon slaughter [23]. Furthermore, loci controlling growth traits may be variable in different growth stages, and some loci may control traits throughout the lifetime of the animal [24]. Therefore, it is more reasonable to perform GWAS on growth traits on each stage separately. Multi-trait methods have been developed to increase statistical power and to identify pleiotropic loci in GWAS [25]. The longitudinal GWAS consider all time points when assessing whether significant SNPs associate with trait development over time [26], and this method is powerful for identifying these time-dependent and consistent loci [27]. We performed multiple trait GWAS (multi-trait GWAS) and longitudinal-GWAS (LONG-GWAS) based on single-trait analysis. Multi-trait GWAS and LONG-GWAS were not replacements for the single-trait GWAS; instead, they complemented singletrait GWAS. Thus, understanding the genetic mechanisms involved in inter-individual variations in body size may provide new insights that can help to manipulate cattle growth and production.
In this study, six body size traits were routine measured from the time cattle entering farm to slaughter, which provides valuable resources to study the complete growing period. The aim of our study was to comprehensively analyze of candidate genes and QTL regions associated with growth traits by conducting three GWAS approaches in Simmental beef cattle. Our findings offer valuable insights for the further investigation of the potential mechanism of growth traits in Simmental beef cattle.

Results
Population stratification assessment Figure 1 shows that the population stratification of the Simmental population based on the PCA was divided into five separate clusters, demonstrating an obvious stratification in the reference population. The population stratification caused by different genetic influences and breeding conditions, as potential confounders, was corrected by significance testing. We summarized the genome-wide significant and suggestive SNP regions for these traits in Fig. 2. The Manhattan plots and quantile-quantile (Q-Q) plots are shown in Figure S1 and Figure S2, while Q-Q plots suggested that there was no inflation or systematic bias in this research. Most points were near diagonal line because the GWAS model sufficiently considered the population structure and only a few SNPs were associated with the target traits. Meanwhile, the genomic inflation factors (λ) at each trait ranged from 1.03 to 1.10, indicating consistent consequence with PCA.

Summary of significant loci identified by three approaches
Briefly, we found 45, 66 and 19 SNPs significantly were associated with six body size traits by single-trait GWAS, multi-trait GWAS and LONG-GWAS, respectively. There were no significant loci for single-trait BH6 (single-trait GWAS for BH at 6 months after birth, and so forth), single-trait AS6, single-trait CS6, and LONG-AS. In addition, ten SNPs were associated with at least one of the six traits and eight SNPs were strongly associated with these traits in at least one of the three models. While according to their biological functions, 21 suggestive genes were selected as candidate genes and some details of them, including their positions in the genome, the nearest reported genes, the minor allele frequencies (MAF) and the p values are listed in Table 1.

SNPs identified by single-trait GWAS
A total of 45 SNPs achieved genome-wide significance associated with at least one of the six traits, with the p-value ranging from 9.99 × 10 − 6 (BovineHD0700018941 for BL18) to  2.11 × 10 − 8 (BovineHD0200014365 for HS18), and the MAF ranging from 0.003 (BovineHD0700034055) to 0.497 (Bovi-neHD2600012755). Among these, two SNPs near SOX2 (SRY-Box 2) on BTA1 were also identified in the liver and stomach [28]. On BTA2, two loci in the 0.69 Kb region were significantly associated with single-trait HS18 and one of them (BovineHD0200014365) was also associated with multi-trait18. On BTA6, three SNPs in the 0.46 Mb region were located near RASGEF1B (RasGEF Domain Family Member 1B). On BTA7, one SNP (BovineHD0700034055) was associated with single-trait HS12 and single-trait AS12, namely EFNA5 and another SNP (BovineHD0700012966) was associated with single-trait HH6 and multi-trait6, namely PTBP1. On BTA10, one SNP (BovineHD1000002378) was associated with single-trait BH6 and multi-trait6, namely SV2C. While on BTA11, four loci in 0.04 Mb region were associated with the single-trait HS12 and one of them (Bovi-neHD1100007368) was also associated with multi-trait12, all of which were near PKDCC. On BTA13, SNP Bovi-neHD1300012489 and BovineHD1300012894 were associated with single-trait BH18 and single-trait AS18, respectively. These two SNPs also strongly associated with multi-trait18. On BTA23, two SNPs were found within PRIM2.

SNPs identified by multi-trait GWAS
Multi-trait GWAS identified 66 SNPs within or near 36 genes that were distributed on 21 chromosomes, including 8 loci that were also identified by single-trait GWAS, which indicated that these loci suggestively regulate the development of the body growth (Table 1). Among these, two promising loci within the 11.4 Kb region were detected, namely SLC37A1. On BTA6, two suggestive loci were detected, one near LAP3 that associated with multi-trait12 and another near PCDH7. Two genome-wide loci were identified within the 0.02 Mb region of MANEA on BTA9. Furthermore, four promising loci were detected within in 0.04 Mb region of LHCGR on BTA11, and 13 loci were identified within the 0.03 Mb region of AKR1E2 on BTA13, which were also detected by single-trait analysis.

SNPs identified by LONG-GWAS
Nineteen loci were identified by LONG-GWAS, including three significant loci on 12 chromosomes (Table 1).

Discussion
We performed single-trait GWAS, multi-trait GWAS, and LONG-GWAS for six body size traits on three growth stages in Simmental beef cattle. However, the three methods yielded different results with few shared loci. The reason for this discrepancy was likely due to the restricted dataset in single-trait GWAS and LONG-GWAS analyses. One universal phenomenon that cannot be ignored is that growth traits are controlled by multiple genes [29], and each method had its specific advantages in the identification of distinct loci. For example, single-trait GWAS is robust in detecting trait-specific QTLs and multi-trait GWAS is efficient for mapping pleiotropic QTLs [30], whereas LONG-GWAS can improve the detection power for time-dependent and consistent loci [31]. Thus, combining these three GWAS methods was expected to markedly improve the analysis the genetic mechanism of the body traits of beef cattle. In addition, since many complex traits have a similar architecture across diverse species [7], which prompted us to compare some of our significant genes with the previous reports that investigate the same genes and their involvements in growth. As a result, 21 suggestive genes were considered as candidate genes that were involved in the growth of cattle, swine, mice, and humans.

Candidate genes identified by single-trait GWAS
On BTA1, two SNPs were near SOX2 (SRY-Box 2), which encodes a transcription factor involved in the regulation of embryonic development [32,33]. The paralog of this gene is SOX17, which positively affects the growth traits of cattle, and the conserved regions of this gene in human genome is closely related to body development [34]. On BTA2, two SNPs were near SNRPD1 (small nuclear ribonucleoprotein D1 polypeptide), which is a member of the ghrelin receptor family, and the encoded protein is involved in zinc-dependent signaling in epithelial tissue [35]. On BTA6, variations near RAS-GEF1B (RasGEF domain family member 1B) were associated with body height [36]. In addition, body height was positively correlated with calcium absorption, which is an important determinant of calcium balance [37]. On BTA7, a SNP near EFNA5 (ephrin A5) was associated with two traits (HS and AS) at the same stage. It was also identified as a candidate gene for growth traits in broiler chicken [38]. Another SNP near PTBP1 (polypyrimidine tract binding protein 1) was found to show genome-wide association with growth traits at 6 months by both single-trait and multi-trait GWAS. Its expression level determined the release of insulin, thereby affecting development [39]. On BTA9, a SNP (BovineHD0900027283) located in SNX9 (sorting nexin 9), as olfactory receptor, was associated with growth traits in Yorkshire pigs [40]. The SNP near SV2C (synaptic vesicle glycoprotein 2C) was associated with BH by both single-trait and multi-trait GWAS. This gene was reported to modulate dopamine release in neural and endocrine cells [41]. On BTA11, PKDCC (protein kinase domain containing, cytoplasmic) was associated with HS in both single-trait and multi-trait analyses. This gene was involved in the maintenance of bone density in humans [42]. On BTA13, SYNDIG1 (synapse differentiation inducing 1) has been reported as a factor influencing the final weight and backfat thickness of Landrace pigs [43], whereas the AKR1E2 (aldo-keto reductase family 1 member E2) variant was associated with body length and girth in cattle [44]. On BTA23, the PRIM2 (DNA primase subunit 2) was associated with body weight and trait changes in pigs [45,46].

Single-trait GWAS versus multi-trait GWAS
Multiple-trait analysis of linkage experiments has been reported to significantly enhance the power to detect common SNPs across traits [47,48]. Therefore, we used multi-trait analysis to complement single-trait GWAS rather than to replace it. In the single-trait GWAS, the minimum p values for the three stages were 3.90E-07, 9.92E-07 and 2.11E-08 respectively. These three values decreased to 8.23E-13, 1.73E-09 and 3.84E-10 in the multi-trait GWAS, respectively. We also identified several critical loci as follows. On BTA1, the SLC37A1 (solute carrier family 37, member A1) gene, which encodes a glucose-6-phosphate transporter that is involved in the homeostasis of blood glucose [48], was found to be the best candidate gene for modifying milk production traits [49]. On BTA6, LAP3 (leucine aminopeptidase 3) was reported to play critical roles in the regulation of hormone levels and protein maturation. Another study demonstrated that putative regulatory elements in the PCDH7 (protocadherin 7) gene may have roles in residual feed intake in Nelore cattle [50]. In addition, MANEA (mannosidase endo-alpha), which has roles in proteolysis, was associated with the birth weight of Canchim beef cattle [17]. On BTA11, a mutation in the LHCGR (luteinizing hormone/choriogonadotropin receptor) gene was as the cause of empty follicle syndrome [51].

Single-trait GWAS versus LONG-GWAS
We used LONG-GWAS, which involved multiple phenotype measurements for each individual [24]. One disadvantage of this method was that incorporating all data may have overwhelmed significant signal, that is, if QTL effects varied during the different stages [52]. In this study, these time-specific expressed QTLs identified by the single-trait and multi-trait GWAS were not detected by LONG-GWAS. However LONG-GWAS also detected some significant functional loci as follows. On BTA1, P2RY1 (purinergic receptor P2Y1), a candidate gene that affects the serum Ca 2+ , encoded for a member of the family of G protein-coupled receptor family that works as receptor for extracellular ATP and ADP [53]. On BTA3, the MPZL1 (myelin protein zero like 1) gene could significantly enhance the migratory and metastatic potential of hepatocellular carcinoma cells by phosphorylating and activating the pro-metastatic protein [54]. Besides on BTA8, LINGO2 (leucine rich repeat and Ig domain containing 2), which is expressed in the central nervous system of mouse embryos, has been reported to associate with the body mass in a cohort of elderly Swedes [55]. On BTA18, CMIP (C-Maf inducing protein), a candidate gene for reading-related traits, was also associated with plasma lipoprotein levels [56]. Moreover, WSCD1 (WSC domain containing 1), which encodes a protein with sulfotransferase activity that participates in the metabolism of glucose, was a candidate gene for feed efficiency and feeding behaviors in the White Duroc × Erhualian F2 population [57].

Conclusions
In conclusion, a total of 58 SNPs corresponding to 21 genes were found to be associated with six body size traits at 6, 12 and 18 months. Future studies characterizing the functions of these candidate genes may uncover the genetic architecture underlying the body size traits in Simmental beef cattle.

Resource population and phenotypes collection
Simmental beef cattle (born between 2008 and 2015) were established in Ulgai, Xilingole League, Inner Mongolia of China. Six body size traits at three growth stages (6, 12, and 18 months after birth) were measured simultaneously for each individual. The details of trait measurement are as follows: BH, also named wither height, the height from wither to ground; BL, the length from the front edge of the scapula to the back edge of the ischial tuberosity; HH, height from hip to ground; HS, also named heart girth, the chest girth in the back edge of scapula; AS, the girth of the thickest part of the abdomen; CS, the girth of the thinnest cannon bone. In addition, only cattle (133 individuals, Table S1) whose phenotype records overlapped during three growth stages were used for LONG-GWAS analysis. The blood sample specimens were collected during the regular quarantine inspection of the farms was conducted. After this study, all individuals were slaughtered in strict compliance with the Institutional Meat Purchase Specifications for fresh beef. All procedures were conducted in strict compliance with the guidelines established by the Ministry of Agriculture of China. Some descriptive statistics and heritability estimates of six traits at three growth stages are presented in Table 2.

Genotyping and quality control
Genomic DNA was isolated from blood samples using the TIANamp Blood DNA Kit (Tiangen Biotech Co.Ltd., Beijing, China). DNA quality was acceptable when the A260/A280 radio was between 1.8 and 2.0. Genotyping was performed with the Illumina BovineHD Beadchip (Illumina Inc., San Diego, CA, USA) and the PLINK v1.07 Software was used for quality control [58]. In this study, animals with a call rate (< 0.9) were discarded. SNPs were deleted the following standards, including minor allele frequency (< 0.01), SNP call rate (< 0.05) and Hardy-Weinberg equilibrium values (p < 1 × 10 − 6 ). Finally, 671,192 SNPs on 29 autosomal chromosomes with an average distance of 3 kb were generated for the analysis.

Single-trait GWAS
For six body size traits in three growth stage, we performed single-trait GWAS, respectively. The compressed mixed linear model (CMLM) was called because it reduced computing time by clustering individuals into groups, increased the power in QTN detection by eliminating the need to re-compute variance components, and enhanced the effectiveness in correcting the inflation from the polygenic background and controlling the bias of population stratification [59,60]. Briefly, a principal components analysis (PCA) was performed and a kinship matrix was calculated using the Genome Association and Prediction Integrated Tool (GAPIT) package in R v3.4.2 [61]. To revise the effects of population structure, the Q matrix was reflected by the PCA. To replace the incomplete pedigrees, the K matrix was calculated by the VanRaden algorithm [62]. This model is as follows: where y is a vector of the observed phenotypes; W was a vector of SNP genotype indicators, which was coded as 0, 1 and 2 corresponding to the three genotypes AA, AB, and BB with B being the minor allele. υ was the effect of marker, which is treated as a fixed effect; Variable X is an incidence matrix for non-genetic fixed effects, and β is a non-genetic vector of fixed effects including month ages (time of birth to measurement), enter weight (weight of just entering the farm), fattening days (time of entering farm to measurement) and principal component effects (the top three eigenvectors of the Q matrix). Variable Z is an incidence matrix for a vector of polygenic effects, and parameter u is a vector for residual polygenic effects with an assumed N (0, Kσ 2 ) distribution, where σ 2 is the polygenic variance and K is a marker inferred kinship matrix. While е is a vector for random residual errors with a putative N (0, I σ 2 e ) distribution, where σ 2 e is the residual variance. The heritability (h 2 ) is defined as: h 2 = σ 2 σ 2 þσ 2 e . The CMLM analysis was performed with GAPIT Software package (http://www. maizegenetics.net/gapit). Quantile-quantile (Q-Q) plots were generated to visualize the goodness of fitting for the GWAS model accounted by the population structure and familial relatedness. The negative logarithm of the p value from the model was calculated against the expected value based on the null hypothesis. The threshold p value after Bonferroni correction was 0.05/N = 7.45 × 10 − 8 , where N is the number of SNPs. In light of the fact that the Bonferroni correction results were too stringent with low statistical power [63]. Hence, we adopted the false discovery rate (FDR) to determine the threshold values for Single-GWAS, Multi-trait GWAS and LONG-GWAS. The FDR was set as 0.01, and the threshold p value was calculated as follows: where n is the number of P < 0.01 in the results, and m is the total number of SNPs [64].

Multi-trait GWAS
For the three growth stages, the multi-trait GWAS were conducted to detect pleiotropic SNPs and the model was a Chi square statistic, which approximately followed a Chi square distribution with the number of traits tested as the number of degrees of freedom. It was calculated for each SNP using the following formula [65]: where b v i is the estimate of v and the corresponding variance V ðb v i Þ can be obtained by the compressed mixed linear model (CMLM); While t i is the 6 × 1 vector of the signed t-values of the ith SNP from the abovementioned single-trait GWAS for the six traits. Matrix t 0 i is the transpose of the vector t i , and V −1 is the inverse of the 6 × 6 correlation matrix between traits, which was calculated by the estimated effects of the qualified SNPs (signed t values).

Long-GWAS
For these individuals that completely obtained body size information at three growth stages, we conducted a longitudinal GWAS by LONG-GWAS [24]. This model was similar to CMLM, except that the phenotypic variance was partitioned to SNPs, fixed factors (the abovementioned β vector), polygenic effects, time stage effects and residual variance. Moreover, numerous studies have reported that the longitudinal design could facilitate the identification of time-dependent and consistent loci, which increased the statistical power due to their effectiveness in incorporating the correlation structure of multiple measurements and alleviating the multiple testing burden [26,27,66]. The code data implementing this method may be found at http://genetics.cs.ucla.edu/long-GWAS/. This model is as follows: In this formula, y * is the adjusted phenotype (these fixed effects mentioned in CMLM were adjusted in order to account for additional confounding). The incidence matrix W, Z, the vector v, u and e are consistent with CMLM mentioned above. Differently, the parameter γ is a vector for time stage effects with a putative N (0, σ 2 v D) distribution, where D is a known block diagonal matrix representing the covariance between permanent environmental components. The D matrix was calculated by this formula: D = E ⊗ I, where E is a 3 × 3 matrix representing the covariance between the set of 3 time points for each individual.

Identification and annotation of candidate genes
The UMD3.1 genome assembly was used to located genes for annotation, and the QTLdb database (http:// www.animalgenome.org) was applied to search for related QTL regions.
Additional file 1: Figure S1. The strengths of genome-wide association studies (GWAS) are illustrated by the Manhattan plots on the left panel. The deviations of the signals from null hypothesis are illustrated as the Quantile-Quantile (QQ) plots on the right panel. The negative logarithms of the observed (y axis) and the expected (x axis) P values are plotted for each SNP (dot). GWAS were performed six body size traits months 6, 12 and 18 after birth separately. Each analysis is labeled as trait (BH or HH) and month on the far right. The number neighboring each trait indicates the age of measurement (e.g., BH6 = Body Height at 6 months). The 29 chromosomes are color coded.
Additional file 2: Figure S2. The strengths of genome-wide association studies (GWAS) are illustrated by the Manhattan plots on the left panel. The deviations of the signals from null hypothesis are illustrated as the Quantile-Quantile (QQ) plots on the right panel. The negative logarithms of the observed (y axis) and the expected (x axis) P values are plotted for each SNP (dot). GWAS were performed six body size traits months 6, 12 and 18 after birth separately. Each analysis is labeled as trait (BH or HH) and month on the far right. The number neighboring each trait indicates the age of measurement (e.g., BH6 = Body Height at 6 months). The 29 chromosomes are color coded.
Additional file 3: Table S1. Descriptive statistics of 133 phenotypic records in LONG-GWAS method.