Multiple association analysis strategies identified loci and candidate genes for body size on three growth stages in Simmental beef cattle

1 Background: Body size traits as one of the main breeding selection criteria have long since being 2 widely used to monitor cattle growth and evaluate the selection response. Here the volume of body 3 size is indicated by body height (BH), body length (BL), hip height (HH), heart size (HS), abdominal 4 size (AS) and cannon bone size (CS). We performed genome-wide association studies (GWAS) for 5 these traits to a broad spectrum of three growth stages (months 6, 12 and 18 after birth) under three 6 statistical models: single-trait GWAS, multi-trait GWAS and LONG-GWAS. The whole genomic 7 single nucleotide polymorphisms (SNPs) were obtained from the Illumina Bovine HD 770K 8 BeadChip genotype on 1217 individuals. 9 Results: In total, 19, 29, and 10 significant SNPs were identified by the three models, respectively. 10 While 21 genes among in these loci appeared to be promising candidate genes, including SOX2, 11 SNRPD1, RASGEF1B, EFNA5, PTBP1, SNX9, SV2C, PKDCC, SYNDIG1, AKR1E2 and PRIM2 12 detected by single-trait analyze; SLC37A1, LAP3, PCDH7, MANEA and LHCGR detected by multi- 13 trait analyze; P2RY1, MPZL1, LINGO2, CMIP and WSCD1 detected by LONG-GWAS. 14 Conclusions: Multiple association analysis strategies were performed for six growth traits on each 15 stage. This research could offer valuable insights to further explore the potential mechanism of 16 growth traits in Simmental beef cattle.


20
Beef cattle production plays an increasingly important role in Chinese agribusiness and Simmental 21 breed accounts for more than 70% of beef-producing herds. The fact that body size was frequently 22 selected by beef producers to monitor growth and track the developmental progress of each animal 23 throughout the fattening period in farming [1,2]. While these economic traits are of interest to health 24 [3], longevity [4] and increasing profit in efficiency of feed and management [5][6][7], which have 25 received growing attention for more than a century. Besides in human, additive genetic effect 26 explains 81% of the variation in height [8], and the heritability for both HH and HS in cattle is 0.33-27 0.4 [9]. Bouwman et al. reported that the lead variants in significant regions explained at most 13.8% 28 of the phenotypic variance in their meta-analysis for cattle size using 58,265 cattle from 17 29 populations [10]. In addition, body linear measurements, specifically body height (BH) and hip 30 height (HH), highly reliable and accurate indicators for body weight, has been explored to be much 31 easier to measure than body weight in daily production [11]. The depth of height size (HS) is an 32 indication of good feed conversion and carcass information [12]. Dairy cows with higher HH will 33 subsequently have better milk performance [13]. However, little knows about the molecular 34 mechanisms of body size traits in Chinese Simmental beef cattle. 35 Genome-wide association studies (GWAS) as a robust statistic tool are being broadly asked to 36 identify candidate genes with significant SNPs in production traits [14][15][16] 45 performing association has been made to assess whether some significant SNPs are associated with 46 the process that a trait develops over time [25], and this method showed powerful for identifying 47

SNPs identified by LONG-GWAS 105
Nineteen loci were identified in the LONG-GWAS, including three significant loci on 12 106 chromosomes (Table 2). Among them, two suggestive loci near P2RY1 in the 8.3Kb region were 107 detected, while the later (BovineHD0100032742) also was associated with LONG-CS, LONG-BH and 108 LONG-HH. Another two loci near MPZL1 in the 4.2Kb region were identified and the latter SNP 109 (BTA-68271-no-rs) also showed association with LONG-HH and LONG-BH. Besides, one suggestive 110 SNP near LINGO2 on BTA8 was associated with LONG-HS, which encoded a transmembrane protein 111 belonging to the LINGO/LERN protein family which contains four homologs in the human genome 112 (LINGO1-4) [40]. Additionally, four promising loci in the 0.03 Kb region were detected to associate 113 with LONG-HH, namely CMIP, a key gene in T-cell signaling pathway. On BTA19, a suggestive 114 locus near WSCD1 was associated with LONG-CS. No loci were associated with AS in our research. 115

116
We performed single-trait GWAS, multi-trait GWAS, and LONG-GWAS for six body size traits on 117 three growth stages in Simmental beef cattle. However, the three methods yielded different results 118 with few shared loci, the reasons for this condition are discussed below: first, which is likely caused 119 by the restricted dataset in single-trait GWAS and LONG-GWAS analysis. One universal phenomenon 120 that cannot be ignored is that growth traits are controlled by polygenes with small effects. Then each 121 method has its advantage to identify distinct loci. For example, single-trait GWAS is robust to detect 122 trait-specific QTL and multi-trait GWAS is efficient for mapping pleiotropic QTL [41], while LONG-123 GWAS can improve the detection power for time-dependent and consistent loci [42]. Thus, combining 124 these three GWAS methods was expected to analyze the genetic mechanism of the body traits of beef 125 cattle more comprehensively and convincingly. In addition, since many complex traits have a similar 126 architecture across diverse species [7], we tried to compare some of our significant genes with the 127 previous reports about the same genes and their association with growth. As a result, 21 suggestive 128 genes were considered as candidate genes, which involved in development or associated with growth 129 in cattle, swine, human and mice studies.

Single-trait GWAS versus multi-trait GWAS 157
Multiple-trait analysis of linkage experiments has been reported to significantly enhance the power to 158 detect common SNPs across traits [58,59]. Therefore, the multi-trait analysis was a complementary 159 part to single-trait GWAS rather than replacement for it. In the single-trait GWAS, the minimum p 160 values for three stages were 3.90E-07, 9.92E-07 and 2.11E-08 respectively; whereas these three values 161 decreased to 8.23E-13, 1.73E-09 and 3.84E-10 in the multi-trait GWAS, respectively. While multi-trait GWAS also identified some critical loci as follows. On BTA1, the SLC37A1 (solute carrier family 163 37, member A1) gene, which encodes a glucose-6-phosphate transporter that is involved in the 164 homeostasis of blood glucose [60], was previously proposed as the best candidate mutations for milk 165 production traits  Table 1. In this study, animals with a call rate (<0.9) were discarded. SNPs were selected the following 209 standards, including minor allele frequency ( <0.01), genotype appearances (<0.05) and Hardy-210 Weinberg equilibrium values (p<1×10 −6 ). Finally, 1217 Simmental cattle were generated for the 211 analysis, which consisted of 671192 SNPs with an average distance of 3 kb. 212

Multi-trait GWAS 243
The multi-trait GWAS were conducted to detect pleiotropic SNPs and the model was a Chi square 244 statistic, which approximately follows a Chi square distribution with the number of traits tested as the 245 number of degrees of freedom, was calculated for each SNP using the following formula [34]: 246 Where ̂ is the estimate of v and the corresponding variance (̂) can be obtained via compressed 249 mixed linear model (CMLM). While is a 6×1 vector of the signed t-values of the ith SNP from the 250 above-mentioned single-trait GWAS for the six traits. Matrix ′ is the transpose of the vector , and 251 V −1 is the inverse of the 6×6 correlation matrix between traits, which was calculated by the estimated 252 effects of the qualified SNPs (signed t values). 253

LONG-GWAS 254
As all experimental individuals were recorded for body size traits at three time stages, we conducted Availability of data and materials 289 We confirm that all raw data underlying our findings are publicly available without restriction.