Genetic relationships between Peranakan Ongole and other cattle breeds
When all breeds were analyzed simultaneously using CMDS (Fig. 1a), the first coordinate (C1, x-axis) explained the genetic differences between B. taurus, B. indicus and B. javanicus. The second coordinate (C2, y-axis) separated breeds by geographical origin, explaining genetic differences between African and non-African cattle. These findings are consistent with the previously reported clustering behavior of 50k genotypes in worldwide cattle breeds [20, 21]. As B. indicus breeds were poorly separated in comparison to the B. taurus breeds as a result of ascertainment bias [22] (see Additional file 1), the relationships among B. indicus breeds were assessed by re-running the analysis without B. taurus and B. javanicus genotypes (Fig. 1b). This analysis revealed that GIR, BRM, PO and NEL cluster as distinct populations, with PO and NEL exhibiting greater similarity. This is not unexpected, provided PO and NEL were believed to derive from the same ancestral population.
Ancestry components in the Peranakan Ongole genome
Results from the model-based clustering analysis are found in Fig. 2. When K = 4 was assumed, European B. taurus, African B. taurus, B. javanicus and B. indicus breeds were assigned to different clusters. SGT and BMA presented an average B. indicus contribution of 34 %, whereas BRM exhibited an average of 6 % of B. taurus ancestry. Interestingly, backcrossing to Ongole bulls over the course of 100 years was not capable of eliminating all B. javanicus introgression in PO cattle, which presented a mean B. javanicus ancestry of approximately 6 %. Historical hybridization with B. javanicus cattle seems to also extend to other Indonesian B. indicus populations, such as the Brebes [21] and Madura [21, 23] breeds. This is in contrast with the historical B. taurus introgression in NEL and GIR [3], which seems to have been consistently eliminated by intensive backcrossing [20, 21, 24]. This suggests that either B. javanicus haplotypes were kept by selective forces, or backcrossing in Indonesia was not as strong as in Brazil, and B. javanicus haplotypes are just drifting in PO cattle.
Bos indicus and B. taurus have karyotypes consisting of 29 pairs of acrocentric autosomes, whereas B. javanicus has 25 pairs of acrocentric and two pairs of submetacentric autosomes. Based on cross-species fluorescence in-situ hybridization analysis, Ropiquet and colleagues [25] suggested that the bi-armed autosomes in B. javanicus are equivalent to Robertsonian translocations of autosomes 1–29 and 2–28 in B. taurus. It is still unclear how the first generations of Indonesian B. indicus × B. javanicus hybrids coped with chromosome number imbalance, but the preservation of sequence homology between these species of cattle seems to have guaranteed successful hybridization.
The f
3 statistics [11] provided further support for the evidence of B. javanicus hybridization in the PO genome. As expected, only SGT, BMA, BRM and PO presented Z-scores lower than −3, suggesting no significant mixture in HOL, NDA, NEL, GIR and BALI. The lowest scores for SGT (Z = −10.12) and BMA (Z = −9.51) were obtained with the comparisons HOL-NEL and HOL-GIR, respectively, reproducing the known crossbreeding in these populations. In the case of BRM, the lowest score (−9.18) was obtained by contrasting with HOL-GIR, also highlighting the B. taurus introgression in this breed. The mixture between B. javanicus and B. indicus in PO was well supported with a score of −17.88 for the sister group BALI-NEL.
We carried out further admixture analyses assuming different numbers of ancestral populations in order to dissect the B. indicus component of the PO cattle genome. In spite of over 100 years of isolation, NEL and PO share a very similar history of importation [3] and are deemed to derive from the same Indian Ongole population. Therefore, it was expected that all B. indicus ancestry in PO pertained to the same origin of NEL. Surprisingly, at K = 7 (Fig. 2), GIR separated from NEL, revealing that PO has contributions from both ancestral pools, with an average of 20.3 % of the PO genome descending from the same ancestral population that originated GIR. Two hypotheses can be formulated from this finding: 1) the first imports of Ongole to Brazil comprised purebred animals, whereas Indonesian imports comprised admixed animals; 2) both Brazilian and Indonesian imports included admixed animals, and systematic breeding in Brazil promoted selection against the GIR component.
Together, these results predicted that the phylogenetic analysis should cluster PO to the B. indicus clade, and a migration edge should be drawn from the B. javanicus branch towards PO. Therefore, we constructed a maximum likelihood tree using TreeMix [10], assuming four migration events representing mixtures in SGT, BMA, BRM and PO. Indeed, the estimated phylogeny behaved as predicted (Fig. 3). As expected, the remaining three estimated migrations were European B. taurus introgressions into SGT, BMA and BRM.
Bos javanicus haplotypes in Peranakan Ongole
We examined a total of 113,665 PO haplotypes with a frequency of at least 10 %. From these, 8010 haplotypes presented \( \mathit{\mathsf{Pr}}\left(\mathit{\mathsf{B}}\mathit{\mathsf{j}}\right) > \mathsf{0.8} \) (Additional file 2). These stringent frequency and probability thresholds were adopted in order to minimize the false positive rate in our sample, considering that low frequency haplotypes could have been generated due to genotyping or phasing errors, and because we were particularly interested in finding high frequency private B. javanicus haplotypes in PO cattle. We considered only NEL and GIR as reference B. indicus because these breeds are representative of the ancestral populations that originated PO, and because BRM presented B. taurus introgression in the clustering analysis. Remarkably, the proportion of putative B. javanicus haplotypes in respect to all detected haplotypes was approximately 7 %, which is consistent with the average ancestry estimated from the admixture analysis.
A total of 779, 1480, 3286 and 2465 haplotypes were categorized in groups A (private B. javanicus haplotypes), B (B. javanicus haplotypes IBS with B. indicus rare haplotypes), C (B. javanicus haplotypes IBS with B. indicus common haplotypes), and D (remaining candidates), respectively (Additional file 1). Although groups A and B were of primary interest, these categories can also emerge from the ascertainment bias of the SNP panel. Group C is less prone to false positives induced by SNP bias, but is also more susceptible to ancestry mis-assignment because frequent IBS haplotypes are found in B. javanicus and B. indicus.
Only two genomic regions presented nearly fixed B. javanicus haplotypes (frequency > 80 %): 5:106.1-106.5Mb and 16:33.0-33.4Mb. The highest five markers haplotype frequency within the chromosome 5 region (97.9 %) was also fixed in B. javanicus, but absent in B. indicus. This segment contained four protein-coding genes: fibroblast growth factor 6 (FGF6), fibroblast growth factor 23 (FGF23), fructose-2,6-bisphosphatase (FR2BP, also known as TP53-induced glycolysis and apoptosis - TIGAR), and cyclin D2 (CCND2). The chromosome 16 region contained nearly fixed haplotypes in PO and B. javanicus (>91 %), but with modest frequencies in B. indicus (<12 %). Five protein-coding genes mapped to this region: EF-hand calcium binding domain 2 (EFCAB2), heterogeneous nuclear ribonucleoprotein U (HNRNPU), COX20 (FAM36A), peptidase domain containing 1 (PPPDE1) and human chromosome 1 open reading frame 101 (C1ORF101). Additionally, one small nucleolar RNA and one U6 spliceosomal RNA also mapped to this chromosome 16 domain.
Besides the possibility of false positives due to ascertainment bias, these haplotypes may have achieved nearly fixation either because an advantageous B. javanicus allele in this chromosome segment has been subjected to positive selection in PO, or because this B. javanicus haplotype hitchhiked with a selected B. indicus variant. The small number of nearly fixed B. javanicus haplotypes suggests that most of these are just drifting in PO, but we also do not discard balancing selection.
Candidate QTLs for birth weight in Peranakan Ongole
The genotype filters reduced the SNP panel to 34,482 markers, which were estimated to explain 33.5 % of the variance in birth weights, with a 95 % credible interval of 12.0 % - 67.6 %. These estimates must be taken with caution, as reliable variance components inference often requires the analysis of thousands of records. A clear limitation of the QTL mapping analysis presented here is the use of a small sample size (n = 48), rendering our investigation prone to high false discovery rates and limited to the detection of major QTLs. Nevertheless, we extensively compared the results obtained here with the literature in order to determine whether some of the detected loci have been previously associated with body size traits in other cattle breeds.
A total of 34,221 windows of 20 consecutive SNPs were inspected, and in spite of the small number of PO samples, positional candidates for birth weight were found on chromosomes (CHR) 1, 3, 11, 14, 15, 18, 24 and 29 (Fig. 4). Remarkably, the majority of these candidate loci have already been described as associated with body size in beef cattle.
The positional candidate locus on CHR14 (peaking at 25 Mb) maps to the PLAG1 (pleiotropic adenoma gene 1) chromosomal domain, and was previously found to be associated with birth weight in NEL cattle [26], as well as with a number of body size [27–30] and reproductive traits [19, 31, 32] in several B. taurus and B. indicus breeds. The mechanism by which PLAG1 may affect fetal growth and reproduction is most likely associated with the trans-acting regulation of the expression of insulin-like growth factors, particularly IGF2 [33].
In Brahmans, the C allele of rs109231213 in the vicinity of PLAG1 is associated with positive effects on body size but negative effects on fertility, and also presents evidence of selection and B. taurus origin [29]. We hypothesized that a similar phenomenon could underly birth weight variance in PO, namely selective pressures on the PLAG1 region potentially involving B. javanicus haplotypes.Interestingly, this CHR14 segment indeed presented large LD blocks (data not shown). High LD in this chromosome domain has also been observed in NEL [19], and a possible event of balancing selection has been hypothesized in that breed [26]. However, this long range LD can be partially explained by our sampling strategy, which privileged extremes from the phenotypic distribution. Furthermore, we found no evidence of differential effects between B. javanicus and B. indicus haplotypes for this or any other candidate QTLs (data not shown), suggesting that variants of B. indicus origin are responsible for differences in birth weight in PO.
The putative QTL on CHR3 (116.4 to 116.8 Mb) is in the vicinity of associations for ribeye area in Shorthorn [28] and calf size in Holstein cattle [34], and contains genes that are involved in early embryo development and growth, such as gastrulation brain homeobox 2 (GBX2) and constitutive photomorphogenic homolog subunit 8 (COPS8). The CHR15:44.6-45.0 Mb region is nearby the ribosomal protein L27a gene (RPL27A) and coincides with QTLs for mature weight and ribeye area in Hereford [28] and mature height, ribeye area and carcass weight in Angus [35]. The 30.9 to 32.2 Mb region on CHR18 is in a gene desert but flanks QTLs for longissimus dorsi area and body weight in Angus [35], and carcass weight in Angus x Brahman crosses [36]. In contrast, the locus on CHR29 (29.2 to 30.7 Mb) contains 24 protein-coding genes, and also overlaps QTLs for carcass, weaning and yearling weights in Maine-Anjou cattle [28]. Interestingly, the adoption of the highly stringent Bonferroni correction prevented the detection of this QTL in a genome-wide association study for birth weight in NEL [26].
The candidate QTLs on CHR1 (28.9 to 29.2 Mb), CHR11 (97.8 to 98.5 Mb), and CHR24 (61.4 to 62.0 Mb) did not overlap any previously identified QTLs for body size or related traits, and are either novel QTLs or false discoveries. However, these loci harbor interesting candidate genes. For instance, the CHR1 region shelters the 1,4-alpha-glucan branching enzyme gene (GBE1), responsible for increasing the solubility of glycogen molecules to allow its accumulation in the liver and muscle cells, and could be involved in somatic growth and muscle development. The CHR11 region contains the somatic cytochrome c gene (CYCS), previously shown to affect embryonic growth [37]. Finally, the CHR24 locus shelters the PH domain and leucine rich repeat protein phosphatase 1 gene (PHLPP1). Disruption of this gene in mice has been recently found to lead to decreased bone mass [38].