Detecting the effects of selection at the population level in six bovine immune genes

Background The capacity of a species or population to respond to and survive novel infectious disease challenge is one of the most significant selective forces shaping genetic diversity and the period following animal domestication was likely one of the most important in terms of newly emerging diseases. Inter-specific genome-wide comparison has suggested that genes, including cluster of differentiation 2 (CD2), ADP-ribosyltransferase 4 (ART4), tyrosine kinase binding protein (TYROBP) and interleukins IL2, IL5, IL13, may have undergone positive selection during the evolution of the bovine lineage. Past adaptive change implies that more recent variation may have also been subject to selective forces. Results In this paper, we re-sequence each of these genes in cattle cohorts from Europe, Africa and Asia to investigate patterns of polymorphism at the population level. Patterns of diversity are higher within Bos indicus suggesting different demographic history to that of Bos taurus. Significant coding polymorphism was observed within each of the cell-surface receptors. In particular, CD2 shows two divergent haplotypes defined by a series of six derived nonsynonymous substitutions that are significantly clustered on the extracellular surface of the protein and give significant values for Fay and Wu's H, strongly suggesting a recent adaptive history. In contrast, the signaling molecules (especially IL13) display outlying allele frequency spectra which are consistent with the effects of selection, but display negligible coding polymorphism. Conclusion We present evidence suggestive of recent adaptive history in bovine immune genes; implying some correspondence between intra- and inter-specific signals of selection. Interestingly, three signaling molecules have negligible nonsynonymous variation but show outlying test statistics in contrast to three receptors, where it is protein sequence diversity that suggests selective history.


Background
Domestication has involved severe and novel selective pressures on cattle and other domesticates, undoubtedly leaving behind population genetic signatures of adaptation [1]. In particular, the epidemiology of infectious diseases would have changed during domestication due to sharp increases in population densities and the new proximity of previously separated species. The effect of the domestication process in terms of emerging diseases is often discussed with respect to human populations [2], however, similar processes are as likely to have shaped the genomes of domesticated animals. By screening the bovine genome for selective signatures associated with immunity or disease susceptibility, we may be able to identify those genes that have been of critical importance to the development of disease resistance [3].
We have previously reported a comparative genomics study identifying several bovine genes, including cluster of differentiation 2 (CD2) and ADP-ribosyltransferase 4 (ART4), as having significant evidence of adaptive evolution on the evolutionary lineage leading to modern cattle from the bovine-pig common ancestor [4]. Briefly, we investigated evidence of variable selective pressure on the bovine lineage in a dataset of approximately 3,000 orthologs from human, mouse, cow and pig. A gene was inferred to be subject to adaptive evolution on the bovine lineage when a model of variable selective pressure specifically on that lineage was significantly favoured over the alternative model and the ratio of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site, ω, was significantly greater than 1. CD2 (ω = 3.858) and ART4 (ω = 1.356) were ranked first and third in a list of significant scores ranked by this estimated ratio.
Emerging data from the bovine genome project has allowed us to similarly investigate a larger number of bovine genes. Interleukins 2, 5, 13 (IL2, IL5 and IL13), and tyrosine kinase binding protein (TYROBP) each gave preliminary evidence suggestive of adaptive evolution on the bovine lineage (Lynn et al, unpublished).
Several studies of human genes, which have signatures of adaptive evolution between species, found that many of these genes have also been subject to more recent population selection [5][6][7]. To assess the level of within-species polymorphism and to identify potential population-specific selective signatures in these six bovine immune genes we have re-sequenced exonic, intron and intergenic regions of each in population samples from Africa, Asia and Europe, plus an outgroup species, bison. Thirty-nine individual cattle, representing 16 different breeds were included in these samples. The three continental groups, which probably reflect the products of separate domestication events, certainly have had different post-domestic histories and have endured markedly different infectious challenges [8,9].

Patterns of sequence diversity
The three samples chosen for re-sequencing were multibreed collections of European and West African Bos taurus plus B. indicus sampled across four breeds of South Asian origin. Each of these three continental populations is thought to have separate histories since domestication and here the legacy of migration was minimized by care-ful selection from populations known from prior data to have experienced minimal introgression [10]. Summary statistics and tests of neutrality are given for each gene within each population sample in Table 1. Each locus showed polymorphism, within and between the three continental samples (alignments are given in Additional file 1). Interestingly neither IL2 nor IL5 showed any nonsynonymous sequence changes and IL13 displayed only a single nonsynonymous change present in one African and four B. indicus chromosomes. There is a marked tendency for the B. indicus sample to give the highest sequence diversity (five from six loci) and also the highest haplotype diversity (five from six loci). Interleukin 13 Interleukin 13 (IL13) is a cytokine which plays a critical role in allergic inflammation and in susceptibility to parasites, such as, helminths, schistosomes and nematodes [11][12][13][14]. The pattern of diversity at the IL13 locus is consistent with non-neutral evolution in the African population sample, with significant results in four tests of neutrality (Table 1). Significantly negative values of Tajima's D, Fu and Li's, D and F statistics, and Fu's Fs were obtained for this population, suggesting positive selection; nucleotide diversity was also somewhat reduced relative to the other samples [15][16][17]. These significance values are calculated on coalescent simulations assuming a constant effective population size. In order to consider the demography of domestication, which probably included a significant post-domestic population expansion, we also calculated Tajima's D and Fay and Wu's H conditioned on a significant population expansion. The IL13 African D value remained below the 5 th percentile. Aside from the relatively robust signature of positive selection in the African population, Fu and Li's D and Fu's Fs also showed somewhat less significant departures from neutrality in European samples (Table 1).

Neutrality tests at each locus
IL13 displays almost no coding sequence polymorphism with only a single (nonsynonymous) SNP in some B. indicus samples and one African heterozygote (possibly due to B. indicus introgression). The nonsynonymous SNP (A7T) is not predicted to have an affect on protein structure by the SIFT or PolyPhen algorithms [18,19]. Apart from one synonymous substitution compared to plains bison, there are no differences between the bovine IL13 coding sequence and this outgroup species, although there are several intergenic and intronic SNPs.

Interleukin 5
This gene is part of a cytokine cluster on Bta7 along with IL13 and IL4, all with known involvement in the Th2-type response. Co-expression at this cluster is controlled by elements over a 120 kb range in humans [20]. The gene prod-uct is a main regulator of eosinophil maturation and activation and has been implicated as influencing Shistosoma infection in humans [21]. The gene showed moderate nucleotide and haplotypic diversity but no nonsynonymous polymorphisms within the sampled populations. Fay and Wu's H tests were significant in both the African (P = 0.001) and European (P = 0.002) samples and remained outside the 95% interval in comparison to values generated under a simulated domestication bottleneck and expansion.
Interleukin 2 IL2 is a central regulator of the immune system involved in processes including production of cytotoxic T-cells, proliferation of immunoglobulins by B-lymphocytes and induction and maintenance of natural killer (NK) cells [22]. Three neutrality tests, based on the allele frequency spectrum (Tajima's D, Fay and Wu's H and Fu's Fs), show significant deviation from neutrality at the IL2 locus in European populations (Table 1) [15][16][17]23]. On comparison to the coalescent model, modeled on a population expansion akin to domestication, Fay and Wu's H retained an outlying value (P < 0.025) but Tajima's D did not retain significance.
The African population contains only a single SNP over the 3461 bp re-sequenced at this locus; the lowest level of sequence diversity (0.0001) within any locus-sample combination (Table 1). No nonsynonymous polymorphism was observed at this locus.

Tyrosine Kinase Binding Protein
Tyrosine kinase binding protein (TYROBP; also known as DAP12) is involved in the activation of Natural Killer (NK) cell anti-viral responses, via 'non-inhibitory' members of killer-cell inhibitory receptor (KIR) family [24][25][26][27]. NK cell KIRs have been shown to be rapidly evolving in a wide range of species, including cattle, and have undergone lineage-specific expansions or contractions of the gene family [28][29][30]. It is tempting to speculate that adaptive evolution of bovine TYROBP, may be driven by coevolution with rapidly evolving members of the KIRs, with which it interacts.
The results of the standard neutrality tests at this locus are not as clear as for the cytokines discussed previously (Table 1). Fu's Fs test [17] and Tajima's D are significantly negative in the European population, and Fay and Wu's H test [23] suggests an excess of derived SNPs in the African and European samples. Significance at the H test proved robust to a population expansion model but the D value fell within the 95% confidence interval.
TYROBP has two nonsynonymous SNPs within cattle, one only within the African sample (S12C) and one derived variant which is almost fixed within both the African and European samples (Q24L). Both of these occur in the signal peptide and are predicted by SIFT to have an impact on protein structure and/or function [18]. An additional nonsynonymous substitution (P106A), predicted by PolyPhen as 'possibly damaging', occurs between cattle and plains bison, and is located in the cytoplasmic domain of TYROBP.

ADP-ribosyltransferase 4
Human ADP-ribosyltransferase 4 (ART4) is an erythrocyte glycoprotein that has been identified as the Dombrock blood group [31]. The two major antigens of this blood group (Do a and Do b ) have significantly different frequencies in human populations and are associated with severe transfusion reactions [32]. Three other high incidence antigens (Gy a , Hy and Jo a ) are also carried on the Dombrock blood group molecule.
Bovine ART4 was found to have five nonsynonymous and six synonymous SNPs within cattle populations and one nonsynonymous and synonymous substitution compared to plains bison. More specifically, four of the five nonsynonymous changes (G5R, R41M, I130V, S213A) and all of the within-cattle synonymous changes occur only in the B. indicus sample. Indeed, the B. indicus population exhibits much higher nucleotide diversity with 33 segregating SNPs compared to just 6 within each of the African and European populations (Table 1).
Cattle do not appear to be polymorphic at the orthologous position to the human mutation (N265D) which is responsible for the Do a /Do b antigenicity. Bovine ART4 is, however, polymorphic at the position associated with the Jo a antigen in humans (human T117I; bovine I130V) [32]. The antigenicity associated with this polymorphism in humans suggests that mutations at this position may result in structural alteration of the molecule. Two of the other nonsynonymous changes (G5R, G126W) are also predicted to alter protein structure by SIFT [18]. The I130V and G5R mutations are found only in B. indicus cattle while G126W is found only in African samples.

Cluster of Differentiation 2
Antigen recognition by T cells is one of the crucial steps in controlling adaptive immune responses. Cluster of Differentiation 2 (CD2) is a T cell and natural killer cell surface protein that optimizes T cell activation through interaction with its counter receptor; CD58 in non-rodent mammals [33][34][35] and CD48 in rodents [36]. CD2 and other similar cell surface proteins have low affinity for their counter receptors [37], a fact that may be exploited by viruses which can evolve a higher affinity for these receptors in order to invade the host cell [38]. We have previously identified the specific codon sites that have undergone adaptive changes in CD2 in the bovine lineage and revealed that they are almost exclusively limited to the extracellular portion of the molecular structure [4]. We speculated that a pathogen or more specifically, a virus interaction could have driven the rapid evolution observed in a region of the extracellular domain not involved in CD58/48 binding [4].
Here we identified a significant Fay and Wu's H value within the European sample, indicating an excess of derived variants (P < 0.05; Table 1). This significance was robust to imposition of a domestication/post domestic expansion demographic model on test simulations; and using these as a null the African sample attained significance also. More strikingly, bovine CD2 displayed high coding polymorphism. Eleven nonsynonymous SNPs were present within the sampled populations, and there were an additional three nonsynonymous substitutions compared to the outgroup species, plains bison. Seven of the eleven within-cattle nonsynonymous SNPs and all of the outgroup nonsynonymous substitutions are located in the extracellular domain of CD2, a region which we have previously shown to be under adaptive evolution in cattle [4]. This genomic region (exon 3) displayed a clear peak in sequence diversity ( Figure 1). This is reinforced by a visual inspection of haplotypes; six of these nonsynonymous changes lie within 130 bp and segregate in two strikingly divergent exon 3 haplotypes which are common in all three populations.
The locations of the seven extracellular domain nonsynonymous SNPs occurring within cattle populations were plotted on the structure of human CD2 based on their orthologous positions. It was noted that these sites appeared to cluster together in a particular region of the extracellular domain in the C2-set immunoglobulin domain ( Figure 2). The median Cα distance between residues where nonsynonymous changes are observed within bovine populations was calculated as 11.12 Å. In comparison, the median distance between all residues in the structure human CD2 extracellular region was 24.47 Å. In simulations, there were only 16/1000 cases where the median distance of seven randomly selected residues was lower than that of the median distance between the cattle nonsynonymous changes, indicating that these residues are non-randomly clustered on the structure of CD2 (P = 0.016).
PolyPhen [19] predicts two substitutions, K180M and S189K, as "probably" and "possibly" damaging, respec-tively. The SIFT program [18] also predicts K180M, plus T156M, and V197M as likely to affect protein function. These variants each form part of the cluster haplotype of coding changes in exon 3 that are situated together on the surface of the extracellular domain of the protein.

Discussion
The close proximity and subsequent migrations of humans and animals, following domestication, caused fundamental changes in the epidemiology of many infectious diseases [2]. Different cattle populations have been exposed differently to a variety of pathogens, such as rinderpest in early Asian populations [39] or trypanosomasis in sub-Saharan African populations [40]. Screening the genomes of domesticated animals for signatures of selection, particularly in relevant immune genes, may help pinpoint those genes that have been most important in the evolution of differential disease susceptibilities.
We queried whether six genes which have signatures suggestive of positive selection between species may have experienced continued selective pressure at the population level [6,7,41]. Notably, these genes were also chosen on the basis of having been previously implicated in susceptibility to bacterial, parasitic or viral infections. We resequenced each in 39 individuals comprising three distinct continental cattle populations. Several test statistics within the compilation of locus-population combinations yield evidence that recent adaptive change may have imprinted patterns of diversity around these genes. There is varying (but incomplete) concordance across tests. This is unsurprising as the statistics examine different aspects of locus diversity and are sensitive to selective processes that vary, for example in intensity or time depth [42]. The results fall into two interesting categories: first, outlying test statistics of allele frequency spectra in cytokines that imply the imprint of adaptive change but which are not associated with appreciable protein sequence polymorphism; and secondly, receptors which display coding variation suggestive of allelic functional variation, but which give less significant summary statistic values.
Of the latter loci, TYROBP gives the strongest statistical departure from neutrality, and has a derived nonsynonymous mutation almost fixed within European and African B. taurus cattle with predicted impact. In humans, ART4 Sliding window analysis of per-site sequence diversity (π) within each population for resequenced regions of CD2 encodes the Dombrock blood group glycoprotein [31]. It has been speculated that the polymorphic nature of this molecule in humans may be driven by an evolutionary advantage due to mutation of the RGD motif that may influence adhesion of the malaria parasite to erythrocytes [31]. Cattle do not appear to be polymorphic at the position responsible for the Do a /Do b antigenicity in humans, but are polymorphic at the position associated with the Jo a antigen, suggesting a structural or functional impact [32]. However, of these three, the most interesting evidence arises with CD2.
CD2 is a natural killer-cell and T-cell surface molecule that stabilizes interactions with antigen-presenting cells [34]. not secure indications of recent adaptive evolution at this locus, but a closer look at the pattern of derived polymorphism underlying them gives several layers of additional support.
Firstly, CD2 was found to have a high number (eleven) of nonsynonymous changes within cattle (three between cattle and the outgroup species, plains bison). The majority of these are localized in the extracellular domain of the molecule, a region with a strong signal of positive selection (from inter-specific comparison), which may be driven by a host-pathogen genetic conflict [4]. Secondly, six of these changes are derived, located within a narrow window within exon 3 and define a divergent haplotype that appears at appreciable frequency in each of the three sampled populations. This region yields a signal of selection in sliding window analysis and has a peak of sequence diversity. Thirdly this genetically divergent haplotype is likely also to be functionally divergent. Three of these clustered substitutions are predicted to have phenotypic impact from biochemical and comparative information. Lastly, examination of a predicted 3D structure of the extracellular domain ( Figure 2) showed that the seven substitutions clustered spatially -a configuration confirmed to be unlikely to be random by bootstrapping.
Taken together, these observations provide population genetic inference of adaptive evolution at the CD2 locus, possibly driven by receptor-pathogen interactions. Note that several similar receptors to CD2 are known to be exploited by viruses in order to invade the host cell; for example, the CD2 family member, SLAM, is bound by the measles virus [38].
Within the three cytokines sequenced, the strongest population genetic signatures of selection occur at the closely linked IL5 and IL13 loci within the African B. taurus population sample. Fay and Wu's H (IL5) plus Tajima's D, Fu and Li's D and F (IL13) values all show highly significant departures from the neutral model. Interestingly, these two loci comprise part of a cytokine cluster on chromosome 7 (syntenic with human chromosome region 5q31-33) which has key involvement in the Th2 mediated immune response and consequently, susceptibility to extracellular parasitic infections. There are well described and considerable parasitic challenges particular to African livestock; for example, non-indigenous breeds continue to be excluded from parts of West Africa by endemic trypanosomiasis [10]. Both of these loci show some reduction in nucleotide diversity within Africa (as does IL2) relative to the other populations but a focus of positive selection within each region is not easily localized from the data. Also, each interleukin sequenced showed a dearth of nonsynonymous change. IL5 and IL13 (along with IL4) are known to be influenced by cis factors over a 120 Kbp region and the focus of selection may be a remote regulatory sequence.

Conclusion
This study alludes to two generalities. First, we show some correspondence between inference of selection from intraspecies variation and that from the million year-order comparison between species. This hints that the diversity that was a matter of life and death for bovine antecedents has also been important in more recent eras, and probably continues to matter today. Second, we see contrasting patterns of diversity between the three signaling molecules (IL2, IL5 and IL13), which have outlying allele frequency spectra but insubstantial nonsynonymous polymorphism and three cell-surface receptors (TYROBP, ART4 and CD2), which interact directly with the extracellular environment and were observed to have significant coding polymorphism, giving some suggestion of adaptive history. This difference accords with prior work that records an increase of protein coding sequence adaptive events at the cellular periphery in humans, with more constraint on change at those loci which are internal and more central in interacting networks [43].

Re-sequencing strategy
We

SNP detection and sequence analysis
Sequence assembly and SNP analysis was carried out using the Phred/Phrap/Consed/Polyphred pipeline http:/ /www.phrap.org/phredphrapconsed.html. Bases were called and sequences assembled into contigs using Phred v0.020425.c [44,45] and Phrap v0.990319. Default parameters were used except for the forcelevel flag, which was set to a maximum value of 10, reducing the stringency of matching required for final contig assembly, and allowing outgroup sequences to be incorporated in subsequent SNP detection. Polyphred v.5.0 was used for polymorphism detection [46,47]. Reads were compared to the UCSC genomic reference sequence, and the source flag was used to optimize heterozygote identification. SNPs were checked manually and verified by examining trace files in Consed [48]. Alignments of polymorphic positions for each gene are available (see Additional file 1). PHASE v.2.1 was used to reconstruct haplotypes [49]. Nucleotide positions with lower confidence in sequence quality, i.e. fewer than 20 individuals with high quality sequence were masked and excluded from further analysis.

Neutrality tests
We carried out several tests of neutrality based on allele frequency spectra derived from the re-sequencing data. These statistics were calculated over the entire resequenced regions (Table 1) using DnaSP [50,51]. The Tajima's D test [15] compared the difference between the number of segregating sites, S, and the average number of pairwise differences, π. Negative values of Tajima's D are due to an excess of low frequency variants, which may be due to positive selection or population expansion.
Fu and Li's D and F statistics incorporate information from a genealogy constructed with the outgroup sequence, plains bison. External branches of this genealogy tend to be populated by more recent mutations. Fu and Li's D is based on the differences between Se, the total number of mutations in external branches of the genealogy, and S, the total number of mutations, while, their F test statistic is based on the differences between Se, the total number of mutations in external branches of the genealogy, and π [16].
The H statistic, developed by Fay and Wu, requires an outgroup sequence, and is based on the difference between two estimators of θ : π and θ H , which is based on the frequency of the derived variants [23]. Fu's Fs [17] is based on the probability of observing a greater number of alleles in the population than expected under neutrality. Either genetic hitch-hiking or recent population expansion can cause a significant result in this test. For all of these tests, negative values result from an excess of low frequency and/or derived polymorphisms, while a deficiency of low frequency alleles causes a shift towards positive values.
One problem with neutrality test statistics can be their sensitivity to different demographies. Consequently, to test the robustness of our findings, we examined an alternative demographic history for whole gene coalescent simulations. These were used to construct empirical null distributions of two complementary test statistics (Tajima's D and Fay and Wu's H) presuming a constant population size of 10 4 until 10,000 years ago, followed by an instant reduction to 10 3 and an exponential increase to a present day size of 10 5 , a scenario consistent with the domestication and subsequent expansion of livestock. Simulations were carried out using MS [52].

Predictive polymorphism phenotyping
Predictive polymorphism phenotyping of nonsynonymous SNPs was carried out using the PolyPhen http:// genetics.bwh.harvard.edu/pph/ and Sorting Intolerant From Tolerant (SIFT) http://blocks.fhcrc.org/sift/ programs [18,19]. PolyPhen assesses a number of parameters including, amino acid physiochemical properties, the level of sequence conservation in homologous proteins, and the proximity to structural or functional domains, to predict functionally significant nonsynonymous SNPs. Similarly, the SIFT program uses an alignment of orthologous or paralogous sequences to assess the usual level of sequence conservation at a polymorphic site and thus the potential functional importance of a substitution.

Non-random Clustering of CD2 nonsynonymous SNPs
The distance between the alpha carbon atoms of each residue (Cα distance) in the 3 dimensional structure of the human CD2 extracellular domain (1 HNF.pdb) was calculated using the BioShell program http://bio comp.chem.uw.edu.pl/BioShell.

Authors' contributions
ARF and DJL contributed equally to this work, both carried out data analysis and co-wrote the manuscript. ARF re-sequenced the genes studied in this project. CM assisted in SNP calling and in data analysis. DGB conceived of and oversaw the study and assisted in writing the manuscript.