A genome-wide scan study identifies a single nucleotide substitution in the tyrosinase gene associated with white coat colour in a red deer (Cervus elaphus) population

Background Red deer with very pale coat colour are observed sporadically. In the red deer (Cervus elaphus) population of Reinhardswald in Germany, about 5% of animals have a white coat colour that is not associated with albinism. In order to facilitate the conservation of the animals, it should be determined whether and to what extent brown animals carry the white gene. For this purpose, samples of one white hind and her brown calf were available for whole genome sequencing to identify the single nucleotide polymorphism(s) responsible for the white phenotype. Subsequently, samples from 194 brown and 11 white animals were genotyped. Results Based on a list of colour genes of the International Federation of Pigment Cell Societies, a non-synonymous mutation with exchange of a glycine residue at position 291 of the tyrosinase protein by arginine was identified as the cause of dilution of the coat colour. A gene test led to exactly matching genotypes in all examined animals. The study showed that 14% of the brown animals carry the white gene. This provides a simple and reliable way of conservation for the white animals. However, results could not be transferred to another, unrelated red deer population with white animals. Although no brown animals with a white tyrosinase genotype were detected, the cause for the white colouring in this population was different. Conclusions A gene test for the conservation of white red deer is available for the population of the Reinhardswald. While mutations in the tyrosinase are commonly associated with oculocutaneous albinism type 1, the amino acid exchange at position 291 was found to be associated with coat colour dilution in Cervus elaphus.

In addition to colour inheritance in cattle [18], information is also available on sheep [19], goat [20] and buffalo [21]. However, nothing is known about colour inheritance in Cervids. Although so far only a few genes seem to be associated with the whitening of cattle, there is still a wide range of candidate genes to be considered in the search for the genetic cause of the whitening of red deer. White coat colour or dilution are extremely rare in red deer. In Germany there are two populations with white individuals, one in the Reinhardswald in the North of Hesse and one in Siegen-Wittgenstein in North Rhine-Westphalia. Within the approximately 1000 individuals of the red deer population of the Reinhardswald, about 50 white animals are suspected. Similar conditions exist in Siegen-Wittgenstein. It is important for the conservation of the white animals to identify the responsible gene variants and develop gene markers. This is the only way to make targeted statements about the distribution of the white gene variant in the population. However, up to now, nothing is known about the genes that are responsible for the white coat colour. The aim of the present work was therefore to first limit potential candidate genes by means of genome-wide single nucleotide polymorphism (SNP) analysis and then to identify the most highly white colour-associated SNPs. Following the hypothesis of a recessive inheritance of the white colour, we expected the genotype of the white hind to be homozygous for the white allele and the brown calf to be heterozygous. All genes and SNPs that did not correspond to this assumption were sorted out, resulting in 15 genes with 21 ns SNPs to be further examined (Table 2). For each of these SNPs a polymerase chain reaction (PCR) system was established to test the association of the gene variant with the phenotypes of a sample of white and brown individuals of the population ( Table 2). The SNP at the TYR gene was the only one with a 100% match between genotype and phenotype.
The sequence of the five exons of the red deer tyrosinase mRNA, spanning 1593 bases showed a genetic similarity with the sequences of the human and bovine tyrosinase of 86 and 97%, respectively.
Sequencing of the hind and her calf with the reference genome CerEla 1.0 resulted in a coverage of 9.58 and 10.05 fold, respectively. A total of 32.36 and 33.94 Gigabases mapped 92.0 and 92.0% of the whole genome,  The results were verified by sequencing the same two individuals using the later available genome sequence for Cervus elaphus (CerEla1.0). Nineteen of the 21 SNPs from 14 of the 15 candidate genes could be verified with CerEla1.0. One SNP in HPSA4 on Cervus elaphus chromosome (CEL) 5 and the SNP in the tyrosinase gene (CEL 2) could not be detected because of a gap in CerEla1.0 at heat shock protein family A (Hsp70) member 4 (HSPA4) and because the respective region of the tyrosinase gene was not yet annotated in CerEla1.0.
In the population of the Reinhardswald, there was no brown individual with genotype AA of TYR and none of the white phenotypes had genotype GG or GA. Thus, the inheritance of the white colour in the red deer of the Reinhardswald was established as autosomal recessive. The tyrosinase gene is located on Cervus elaphus chromosome (CEC) 2. The SNP c.871G > A in the tyrosinase gene is located within a highly conserved region and results in an amino acid substitution of Glycine by Arginine. From 194 brown red deer of the Reinhardswald 86% were homozygous and 14% were carriers of the white allele. Considering the estimation by the forest officers of the Reinhardswald of 50 white animals among the total population of around 1000 red deer (approximately 5%), genotype frequencies for GG, GA and AA were estimated as 81.7, 13.3 and 5%, respectively. Under this assumption, the allele frequencies are estimated to be 88.4% (G) and 11.6% (A), respectively. Thus, the estimated genotype frequencies deviate significantly from the Hardy-Weinberg equilibrium (p < 0.001).
The expected values are 78.1% (GG), 20.5% (GA) and 1.3% (AA), respectively. There was no obvious phenotypic difference between carriers of the GG and the GA phenotype.
C.871G > A was not associated with red and brown coat colour in the unrelated German red deer population Siegen-Wittgenstein. However, the TYR-genotype AA was never detected in a brown individual regardless of its origin.

Discussion
Because a Cervus elaphus reference genome was not available at the time of sequencing, sequence reads of the red deer were aligned to the reference sequence of the bovine genome (UMD 3.1). After CerEla1.0, the complete genome sequence of the red deer was published [22], the sequences of the hind and her calf were realigned to CerEla1.0 as the reference sequence. With the use of CerEla1.0 versus UMD 3.1, 92% instead of 82% of the genome of the hind and the calf could be mapped. At the same time, the number of SNPs between calf and mother increased by about 10%. As expected, sequencing based on Cervus elaphus sequences proved to be superior than sequencing on the basis of Bos taurus sequences.
However, since the TYR gene was not annotated in CerEla1.0 the responsible SNP for the white phenotype in the Reinhardswald red deer population had no chance to be detected. This is not unexpected, since 21,880 genes are annotated for the bovine genome in contrast to 19,368 for the genome of Cervus elaphus. Nevertheless, the high degree of agreement even of microsatellite sequences, between red deer and other ungulates, especially cattle [23,24] justified the use of the bovine genome as reference sequence. Indeed, red deer sequences homologous to 82% of the bovine genome were mapped, including 9.9*10 6 SNPs. We were confident that the coding sequence ranges in particular would show a good match between red deer and bovine genome. In fact, 8570 SNPs were extracted after variant calling as a subset based on a list of colour genes (International Federation of Pigment Cell Societies). Twenty one SNPs in 15 candidate genes corresponded exactly to the requirements of a homozygous white hind and its heterozygous brown calf. However, only one SNP, located in the TYR gene matched exactly to the total sample with 194 brown and 11 white animals of the Reinhardswald population. The probability of a random match between genotype and phenotype (0.5 205 ) in this number of animals corresponds to 1.94*10 − 62 . Although the exact number of white individuals is not known, the responsible forestry authority assumes about 50 white animals, within a total population of about 1000 red deer. Using the prevalence of heterozygous brown red deer, this results in a significant deviation from the Hardy-Weinberg equilibrium with too high a proportion of homozygous white genotypes. This could be explained by the fact that no white red deer had ever been shot up to the time of the study (selection). The mixed-bred, brown animals, on the other hand, were hunted without any difference to the clean-brown red deer. Factors that could have led to the preferred reduction of white individuals, such as predators (e.g. wolf or lynx), were not present in the region studied. The selection for white red deer results in particular from the fact that the reference to its existence is used as a unique selling point and tourist advertising object for the region. In this context, citizens' initiatives have repeatedly been campaigning for the preservation of white individuals.
Since white animals were also sporadically the victims of traffic accidents, it was an important question to investigate whether the 50 estimated individuals were left to their own or whether they could be regarded as an integrated part of the total population. The present study showed with the proof of the heterozygous brown individuals that the white allele is deeply anchored in the population and that statistically, one to two new white calves can be expected from the mating of heterozygous brown animals per year.
Tyrosinase is the key enzyme in the synthesis of melanin. It catalyses the rate-limiting step, the hydroxylation of the amino acid tyrosine to dopaquinone [25] and subsequently the oxidation of 5,6-dihydroxyindole (DHI) to indole-5,6-quinone [3]. Hundreds of mutations in the tyrosinase gene including missense, nonsense, frameshift, splice site mutations and a deletion of the entire coding sequence have been identified and associated with oculocutaneous albinism type I (OCA1 [26]; http://www.ifpcs. org/albinism/). This is an autosomal recessive disorder, associated in most cases with severe hypopigmentation of the skin, hair and eyes, most often accompanied by nystagmus, foveal hypoplasia and reduced visual acuity [26]. Only few polymorphisms in the coding region of the gene have been described [27]. Besides human and mouse, TYR mutations associated with albinism have been found in rabbits [28], cats [29], rats [30], ferrets [31], minks [32], donkeys [33], humpback whale [34] and cattle [11].
In addition to the extensive cases of albinism, mutations in mice have also been described in connection with coat dilution, particularly in connection with pheomelanin [35][36][37]. However, pheomelanin coat colour dilution in French cattle breeds could not be correlated with tyrosinase [1]. Colour variants of the Bactrian camel [38] and dilution in the coat colour of Alpaca [39] could not be associated with mutations in the TYR gene.
White deer is found only sporadically. We only know of one single reference that deals with microsatellite analysis for the control of inbreeding and genetic diversity in a population of white red deer in Czech Republic [40]. The causes for the colour of the white coat in this species are completely unknown. The coat colour of the white individuals is diluted, but they are not albinos. The eyes are pigmented. The polymorphism that is responsible for the dilution led to an amino acid exchange at position 291, where the amino acid glycine is found in humans, cattle and red deer. Mutations in humans are not known. Amino acid 291 lies outside known functional areas of the tyrosinase protein. In animals with a white coat, glycine was replaced by arginine. Arginine is basic, positively charged and hydrophilic. Glycine is an uncharged, apolar and hydrophobic amino acid. Although PANTHER14.1 (http://pantherdb.org/tools/ csnpScoreForm.jsp) predicted this amino acid exchange as benign, this chemical difference may alter the efficacy of tyrosinase without a complete failure. Vitkup et al. [41] and Khan and Vihinen [42] concluded that mutations at arginine and glycine residues are together responsible for about 25 to 30% of genetic diseases. The same mutation has been described in a white Korean Hanwoo cattle (gene bank AccNo YQ513971). Unfortunately, a detailed phenotype of the cattle is not available. Thus it is not clear, whether the cattle suffer from complete OCA1 or just a dilution of the coat colour.
The extension of the study to a second, unrelated red deer population revealed no brown carriers of the AA variant; however, white animals without the AA genotype at position 291 of the tyrosinase protein were found. This indicates that in this population (Siegerland-Wittgenstein) another, unknown gene variant segregates, which leads to dilution of the coat colour. Thus, although the tyrosinase mutation is responsible for the white colouring of the deer of the Reinhardswald, other previously unknown mutations are to be expected in other populations of white red deer.
In addition to the result of anchoring the white individuals of the Reinhardswald in the brown red deer population, the study can also serve to document dispersal paths and migration movements to neighbouring red deer areas and to distinguish red deer populations with white individuals from each other. For this purpose, more red deer populations need to be tested for the presence of the c.871G > A tyrosinase gene variant. The gene test can also be used to investigate the influence of the tyrosinase gene variant on physical development, fertility and adaptability within the segregating population. It is anecdotically assumed that the white deer of the Reinhardswald were imported from southeastern Europe in the sixteenth century, scattered throughout the region in the turmoil of the Thirty Years' War in Europe and have survived to this day. By screening different Southeast European red deer populations, it could be possible to decipher the origin of the white red deer of the Reinhardswald in future studies. Furthermore, the results show an enormous potential for the use of well-established reference genomes of closely related species for genomic analyses (especially at gene level) in species for which no reference genome is yet available.

Conclusion
The identification of the gene variant responsible for the white coloration and the quantification of heterozygous animals provided evidence that the few white animals are not an independent population. Rather, the white allele is widespread throughout the entire population via the heterozygous, brown animals.

Red deer population
The Reinhardswald is a part of the Weserbergland, one of the biggest coherent forest areas of Germany and is located in the North of the Federal State of Hesse (51°3 0′ N, 9°34'O). The forest covers an area of 183 km 2 and, according to the Reinhardswald red deer association, has a census size of about 1000 animals of which about 50 animals are white.

The phenotype
The white deer of the Reinhardswald are not albinos. The coat colour is very pale, stronger in summer than in winter. The dilution is qualitatively distinguishable by eye. Eyes and claws are normally pigmented or slightly lightened. Apart from the coat and eye colour, the white animals do not differ from the brown of the population in height, weight and habitus (Fig. 1). However, there is no detailed information on the phenotype (histology, physiology, biochemistry) available.

Sample collection
During hunting seasons 2013 to 2015 tissue samples from brown (n = 194) and white (n = 3) red deer and samples from antlers of white red deer (n = 8) were collected. For sequencing, samples of two female animals (one white adult hind with its brown calf) were available. Samples were taken from existing antlers and frozen tissue samples provided by those authorised to practise hunting. No animals were killed specifically for the study. No live animals were sampled and no dropping antlers were sought or collected for the study. All samples were accompanied with information about age, weight, colour, and hunting ground. In addition, the presence/absence of white animals in the deer pack from which a sample was taken was recorded.
Further samples from brown (n = 21) and white (n = 9) red deer were collected in exactly the same way in Siegen-Wittgenstein, another area with brown and white animals. Reinhardswald and Siegen-Wittgenstein are separated by 110 km, a fenced motorway, several country roads and a red deer-free area. Both populations were not related or linked with each other as shown by a population differentiation test implemented in Genepop (see below).
Samples from antlers were taken as drill core samples from the base and stored dry at ambient temperature. Tissue samples were frozen at − 20°C until use.

DNA extraction
Genomic DNA was extracted from tissue samples and antler drill cores with the Instant Virus RNA Kit (Analytik Jena, Germany). This kit was thoroughly tested against DNA extraction kits and its ease of use and its efficiency in extracting DNA was found to be comparable or even superior. Antler drill cores (0.1 to 0.3 g) were treated in a beadmill (MM200, Retsch, Germany) at a frequency of 25 Hz for 2 min. Tissue samples were suspended in 450 μl of lysis buffer and subsequently treated in the same way as the antler drill cores. All following steps were performed according to the manufacturer's instructions. The extracted DNA was eluted with 60 μl of RNAse-free water.
DNA concentration was measured photometrically with the Nanodrop 2000 spectrophotometer (Thermofisher, USA) and the Qubit 2 system (Qubit dsDNA br assay kit and Qubit dsDNA hs assay kit, Thermofisher, USA).

DNA quality control and next generation sequencing
The DNA of the hind and calf were provided for genomic sequencing. The amount of DNA was quantified through qPCR with the Kapa Library Quantification Kit (Kapabiosystems, USA) and diluted to 20-30 ng/μl for library preparation (TruSeq DNA PCR-free sample preparation Kit, Illumina, USA). Fragment sizes of the libraries were visualized with a BioAnalyzer 2100 (Agilent Genomics, USA).
Quality controlled libraries were sequenced using the HiSeq 2500 instrument (Illumina, USA). Paired-end libraries (2 × 126 bp reads) were sequenced with a mean coverage of ten times.
Prior to further processing raw data were quality checked for overrepresented and duplicate sequences with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Raw sequences were then converted from a base call file (bcl) to fastq files and mixed probes were demultiplexed through the program bcl2fastq Conversion Software from Illumina (http://emea.support.illumina.com/downloads/ Fig. 1 The hind shows a slightly stronger brightening than the stag. The eyes are clearly pigmented with both animals (a). Comparison between a normal brown hind and a hind with white coat colour (b) bcl2fastq_conversion_software_184.html?langsel=/de/). Because a Cervus elaphus reference genome was not available at the beginning of the study, resulting reads were at first aligned to the reference sequence of the bovine genome (UMD 3.1 [43]) and in a second step to the Cervus elaphus reference sequence CerEla1.0, both using the BWA-MEM algorithm (https://arxiv.org/abs/1303.3997). After processing of data, single files were merged and converted from the SAM to the BAM format with SAMtools [44]. Duplicated reads were marked by the PICARDtools command MarkDuplicates (https://github.com/broadinstitute/picard/).

Variant calling, annotation and identification of candidate variants
To identify single nucleotide polymorphisms (SNPs) as well as short insertion and deletion polymorphisms (INDELs) in the annotated reads of the two sequenced red deer samples, we used the mpileup algorhithm implemented in SAMtools [44]. With the filter algorithm from PICARDtools (https://github.com/broadinstitute/ picard/) called variants were filtered by excluding all SNPs within 3 basepairs of an INDEL and with lower QUAL score, and by excluding INDELs within 2 basepairs of another INDEL.
For the functional annotation of each called SNP we adapted the VariantEffectPredictor (VEP) from Ensemble [45].
Furthermore, we extracted a subset of SNPs based on a list of colour genes detected in mice, human and zebrafish (International Federation of Pigment Cell Societies; http://www.ifpcs.org/albinism/). Resulting VEP annotated files containing only genomic regions coding for coat colour were checked on the basis of a recessive genetic inheritance model for non-synonymous impacts of the mutations.

Validation of candidate SNPs
SNPs were selected in a hierarchical procedure as candidate SNPs for further processing. First and foremost, they had to be within the range of colour genes specified by the International Federation of Pigment Cell Societies. The second prerequisite was that the SNP was non-synonymous. The SNP had to be homozygous for the hind and heterozygous for the calf. The responding 21 candidate-SNPs (15 different genes) were validated by Sanger sequencing (ABI 3500 genomic analyzer). For this purpose, regions including the candidate SNPs were PCR amplified and sequenced. PCR primers were designed (https://primer3plus.com/cgi-bin/dev/primer3 plus.cgi) from the NGS data in combination with data from the Bos taurus reference genome (UMD 3.1). Later the SNPs were verified with CerEla1.0, the Cervus elaphus reference genome.

Pyrosequencing
Genotypes of animals were detected by pyrosequencing on a Pyromark Q96 ID system (Qiagen, Germany) and sequences were analyzed with the Pyro-Mark ID 1.0 Software (Qiagen, Germany).
PCR was performed in a total volume of 40 μl consisting of 20 μl of Multiplex Mastermix (Qiagen, Germany), 4 μl of primer mix (HW-TYRF 5′-TTTCCAGGATTGCG CAGTA-3', HW-TYRR 5'-TGCAGCAGATTGGAGGAG TAC-3') with a final concentration of 0.4 μM, 12 μl of water and 4 μl of template DNA. Cycling conditions were as follows: initial activation of the DNA polymerase for 15 min at 95°C, followed by 35 cycles of denaturation at 94°C for 30 seconds, annealing at 52°C for 90 seconds and extension at 72°C for 30 seconds, followed by final extension at 72°C for 10 min. Quality and amount of PCR products was checked by electrophoresis on 1.5% agarose gels stained with Midori Green Advance (Biozym, Germany). PCR products immobilized to streptavidin-sepharose beads were released in 40 μl of 5 μM sequencing primer (HW-TYRS 5'-ATGGTCCCTCAGACG-3′) and subjected to pyrosequencing.

Population genetic analysis
To test the effect of the white gene in another population red deer from Siegen-Wittgenstein (21 brown and 9 white animals) were included. Phenotypically, no differences could be found between red deer originating from Reinhardswald and Siegen-Wittgenstein. Population genetic analysis using microsatellites [46] was conducted to verify the independence of the two populations. The population differentiation test [47] implemented in Genepop (https:// kimura.univ-montp2.fr/~rousset/Genepop.htm) was performed as an exact G test with the following Markov chain parameters: dememorisation length of 100,000 and 100 batches with 10,000 iterations per batch.