Estimation of inbreeding using pedigree, 50k SNP chip genotypes and full sequence data in three cattle breeds

Zhang, Qianqian; Calus, Mario PL; Guldbrandtsen, Bernt; Lund, Mogens S; Sahana, Goutam

doi:10.1186/s12863-015-0227-7

Research article
Open access
Published: 22 July 2015

BMC Genetics volume 16, Article number: 88 (2015)

Abstract

Background

Levels of inbreeding in cattle populations have increased in the past due to the use of a limited number of bulls for artificial insemination. High levels of inbreeding lead to reduced genetic diversity and inbreeding depression. Various estimators based on different sources, e.g., pedigree or genomic data, have been used to estimate inbreeding coefficients in cattle populations. However, the comparative advantage of using full sequence data to assess inbreeding is unknown. We used pedigree and genomic data at different densities from 50k to full sequence variants to compare how different methods performed for the estimation of inbreeding levels in three different cattle breeds.

Results

Five different estimates of inbreeding were calculated and compared in this study: pedigree-based inbreeding coefficients (F_PED); run of homozygosity (ROH)-based inbreeding coefficients (F_ROH); genomic relationship matrix (GRM)-based inbreeding coefficients (F_GRM); inbreeding coefficients based on excess of homozygosity (F_HOM); and inbreeding coefficients based on the correlation of uniting gametes (F_UNI). Estimates using ROH provided a direct estimate of the level of autozygosity in the current populations and are free from the effects of allele frequencies and incomplete pedigrees, which can make other estimates of inbreeding less accurate. The highest correlations were observed between F_ROH estimated from the full sequence variants and F_ROH estimated from 50k SNP (single nucleotide polymorphism) genotypes. The estimator based on the correlation between uniting gametes (F_UNI) using full genome sequences was also strongly correlated with F_ROH detected from sequence data.

Conclusions

Estimates based on ROH directly reflected levels of homozygosity and were not influenced by allele frequencies, unlike the three other estimates evaluated (F_GRM, F_HOM and F_UNI), which depended on estimated allele frequencies. F_PED suffered from limited pedigree depth. Marker density affects ROH estimation. ROH detected from 50k chip data gave estimates similar to ROH from sequence data. In the absence of full sequence data, ROH based on 50k data can be used to assess homozygosity levels in individuals. However, genotypes denser than 50k are required to accurately detect short ROH that are most likely identical by descent (IBD).

Background

The inbreeding coefficient (F) is defined as the probability that two alleles in an individual are identical by descent (IBD) relative to a base population where all alleles are assumed unrelated [1]. Rates of inbreeding have increased as intensive selection was applied to the populations [2–7]. Increased levels of inbreeding result in increased probability that animals are homozygous for deleterious alleles [2, 8, 9]. Thus, inbred animals suffer from inbreeding depression with reduced fitness, and highly inbred animals may have considerably reduced lifespans [2, 6, 10–13]. Information on inbreeding is critical for designing breeding programs that control the increase in inbreeding levels and thereby limit inbreeding depression in the progeny. Pedigree information has been used to calculate the estimated inbreeding coefficient as the expected probability that two alleles at a locus are IBD [14–16]. For example, Meuwissen and Luo proposed a method to estimate inbreeding coefficients based on pedigree data of large populations [17]. However, incomplete pedigrees result in erroneous estimates and an underestimation of levels of inbreeding [18]. VanRaden proposed a method to take into account unknown ancestors when estimating inbreeding coefficients, increasing the accuracy of inbreeding level estimates in incomplete pedigrees [19].

With the availability of single nucleotide polymorphism (SNP) array genotyping technologies, long stretches of homozygous genotypes, known as runs of homozygosity (ROH), can be identified. ROH are believed to reflect autozygosity at the genomic level and generally identify genomic regions which are IBD [20, 21]. Theoretically, it is expected that ROH can be accurately estimated from full sequence data, because these estimates do not suffer from the sampling effects expected when only a subset of loci, for instance 50k SNPs, is used [22–24]. The inbreeding coefficient can be calculated as the proportion of the genome covered by ROH and has been shown to be more informative than the inbreeding coefficient estimated from pedigree data or other estimators because ROH strongly correlate with homozygous mutation load [25]. ROH have commonly been used to infer population history and to examine the effect of deleterious homozygotes caused by inbreeding in human populations [20, 26–29]. Long ROH reflect recent inbreeding, whereas short ROH reflect ancient inbreeding [26]. However, only a few studies have evaluated ROH in cattle populations. Ferenčaković et al. examined the effect of SNP density and genotyping errors when estimating autozygosity from high-throughput genomic data [24]. Estimates based on ROH also vary with different densities of genomic data. The minimum length of ROH that can be detected depends on SNP density [24, 30]. Recently, Purfield et al. detected ROH in a cattle population from SNP chip data to infer population history [31]. However, to estimate the “true” state of ROH, whole-genome sequences should be used rather than SNP chip data, and to date only a few studies have done this in cattle [32]. With the advent of next-generation sequencing technology, whole-genome sequences have become available to examine the fine-scale genetic architecture of the cattle genome. It is now possible to investigate and compare how well different commonly used estimators of inbreeding level correlate with ROH estimated using next-generation sequence (NGS) data.

In recent years, widespread availability of genotype data enabled computation of inbreeding from the diagonals of genomic relationship matrices, i.e., the “GRM” method (F_GRM), as a by-product of genomic selection. Similarly, using the genotypes, the inbreeding coefficient can be computed based on excess of homozygosity following Wright (1948) (F_HOM) [33] and based on correlation between uniting gametes following Wright (1922) (F_UNI) [1]. The objective of the present study was to compare different estimators for inbreeding coefficients calculated from pedigree, 50k SNP chip genotypes and full sequence data with estimates based on ROH, for three different dairy cattle breeds.

Methods

SNP genotyping and sequencing

A total of 89 bulls with a high genetic contribution to current Danish dairy cattle populations were selected for whole-genome resequencing. These included 32 Holstein (HOL), 27 Jersey (JER), and 30 Danish Red Cattle (RDC) bulls. RDC is a composite breed with contributions from different red breeds, including Swedish Red, Finnish Ayrshire, and Brown Swiss [34]. Only bi-allelic SNP variants with a Phred-scaled quality score [35] higher than 100 were kept for analysis. Genotypes were extracted from whole-genome sequence (WGS) data using GATK [36] and a Perl script. Sequence variants with read depth lower than 7 or higher than 30 were filtered out. In addition, 85 of the sequenced animals were genotyped with the Illumina 50k SNP assay (BovineSNP50 BeadChip version 1 or 2, Illumina, San Diego, CA). SNP genotyping and quality control were as described by Höglund et al. [37]. The remaining 4 sequenced animals were not genotyped with the 50k SNP chip; their genotypes for the SNPs on the 50k chip were extracted from their whole-genome sequences. The quality of genotype calls from SNP chips is expected to be higher than that of whole-genome sequences; therefore, only sequence variants with a high quality score (Phred score > 100) were included. Calls made on the reverse strand were corrected so that allele encoding was consistent between the Illumina calls and the sequence data. The concordance between the SNP chip and sequence data was ~97 %.
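As a rough illustration of the variant filters described above, the following Python sketch keeps a variant only if it is bi-allelic, has a quality score above 100 and a read depth between 7 and 30. The function and argument names are hypothetical, and the sketch assumes that a single depth value per variant is being tested; the study itself used GATK and a Perl script, not this code.

```python
def keep_variant(qual, depth, n_alleles,
                 min_qual=100.0, min_depth=7, max_depth=30):
    """Return True if a variant passes the filters described in the text.

    qual      -- Phred-scaled quality score of the variant
    depth     -- read depth (assumed here to be one value per variant)
    n_alleles -- number of alleles observed at the site (2 = bi-allelic)
    """
    is_biallelic = n_alleles == 2
    good_quality = qual > min_qual                 # "higher than 100"
    good_depth = min_depth <= depth <= max_depth   # drop depth < 7 or > 30
    return is_biallelic and good_quality and good_depth


# Hypothetical records: (quality, depth, number of alleles)
records = [(250.0, 12, 2), (90.0, 12, 2), (250.0, 45, 2), (250.0, 12, 3)]
kept = [r for r in records if keep_variant(*r)]
```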

Estimation of inbreeding

Using pedigree records (F_PED)

Inbreeding coefficients for the 89 bulls were estimated using pedigree records (F_PED). The average pedigree depth was ~8 generations ranging from 3 to 13. Average pedigree depth was 7, 8 and 9 for HOL, JER, and RDC, respectively. The method proposed by VanRaden [19] was used to compute inbreeding coefficients, which replaces unknown inbreeding coefficients by average inbreeding coefficients in the same generations. Inbreeding coefficients were calculated using the following formula [38]:

$$ A_{ii} = \sum_{j=1}^{i} L_{ij}^{2} D_{jj}, $$

where $A_{ii}$ is the i^th diagonal element of the A matrix (pedigree relationship matrix), which is equal to the inbreeding coefficient of the i^th animal plus 1. L is a lower triangular matrix containing the fraction of the genes that animals derive from their ancestors, and D is a diagonal matrix containing the within-family additive genetic variances of animals [17]. The computation of the matrix elements $L_{ij}$ and $D_{jj}$ follows the rules for computing the A matrix [17]. The detailed decomposition for computing $A_{ii}$ is explained by Meuwissen and Luo [17]. The analysis was conducted using Relax2 software [39].
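The link between the diagonal of A and the inbreeding coefficient can be illustrated with a small Python sketch. Note that it builds A with the simpler tabular method, not the Meuwissen and Luo decomposition or the Relax2 software used in the study, and it treats unknown parents as unrelated, non-inbred founders rather than applying VanRaden's correction. The function name and input layout are assumptions made for illustration only.

```python
def pedigree_inbreeding(pedigree):
    """Inbreeding coefficients from a pedigree via the tabular method.

    pedigree -- list of (animal, sire, dam) tuples, with every parent
                listed before its offspring; None marks an unknown parent.
    Returns a dict mapping animal -> F, where F = A_ii - 1.
    """
    ids = [animal for animal, _, _ in pedigree]
    idx = {animal: k for k, animal in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s = idx.get(sire)  # None when the parent is unknown
        d = idx.get(dam)
        # Relationship with each earlier animal: mean of the
        # relationships between that animal and the two parents.
        for j in range(i):
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 plus half the relationship between the parents.
        a_sd = A[s][d] if s is not None and d is not None else 0.0
        A[i][i] = 1.0 + 0.5 * a_sd
    return {animal: A[k][k] - 1.0 for k, animal in enumerate(ids)}


# Toy pedigree: E is the offspring of full sibs C and D, so F_E = 0.25.
toy = [("A", None, None), ("B", None, None),
       ("C", "A", "B"), ("D", "A", "B"), ("E", "C", "D")]
f_ped = pedigree_inbreeding(toy)
```

For real pedigrees of the size used in dairy cattle evaluation, the dense n-by-n matrix built here would be impractical, which is the motivation for decomposition-based methods such as the one cited in the text.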

Using genotypes (F_ROH, F_GRM, F_HOM, F_UNI)

Sequence data

ROH were detected from sequence data using all bi-allelic variants according to the method of Bosse et al. [23]. This method was used to compute ROH for sequence data instead of PLINK because not all short ROH can be detected using PLINK for sequence data (the sliding window size in PLINK is fixed; therefore, ROH shorter than a certain length cannot be detected). The measure of homozygosity based on ROH (F_ROH) from genomic data is defined as the total length of genome covered by ROH divided by the overall length of genome covered by SNPs or sequences as follows [20]:

$$ F_{\mathrm{ROH}} = \frac{L_{\mathrm{ROH}}}{L_{\mathrm{AUTO}}}, $$

where L_ROH is the sum of ROH lengths and L_AUTO is the total length of autosomes covered by reads. The inbreeding coefficient was calculated by extracting ROH from sequence data. Three ROH estimates based on lengths were calculated from sequence data. The ROH was calculated separately by summing the ROH in different length classes: 1) based on all ROH; 2) ROH >1 Mbp; 3) ROH >3 Mbp.
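The F_ROH calculation for the three length classes can be sketched in Python as follows. This is a minimal illustration, assuming the ROH have already been detected and are supplied as a list of lengths in base pairs; the function name, the example lengths and the genome length are hypothetical values, not data from the study.

```python
def f_roh(roh_lengths_bp, autosome_length_bp, min_length_bp=0):
    """F_ROH = L_ROH / L_AUTO, counting only ROH longer than min_length_bp."""
    if autosome_length_bp <= 0:
        raise ValueError("autosome length must be positive")
    l_roh = sum(length for length in roh_lengths_bp if length > min_length_bp)
    return l_roh / autosome_length_bp


# Hypothetical animal: three ROH on a 100 Mbp "genome".
roh = [500_000, 2_000_000, 5_000_000]
l_auto = 100_000_000

f_all = f_roh(roh, l_auto)                           # all ROH
f_gt_1mb = f_roh(roh, l_auto, min_length_bp=1_000_000)  # ROH > 1 Mbp
f_gt_3mb = f_roh(roh, l_auto, min_length_bp=3_000_000)  # ROH > 3 Mbp
```

Because each length class is a subset of the previous one, the three values can only stay equal or decrease as the length threshold rises.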

In addition, three other estimates of inbreeding coefficients were calculated using sequence data (F_GRM, F_HOM, F_UNI). The F_GRM estimate was calculated following VanRaden (2008) [40] based on the variance of the additive genotypes. F_GRM was derived from

$$ F_{\mathrm{GRM}} = \frac{\left[x_i - E\left(x_i\right)\right]^2}{h_i} - 1 = \frac{\left(x_i - 2p_i\right)^2}{h_i} - 1, $$

where $p_i$ is the observed fraction of the first allele at locus i, $h_i = 2p_i(1 - p_i)$ and $x_i$ is the number of copies of the reference allele (i.e., the allele whose homozygous genotype was coded as “0”) for the i^th SNP [41]. This was equivalent to estimating an individual’s relationship to itself (diagonal of the SNP-derived GRM). The F_HOM estimate was calculated based on the excess of homozygosity following Wright (1948) [33]:

$$ F_{\mathrm{HOM}} = \frac{O\left(\#\mathrm{hom}\right) - E\left(\#\mathrm{hom}\right)}{1 - E\left(\#\mathrm{hom}\right)} = 1 - \frac{x_i\left(2 - x_i\right)}{h_i}, $$

where O (# hom) and E (# hom) are the observed and expected numbers of homozygous genotypes in the sample, respectively [41]. The F_UNI estimate was calculated based on the correlation between uniting gametes following Wright (1922) [1]:

$$ F_{\mathrm{UNI}} = \frac{x_i^2 - \left(1 + 2p_i\right)x_i + 2p_i^2}{h_i}, $$

where $h_i$, $p_i$ and $x_i$ are the same as for F_GRM [41]. The three estimates F_GRM, F_HOM and F_UNI were computed using the --ibc option of the GCTA software [41].
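The three frequency-based formulas can be written out in a few lines of Python. This is a sketch of the per-locus expressions as given above, not a reimplementation of GCTA; the function names are hypothetical, and combining loci by a simple mean over polymorphic SNPs is an assumption made here for illustration.

```python
def locus_estimators(x, p):
    """Per-locus (F_GRM, F_HOM, F_UNI) for genotype x and allele frequency p.

    x -- number of copies of the reference allele (0, 1 or 2)
    p -- frequency of that allele, strictly between 0 and 1
    """
    h = 2.0 * p * (1.0 - p)
    f_grm = (x - 2.0 * p) ** 2 / h - 1.0
    f_hom = 1.0 - x * (2.0 - x) / h
    f_uni = (x * x - (1.0 + 2.0 * p) * x + 2.0 * p * p) / h
    return f_grm, f_hom, f_uni


def animal_estimators(genotypes, freqs):
    """Mean of the per-locus estimators over loci (an assumed way to combine them).

    Monomorphic loci (p equal to 0 or 1) are skipped because h would be 0.
    """
    kept = [(x, p) for x, p in zip(genotypes, freqs) if 0.0 < p < 1.0]
    if not kept:
        raise ValueError("no polymorphic loci")
    totals = [0.0, 0.0, 0.0]
    for x, p in kept:
        for k, value in enumerate(locus_estimators(x, p)):
            totals[k] += value
    return tuple(total / len(kept) for total in totals)


# Hypothetical animal genotyped at five SNPs.
genotypes = [0, 1, 2, 2, 1]
freqs = [0.3, 0.5, 0.8, 0.6, 0.1]
f_grm, f_hom, f_uni = animal_estimators(genotypes, freqs)
```

With the formulas as written, F_UNI at any locus is the average of F_GRM and F_HOM, which gives a quick consistency check for an implementation.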

50k SNP chip

ROH were detected from 50k SNP chip data using the software PLINK with adjusted parameters (--homozyg-density 1000, --homozyg-window-het 1, --homozyg-kb 10, --homozyg-window-snp 20) [23, 42]. These settings for PLINK to detect ROH in SNP data were chosen to make the detected ROH in SNP chip data and sequence data as similar as possible to enable comparisons of results when using different types of data. Genomic estimates of the inbreeding coefficient based on all ROH (F_ROH) were calculated using the same formula as was used for the sequence data. The other three types of estimates (F_GRM, F_HOM, F_UNI) were also calculated for genotypes extracted from 50k SNP chip data using the same methods as for sequence data.

Pearson’s correlation coefficients were calculated between estimates of inbreeding coefficients from each of pedigree records, 50k SNP genotypes, and whole-genome sequence variants. All correlations between different inbreeding coefficient estimators were tested within breed to determine whether they were significantly different from 0 using the R (http://www.r-project.org/) cor and cor.test functions.
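For readers who want to see what the correlation test involves, the following Python sketch computes Pearson's r and the t statistic (with n − 2 degrees of freedom) that is commonly used to test whether a correlation differs from 0. It is a plain-Python illustration, not the R cor and cor.test code used in the study; the function names and the example values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical F_ROH values for the same five animals from two data sources.
f_roh_50k = [0.04, 0.07, 0.05, 0.10, 0.08]
f_roh_seq = [0.15, 0.21, 0.18, 0.26, 0.20]
r = pearson_r(f_roh_50k, f_roh_seq)
t = t_statistic(r, len(f_roh_50k))
```

The p-value would then be obtained from a t distribution with n − 2 degrees of freedom, which is left out here to keep the sketch free of external dependencies.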

Impact of allele frequencies on estimators of inbreeding

As some estimators explicitly use allele frequencies to compute inbreeding coefficients, it is important to investigate how varying allele frequencies affect estimated inbreeding coefficients. Here, we investigated how the three different estimators change across the whole range of allele frequencies. For each genotype $x_i$ (homozygous for the reference allele; heterozygous for the reference and non-reference allele; homozygous for the non-reference allele), the values can be written as a function of allele frequency $p_i$, as shown in Table 1.

Table 1 Formula for calculating three estimators (F_GRM, F_HOM and F_UNI) for each genotype (homozygous for reference allele; heterozygous for reference and non-reference allele; homozygous for non-reference allele)

Results

We used five different approaches (F_PED, F_GRM, F_HOM, F_UNI, F_ROH) to estimate inbreeding coefficients using information from three different sources: pedigree, whole genome sequence and 50k SNP chip genotype data. This gave 11 estimates of the inbreeding coefficient in total for each animal; their averages across approaches and data sets are presented in Table 2. The F_PED and F_ROH estimated from 50k data for HOL and JER were significantly higher than for RDC (p < 0.05). For inbreeding coefficients estimated from sequence data, F_ROH, F_ROH>1Mb, F_ROH>3Mb, F_HOM and F_UNI differed significantly among breeds, being highest in JER and lowest in RDC. The mean F_ROH values for 50k SNP chip data (0.066) and sequence data (0.19) were significantly higher than the mean F_PED (0.016) (p < 0.01).

Table 2 Estimated mean (min-max) of pedigree-based inbreeding coefficient (F_PED), GRM-based inbreeding coefficient (F_GRM), inbreeding coefficients based on excess of homozygosity (F_HOM), inbreeding coefficients based on correlation between uniting gametes (F_UNI), ROH-based inbreeding coefficients (F_ROH). F_ROH greater than 1 Mb, 3 Mb derived from sequence data were reported

F_ROH estimated from sequence data is a direct and accurate estimate of the levels of homozygosity. It mostly reflects regions which were IBD on the genome; therefore, we limited our comparisons to comparing between F_ROH from sequence data with other estimates of F. High correlations were observed between F_ROH estimated from the 50k and sequence data with F_ROH>1Mb and F_ROH>3Mb from the sequence data for all three breeds (Tables 3, 4 and 5). The correlation between F_ROH estimated from 50k data and F_ROH>3Mb was higher than F_ROH estimated from 50k data and F_ROH>1Mb in JER and RDC (Tables 4 and 5). F_ROH was consistently positively correlated with F_HOM and F_UNI, when both were computed from either 50k or sequence data in all three breeds (Tables 3, 4 and 5). A high correlation was found between F_ROH and F_UNI, when both were computed from either 50k or sequence data in all three breeds (Tables 3, 4 and 5). However, for different breeds, F_HOM and F_UNI were correlated differently across different densities of genomic data. For HOL and RDC, the higher the density of genomic data used for F_UNI, the higher the correlation was between F_UNI and F_ROH from sequence data (Tables 3 and 5). For HOL, the correlation between F_UNI and F_ROH from sequence data (0.95) was still higher than the correlation between F_ROH estimated from 50k SNP chip data and sequence data (0.87) (Table 3). In contrast to JER, F_HOM and F_UNI were most highly correlated with F_ROH estimated from sequence data (Table 5).

Table 3 Correlation coefficients between different estimates for inbreeding from different data sets for HOL

Table 4 Correlation coefficients between different estimates for inbreeding from different data sets for JER

Table 5 Correlation coefficients between different estimates for inbreeding from different data sets for RDC

F_PED was mostly intermediately correlated with F_HOM and F_ROH estimated from 50k and sequence data. The highest correlation between F_PED and F_ROH estimated from 50k and sequence data was found in HOL (Table 3). The strongest correlation among estimators of F_ROH (F_ROH from 50k or sequence data or F_ROH>3Mb or F_ROH>1Mb from sequence data) and F_PED was observed between F_PED and F_ROH>3Mb from sequence data in HOL (Table 3). A moderate correlation was found between F_PED and F_ROH estimated from 50k and sequence data for JER and RDC (Tables 4 and 5).

The estimate F_GRM from both 50k and sequence data and F_PED had a correlation close to zero in all three breeds and the values were often negative (Tables 3, 4 and 5). At the same time, F_GRM estimated from 50k and sequence data generally showed a low correlation with other estimates except between two estimates F_GRM estimated from 50k and sequence data in HOL and JER, and between F_GRM and F_UNI estimated from 50k data (Tables 3 and 4).

Discussion

Pedigree has been used to estimate inbreeding coefficients in animal breeding for over 50 years [1, 17]. Recently, researchers have utilized runs of homozygosity (ROH) estimated from medium density genotype data such as 50k SNP chip data to estimate inbreeding coefficients in livestock populations [22–24, 30]. ROH were initially used to explore regions of inbreeding in the genome and further investigate the fitness effect of these regions on different traits [2, 9, 11, 43]. Population subdivision and either inbreeding or inbreeding avoidance affect the whole genome composition, whereas selection and assortative mating affect only those loci associated with particular phenotypes. However, we observed that the inbreeding coefficient F_ROH estimated from sequence data was relatively higher for chromosomes 1 and 10 in all three breeds (Fig. 1). This is most likely because the local recombination rate on these chromosomes is lower than average, which results in higher levels of homozygosity [23, 44].

Our study is the first to calculate inbreeding coefficient based on ROH from full sequence data in cattle. The objective of this study was to compare estimates of inbreeding calculated from different methods and different data sources (pedigree, 50k SNP chip genotypes and full sequence data).

The pedigree-based inbreeding coefficient, F_PED, was moderately correlated with F_HOM and F_ROH in all breeds. These moderate correlations (~0.47 to 0.56) may be partly explained by the relatively shallow depth of the pedigree records (~8–9 generations) for these bulls. Another difference between F_ROH and F_PED is that short ROH capture ancient inbreeding and long ROH capture recent inbreeding, whereas pedigree captures only relatively recent inbreeding, namely inbreeding that occurred since pedigree recording began. Therefore, after excluding ROH smaller than 1 or 3 Mbp, the correlation between F_PED and F_ROH from sequence data increased slightly for all breeds. We should also point out that a very long stretch of homozygosity detected using marker data might not actually be completely homozygous; for this reason, higher density data have been suggested for detecting selective sweeps through runs of homozygosity [45]. Sørensen et al. [7] estimated inbreeding in Danish dairy cattle breeds, and our estimates of F_PED are lower than theirs. This is because the animals we sampled for sequencing were founders and older animals, whereas the other study used all animals [7].

Estimates of inbreeding coefficients differed with methods. Inbreeding coefficients estimates from methods using allele frequencies, i.e., F_GRM, F_HOM and F_UNI, showed considerable variation across data type and breeds. These estimators were sensitive to allele frequencies compared to ROH estimators, especially for populations with divergent allele frequencies (e.g., Fig. 2; RDC population). The estimates of genomic inbreeding coefficients are dependent on the allele frequencies in the base population [40].

To explore why the correlations varied among the estimators that use allele frequencies, F_GRM, F_HOM and F_UNI were plotted against allele frequencies from 0 to 1 for 0, 1 or 2 copies of the reference allele at the i^th SNP (Figs. 3, 4 and 5). When a locus was homozygous for either the reference allele or the non-reference allele, with the allele frequency ranging from 0 to 1, F_GRM ranged from -1 to infinity, F_HOM had a constant value of 1 and F_UNI ranged from 0 to infinity (Figs. 3 and 5). F_HOM gave constant estimates for homozygous genotypes, regardless of the allele frequency (Figs. 3 and 5). When the allele frequency of the non-reference allele was smaller than 0.2 or larger than 0.8, F_GRM was less than 0 (Figs. 3 and 5). When the allele frequency of the non-reference allele was between 0.2 and 0.5 or when the allele frequency of the reference allele was between 0.5 and 0.8, F_GRM became positive and ranged from 0 to 1 (Figs. 3 and 5).

For a heterozygous locus with an allele frequency ranging from 0 to 1, F_GRM and F_HOM ranged from minus infinity to plus infinity, and F_UNI had a constant value of 0 (Fig. 4). If the allele frequency was smaller than 0.2 or larger than 0.8, F_GRM became a very large positive value whereas F_HOM became a large negative value. F_HOM was always negative, and F_GRM was always positive (Fig. 4). Thus, when a population has a high level of heterozygosity and some rare alleles with small frequency, F_GRM would yield large positive inbreeding coefficients, which can be misleading. This result explains why F_GRM was positive in the RDC breed (Table 2): this population had a higher level of heterozygosity than HOL and JER. F_UNI gave a stable value of 0 when the locus was heterozygous and therefore was robust to allele frequency (Fig. 4).

The correlation between the three estimators F_GRM, F_HOM and F_UNI was computed for each of the three genotypes (i.e., homozygotes for allele 1, homozygotes for allele 2 and heterozygotes) for comparison between F_GRM, F_HOM and F_UNI when the allele frequency was varied between 0 and 1 (Fig. 6). Correlations reached the maximal value (i.e., 1) when the allele frequencies were 0.5. When the allele frequencies were extremely high or low, correlations between estimators became low, especially the correlation between F_GRM and F_HOM (0.27). The correlation plot (Fig. 6) reflected a similar result as those in Figs. 3, 4 and 5. Therefore, when computing inbreeding coefficients using allele frequencies, populations with different allele frequencies might have very different inbreeding coefficients and the correlations between those inbreeding coefficients might be very low, with different allele frequencies.

The comparison between F_GRM and other estimators showed a very low correlation and F_GRM was mostly negatively correlated with other estimators. F_HOM based on excess of homozygosity was positively correlated with other estimators and was relatively highly correlated with F_ROH detected from 50k and sequence data. F_UNI based on correlations between uniting gametes estimated from 50k data generally was negatively correlated with other estimators. However, with increasing marker density, the correlation between F_UNI and other estimators became positive for the HOL and RDC populations. Surprisingly, when using sequence data, F_UNI was highly correlated with other estimators, especially F_ROH, detected from sequence data (~0.95) for HOL. This correlation may have resulted from the nature of the estimators: F_ROH uses only runs of homozygosity, whereas the other estimators (to some extent) capture all of the homozygosity. This high correlation for F_UNI and F_ROH compared with low correlation between F_GRM and F_ROH might also be explained by the algorithms: F_GRM = (1 + F)-1 and F is the correlation between uniting gametes. This estimator has only sampling on the F-term, whereas in the F_GRM estimator there is also sampling variance on the “1”, which creates additional sampling variance.

It is known that RDC is an admixed breed with introgressed haplotypes from Old Danish Red, Holstein and Brown Swiss breeds. HOL and JER are relatively pure breeds and more inbred than RDC (Zhang Q, Guldbrandtsen B, Bosse M, Lund MS, Sahana G. Runs of homozygosity and distribution of functional variants in the cattle genome. BMC Genomics (in press)). Therefore, minor allele frequencies tend to be lower in HOL and JER breeds than in RDC. F_GRM is negatively correlated with other estimators for all three breeds. F_HOM becomes negative for RDC, which is likely due to the admixture present in RDC. Therefore, it appears that F_GRM tends to be less accurate for populations with a low minor allele frequency and that F_HOM tends to be less accurate for populations with a higher level of heterozygosity. This argument is supported by our results that the three inbreeding estimators F_GRM, F_HOM and F_UNI were most closely correlated with each other when the allele frequency is approximately 0.5 (Figs. 3, 4 and 5). Therefore, the three estimators F_GRM, F_HOM and F_UNI depend strongly on the estimation of allele frequencies in the population, unlike F_ROH. However, here we only took one locus as an example to study the impact of allele frequencies on three estimators F_GRM, F_HOM and F_UNI.

Conclusion

In this study, we compared different estimators of inbreeding coefficient with different types of data (pedigree, 50k SNP chip genotypes and full sequence data). Methods based on GRM, excess of homozygosity and the correlation between uniting gametes were observed to be sensitive to allele frequencies in the base population. The estimator based on pedigree data was moderately correlated with estimators based on ROH when a pedigree is relatively complete. Estimators based on ROH from SNP chip genotypes and full sequence directly reflect homozygosity on the genome, and have the advantage of not being affected by estimates of allele frequency or incompleteness of the pedigree. Inbreeding estimated from ROH was shown to be affected by the marker density used. Using sequence data, we obtained a full picture of the distribution of ROH on the genome, including short and medium length ROH that reflect ancient inbreeding regions which are possibly IBD. Detecting ROH based on high-density or 50k chip data was shown to give estimates most closely related to ROH from sequence data. However, more than 50k genotypes are required to accurately detect short ROH that are most likely identical by descent (IBD).

Availability of supporting data

Data used in this study are from the 1000 Bull Genome Project (Daetwyler et al. 2014 Nature Genet. 46:858–865). Whole genome sequence data of individual bulls of the 1000 Bull Genomes Project are already available at NCBI using SRA no. SRP039339 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA238491).

Abbreviations

F: Inbreeding coefficient
F_PED: Pedigree-based inbreeding coefficient
ROH: Run of homozygosity
F_ROH: Runs of homozygosity-based inbreeding coefficients
GRM: Genomic relationship matrix
F_GRM: Genomic relationship matrix-based inbreeding coefficients
F_HOM: Inbreeding coefficients based on excess of homozygosity
F_UNI: Inbreeding coefficients based on correlation of uniting gametes
IBD: Identity by descent
SNP: Single nucleotide polymorphism
NGS: Next-generation sequence
HOL: Holstein
JER: Jersey
RDC: Danish Red Cattle
WGS: Whole-genome sequence

References

Wright S. Coefficients of inbreeding and relationship. Am Nat. 1922;56:330–8.
Article Google Scholar
Gonzalez-Recio O, de Maturana EL, Gutierrez JP. Inbreeding depression on female fertility and calving ease in Spanish dairy cattle. J Dairy Sci. 2007;90(12):5744–52.
Article CAS PubMed Google Scholar
Margolin S, Bartlett JW. The influence of inbreeding upon the weight and size of dairy cattle. J Anim Sci. 1945;4(1):3–12.
Google Scholar
Miglior F, Szkotnicki B, Burnside EB. Analysis of levels of inbreeding and inbreeding depression in Jersey cattle. J Dairy Sci. 1992;75(4):1112–8.
Article CAS PubMed Google Scholar
Nomura T, Honda T, Mukai F. Inbreeding and effective population size of Japanese black cattle. J Anim Sci. 2001;79(2):366–70.
CAS PubMed Google Scholar
Smith LA, Cassell BG, Pearson RE. The effects of inbreeding on the lifetime performance of dairy cattle. J Dairy Sci. 1998;81(10):2729–37.
Article CAS PubMed Google Scholar
Sorensen AC, Sorensen MK, Berg P. Inbreeding in Danish dairy cattle breeds. J Dairy Sci. 2005;88(5):1865–72.
Article CAS PubMed Google Scholar
Szpiech ZA, Xu JS, Pemberton TJ, Peng WP, Zollner S, Rosenberg NA, et al. Long runs of homozygosity are enriched for deleterious variation. Am J Hum Genet. 2013;93(1):90–102.
Article CAS PubMed Central PubMed Google Scholar
Bjelland DW, Weigel KA, Vukasinovic N, Nkrumah JD. Evaluation of inbreeding depression in Holstein cattle using whole-genome SNP markers and alternative measures of genomic inbreeding. J Dairy Sci. 2013;96(7):4697–706.
Article CAS PubMed Google Scholar
Leroy G. Inbreeding depression in livestock species: review and meta-analysis. Anim Genet. 2014;45(5):618–28.
Article CAS PubMed Google Scholar
Charlesworth D, Charlesworth B. Inbreeding depression and its evolutionary consequences. Annu Rev Ecol Syst. 1987;18:237–68.
Article Google Scholar
Wright S. Systems of mating. II. The effects of inbreeding on the genetic composition of a population. Genetics. 1921;6(2):124–43.
CAS PubMed Central PubMed Google Scholar
Pusey A, Wolf M. Inbreeding avoidance in animals. Trends Ecol Evol. 1996;11(5):201–6.
Article CAS PubMed Google Scholar
Weigel K. Controlling inbreeding in modern dairy breeding programs. Adv Dairy Technol. 2006;18:263–74.
Google Scholar
Mc Parland S, Kearney JF, Rath M, Berry DP. Inbreeding trends and pedigree analysis of Irish dairy and beef cattle populations. J Anim Sci. 2007;85(2):322–31.
Article CAS PubMed Google Scholar
Blackwell BF, Doerr PD, Reed JM, Walter JR. Inbreeding rate and effective population-size - a comparison of estimates from pedigree analysis and a demographic-model (Vol 71, Pg 299, 1995). Biol Conserv. 1995;72(3):407.
Meuwissen THE, Luo Z. Computing inbreeding coefficients in large populations. Genet Sel Evol. 1992;24(4):305–13.
Cassell BG, Adamec V, Pearson RE. Effect of incomplete pedigrees on estimates of inbreeding and inbreeding depression for days to first service and summit milk yield in Holsteins and Jerseys. J Dairy Sci. 2003;86(9):2967–76.
Vanraden PM. Accounting for inbreeding and crossbreeding in genetic evaluation of large populations. J Dairy Sci. 1992;75(11):3136–44.
McQuillan R, Leutenegger AL, Abdel-Rahman R, Franklin CS, Pericic M, Barac-Lauc L, et al. Runs of homozygosity in European populations. Am J Hum Genet. 2008;83(3):359–72.
Broman KW, Weber JL. Long homozygous chromosomal segments in reference families from the Centre d’Etude du polymorphisme humain. Am J Hum Genet. 1999;65(6):1493–500.
Marras G, Gaspa G, Sorbolini S, Dimauro C, Ajmone-Marsan P, Valentini A, et al. Analysis of runs of homozygosity and their relationship with inbreeding in five cattle breeds farmed in Italy. Anim Genet. 2015;46(2):110–21.
Bosse M, Megens HJ, Madsen O, Paudel Y, Frantz LA, Schook LB, et al. Regions of homozygosity in the porcine genome: consequence of demography and the recombination landscape. PLoS Genet. 2012;8(11):e1003100.
Ferencakovic M, Solkner J, Curik I. Estimating autozygosity from high-throughput information: effects of SNP density and genotyping errors. Genet Sel Evol. 2013;45:42.
Keller MC, Visscher PM, Goddard ME. Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data (vol 189, pg 237, 2011). Genetics. 2012;190(1):283.
Kirin M, McQuillan R, Franklin CS, Campbell H, McKeigue PM, Wilson JF. Genomic runs of homozygosity record population history and consanguinity. PLoS One. 2010;5(11):e13996.
Ku CS, Naidoo N, Teo SM, Pawitan Y. Regions of homozygosity and their impact on complex diseases and traits. Hum Genet. 2011;129(1):1–15.
Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008;319(5866):1100–4.
Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics. 1993;134(4):1289–303.
Ferencakovic M, Hamzic E, Gredler B, Solberg TR, Klemetsdal G, Curik I, et al. Estimates of autozygosity derived from runs of homozygosity: empirical evidence from selected cattle populations. J Anim Breed Genet. 2013;130(4):286–93.
Purfield DC, Berry DP, McParland S, Bradley DG. Runs of homozygosity and population history in cattle. BMC Genet. 2012;13:70.
MacLeod IM, Larkin DM, Lewin HA, Hayes BJ, Goddard ME. Inferring demography from runs of homozygosity in whole-genome sequence, with correction for sequence errors. Mol Biol Evol. 2013;30(9):2209–23.
Wright S. Genetics of populations. Encyclopaedia Britannica. 1948;10:111-A-D-112.
Andersen B, Jensen B, Nielsen A, Christensen LG, Liboriussen T. Rød Dansk Malkerace - avlsmæssigt og kulturhistorisk belyst. Denmark: Danmarks JordbrugsForskning; 2003.
Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8(3):175–85.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303.
Hoglund JK, Sahana G, Guldbrandtsen B, Lund MS. Validation of associations for female fertility traits in Nordic Holstein, Nordic Red and Jersey dairy cattle. BMC Genet. 2014;15:8.
Quaas RL. Computing the diagonal elements and inverse of a large numerator relationship matrix. Biometrics. 1976;32:949–53.
Strandén I, Vuori K. Relax2: pedigree analyses program. Proceedings of the 8th WCGALP. Belo Horizonte, MG, Brazil: Instituto Prociência; 2006.
VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.
Yang JA, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
Pryce JE, Haile-Mariam M, Goddard ME, Hayes BJ. Identification of genomic regions associated with inbreeding depression in Holstein and Jersey dairy cattle. Genet Sel Evol. 2014;46(1):71.
Arias JA, Keehan M, Fisher P, Coppieters W, Spelman R. A high density linkage map of the bovine genome. BMC Genet. 2009;10:18.
Ramey HR, Decker JE, McKay SD, Rolf MM, Schnabel RD, Taylor JF. Detection of selective sweeps in cattle using genome-wide SNP data. BMC Genomics. 2013;14:382.

Acknowledgement

Q. Zhang benefited from a joint grant from the European Commission within the framework of the Erasmus-Mundus joint doctorate “EGS-ABG”. This research was supported by the Center for Genomic Selection in Animals and Plants (GenSAP) funded by The Danish Council for Strategic Research.

Author information

Authors and Affiliations

Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, DK-8830, Denmark
Qianqian Zhang, Bernt Guldbrandtsen, Mogens S Lund & Goutam Sahana
Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, 6700 AH, The Netherlands
Qianqian Zhang & Mario PL Calus

Authors

Qianqian Zhang
Mario PL Calus
Bernt Guldbrandtsen
Mogens S Lund
Goutam Sahana

Corresponding authors

Correspondence to Qianqian Zhang or Goutam Sahana.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

QZ developed and planned the study design, coordinated the study, recruited the participants, performed data analyses and drafted the manuscript. MC participated in the study design, analyses of data, and drafting the manuscript. BG participated in the study design, analyses of data, and drafting the manuscript. MSL participated in study design and drafting the manuscript. GS participated in the study design, analyses of data, and drafting the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Zhang, Q., Calus, M.P., Guldbrandtsen, B. et al. Estimation of inbreeding using pedigree, 50k SNP chip genotypes and full sequence data in three cattle breeds. BMC Genet 16, 88 (2015). https://doi.org/10.1186/s12863-015-0227-7

Received: 08 February 2015
Accepted: 10 June 2015
Published: 22 July 2015
DOI: https://doi.org/10.1186/s12863-015-0227-7
