Examining the effect of linkage disequilibrium on multipoint linkage analysis

Abstract

Most linkage programs assume linkage equilibrium among multiple linked markers. This assumption may lead to bias for tightly linked markers where strong linkage disequilibrium (LD) exists. We used simulated data from Genetic Analysis Workshop 14 to examine the possible effect of LD on multipoint linkage analysis. Single-nucleotide polymorphism packets from a non-disease-related region that was generated with LD were used for both model-free and parametric linkage analyses. Results showed that high LD among markers can induce false-positive evidence of linkage for affected sib-pair analysis when parental data are missing. Bias can be eliminated with parental data and can be reduced when additional markers not in LD are included in the analyses.

Background

Most multipoint linkage programs assume linkage equilibrium among the markers being studied. This assumption is appropriate for sparsely spaced markers with inter-marker distances exceeding a few centimorgans, because linkage equilibrium is expected over these intervals in almost all populations. However, with recent advances in high-throughput genotyping technology, much denser marker sets are available, and linkage disequilibrium (LD) may exist among the markers. Applying linkage analyses that assume linkage equilibrium to dense markers may lead to bias. It is well known that misspecification of allele frequencies can inflate LOD scores for both model-free [1] and model-based [2, 3] linkage approaches. However, estimating allele frequencies from the available data will generally correct this problem [4]. Rare exceptions, such as unrecognized high-level inbreeding or pronounced stratification, might inflate the false-positive rate of linkage tests when only affected sib pairs lacking parents are analyzed [5]. For tightly linked markers, assuming linkage equilibrium leads to incorrect inference of haplotype frequencies, which can cause a bias similar to that induced by misspecification of allele frequencies for multi-allelic markers. However, accurately estimating haplotype frequencies is more difficult than estimating allele frequencies because of phase uncertainty. Many currently available programs, such as ALLEGRO and GENEHUNTER, do not allow the user to specify haplotype frequencies, while programs that do, including LINKAGE and LIPED, are very unwieldy to use in this case.

Recently, Huang et al. [6] demonstrated that assuming linkage equilibrium between tightly linked markers where strong LD exists may cause apparent over-sharing of multipoint identity by descent (IBD) among affected sibs and thus result in false-positive evidence for linkage. Here, we used the simulated data from Genetic Analysis Workshop 14 (GAW14) to further explore the role of LD in causing an excess of false-positive results. The workshop data afforded a more realistic setting in which to study the effects of LD than was covered by Huang et al. [6], because the data were simulated to represent a complex disease model and a large set of markers was available for further examination of the possible effects of LD on multipoint linkage analysis.

Methods

To examine the possible effect of LD on linkage analysis, we studied markers from a dense marker dataset, because the inter-marker distances were smaller and the simulated LD was higher. Single-nucleotide polymorphism (SNP) packets from the non-disease-related regions that were generated with LD were purchased and used for the analyses. The inter-marker distance averaged 0.29 cM among these markers (20 SNPs per packet). Pedigree samples from the Aipotu population of the simulated GAW14 data were used for the analyses. There were 100 nuclear families in the replicate sample, and at least two sibs in each family were affected with Kofendrerd Personality Disorder (KPD). We treated the parents from each family as unrelated individuals and used them to estimate haplotype frequencies and LD. Haplotype frequencies were estimated using the expectation-maximization algorithm [7], and pair-wise LD was calculated using standard formulas [8], as implemented in the EMLD program. We randomly selected a single sib pair from each family to ensure independence of the sib pairs. We then analyzed each family both with and without the parental genotype data. Multipoint and single-point linkage analyses of the affected sib-pair data were carried out using ALLEGRO [9]. For model-free multipoint linkage analyses, we used the Kong and Cox exponential model [10] and the S_pairs score function [11]. For the parametric linkage analyses, we assumed a simple dominant disease model with 100% penetrance in carriers and 0% penetrance in non-carriers, and we incorporated a heterogeneity parameter [12], thus allowing some but not all families to be linked.
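As an illustration of the expectation-maximization step mentioned above, the following is a minimal Python sketch restricted to two SNPs. It is not the EMLD implementation; the function name `em_two_snp_haplotypes`, the 0/1/2 genotype coding, and the toy data are assumptions made only for this example.

```python
from itertools import product


def em_two_snp_haplotypes(genotypes, max_iter=200, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    genotypes: list of (g1, g2), where each g is the number of copies (0, 1, 2)
    of the allele coded 1 at that SNP (a hypothetical coding for this sketch).
    Returns a dict mapping haplotype (a1, a2), with a in {0, 1}, to its frequency.
    """
    haps = list(product((0, 1), repeat=2))
    freq = {h: 0.25 for h in haps}  # start from equal frequencies
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: a double heterozygote is either 11/00 or 10/01;
                # split it according to the current frequency estimates.
                cis = freq[(1, 1)] * freq[(0, 0)]
                trans = freq[(1, 0)] * freq[(0, 1)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(1, 1)] += w
                counts[(0, 0)] += w
                counts[(1, 0)] += 1.0 - w
                counts[(0, 1)] += 1.0 - w
            else:
                # Phase is unambiguous when at most one SNP is heterozygous.
                first = (int(g1 >= 1), int(g2 >= 1))
                second = (int(g1 == 2), int(g2 == 2))
                counts[first] += 1.0
                counts[second] += 1.0
        # M-step: new frequencies are the expected haplotype counts per chromosome.
        new_freq = {h: counts[h] / n_chrom for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq


# Toy data (hypothetical): two SNPs whose alleles always travel together.
toy = [(2, 2)] * 10 + [(0, 0)] * 10 + [(1, 1)] * 20
print(em_two_snp_haplotypes(toy))
```

In this toy sample the double heterozygotes are resolved almost entirely to the 1-1/0-0 phase, so the estimates for haplotypes 1-1 and 0-0 approach 0.5 each. With 20 SNPs the same idea applies, but the number of candidate haplotypes grows quickly, which is one reason haplotype frequencies are harder to estimate than allele frequencies.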

Results

Although all the SNP packets that we examined were from regions that were generated with LD, LD was not strong in most of the regions and had no obvious effect on linkage analysis. However, strong LD existed among three markers in SNP packet 121 (B03T2407, B03T2408, and C03R0221), with pair-wise D' > 0.95 and r² > 0.38. The pair-wise LD as measured by D' and r² for this packet is shown in Table 1.
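The gap between a D' near 1 and a much smaller r² is expected when the two SNPs have different allele frequencies. The sketch below uses the usual textbook definitions of D, D', and r² for two biallelic loci; we assume these correspond to the standard formulas cited in the Methods, and the function name `pairwise_ld` and the example frequencies are hypothetical.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # deviation from the linkage-equilibrium expectation
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: allele B (frequency 0.2) occurs only on haplotypes
# that also carry allele A (frequency 0.5).
d, d_prime, r2 = pairwise_ld(p_ab=0.2, p_a=0.5, p_b=0.2)
print(d_prime, r2)
```

In this made-up example D' equals 1 because one of the four haplotypes is absent, while r² is only 0.25 because the allele frequencies at the two loci differ.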

Table 1 Pair-wise LD between 20 SNPs of SNP packet 121 in sample replicate 1 from the Aipotu population (D' above the diagonal and r² below the diagonal).

Single-point linkage analysis showed no evidence of linkage, either for the three markers in strong LD alone or for the whole marker set (Fig. 1). However, when only the three markers in strong LD were used and parents were excluded, multipoint analysis of the affected sib pairs showed false-positive evidence of linkage for both the model-free analysis and the parametric analysis that incorporated a heterogeneity parameter (Fig. 1). This confirmed the observation by Huang et al. [6]. Including parents in the multipoint analysis eliminated the false-positive evidence (data not shown). The false-positive evidence induced by LD was gradually reduced by adding markers that are not in LD to either or both sides of the three core markers, and adding markers on both sides appeared to give a better "rescue" effect than adding them on a single side (Table 2). With all 20 markers, there was no evidence of linkage (maximal LOD score at the peak position: 0.34 ± 0.2).
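For readers unfamiliar with the heterogeneity parameter used in the parametric analyses, the sketch below shows the usual admixture form of a heterogeneity LOD score at a single map position: each family is linked with probability alpha, and alpha is chosen to maximize the total. This is an assumption about the general approach, not a description of ALLEGRO's internals; the function name `hlod`, the grid search, and the per-family LOD values are hypothetical.

```python
import math


def hlod(family_lods, n_grid=1001):
    """Heterogeneity LOD at one position under the admixture model.

    family_lods: per-family LOD scores at this position (hypothetical input).
    Returns (maximized heterogeneity LOD, alpha at the maximum), where alpha
    is the assumed proportion of linked families, searched on a grid in [0, 1].
    """
    best_value, best_alpha = 0.0, 0.0  # alpha = 0 always gives a score of 0
    for i in range(n_grid):
        alpha = i / (n_grid - 1)
        total = 0.0
        for z in family_lods:
            # Family likelihood ratio is a mixture of "linked" and "unlinked".
            mix = alpha * 10.0 ** z + (1.0 - alpha)
            if mix <= 0.0:
                total = float("-inf")
                break
            total += math.log10(mix)
        if total > best_value:
            best_value, best_alpha = total, alpha
    return best_value, best_alpha


# Hypothetical family LODs: one family supports linkage, one argues against it.
print(hlod([1.0, -1.0]))
```

Because the score is maximized over alpha and equals zero at alpha = 0, it can never be negative. A few families with spuriously inflated sharing can therefore produce a positive score even when other families argue against linkage.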

Figure 1 Linkage analysis results for the 20 SNPs and the three SNPs with strong LD. The left panel indicates results using a nonparametric NPL approach, while the right panel indicates results from a parametric linkage analysis allowing for locus heterogeneity.

Table 2 Multipoint LOD scores for different sets of markers from model-free linkage analysis.

Conclusion

For multipoint linkage analysis of affected sib-pair data, in which parental phase information must be inferred from the sib pairs, the usual methods of linkage analysis assume linkage equilibrium among multiple linked markers and assign equal probabilities to all possible phases. This assumption can cause overestimation of multipoint identity-by-descent (IBD) sharing and induce false positives for both model-free and parametric linkage analysis, as shown by Huang et al. [6]. This study further confirmed that observation using independently generated data that were simulated to reflect conditions that might be found in a genome scan. Among the markers that we studied, false-positive evidence for linkage was obtained only for a small subset of markers that showed high LD. We also showed that including markers that are not in LD can reduce the false-positive evidence of linkage induced by markers in high LD. This indicates that including markers that are not in strong LD brings the haplotype frequencies closer to those expected under the linkage equilibrium assumption and thus may help to reduce false-positive linkage findings. We also found that the LD effect is severe only when the majority of the markers being jointly examined are in strong LD.

Single-point linkage analysis is not affected by LD. Therefore, given the relatively accurate allele frequencies that can readily be obtained for a single marker, single-point linkage analysis can be used to check for suspected false positives by comparing its results with those of the multipoint analysis. However, when a very large number of SNPs is studied, the possibility remains that allele frequency estimates for individual SNPs are biased, either by unrecognized strong stratification in the sample or by nonrandom errors introduced during processing. A potential further check is the confirmation of linkage at multiple SNPs in a region, together with the absence of a linkage signal for most of the remainder of the genome. With current advances in high-throughput genotyping technology, high-density marker data are easily generated. Caution must be taken when applying traditional linkage analysis to dense markers where strong LD may exist.
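The over-sharing of IBD described in this section can be illustrated with a deliberately simplified calculation. Assume, for illustration only, that the three SNPs in strong LD behave like a single multi-allelic marker, that both sibs are homozygous for the same haplotype, and that parents are untyped. The function name `expected_ibd_sharing` and the frequencies below are hypothetical.

```python
def expected_ibd_sharing(q):
    """Posterior expected proportion of alleles shared IBD by a sib pair.

    Assumes both sibs are homozygous for the same haplotype, parents are
    untyped, and the haplotype is assigned population frequency q.
    """
    prior = {0: 0.25, 1: 0.5, 2: 0.25}  # prior IBD probabilities for full sibs
    # The pair carries 4, 3, or 2 independent copies of the haplotype
    # when sharing 0, 1, or 2 alleles IBD, respectively.
    likelihood = {0: q ** 4, 1: q ** 3, 2: q ** 2}
    posterior = {k: prior[k] * likelihood[k] for k in prior}
    total = sum(posterior.values())
    return sum(k * posterior[k] for k in posterior) / (2 * total)


# Hypothetical setting: three SNPs, each with allele frequency 0.5, in complete
# LD, so the haplotype really has frequency 0.5. Assuming linkage equilibrium
# assigns it 0.5 ** 3 = 0.125 instead.
true_q = 0.5
le_q = 0.5 ** 3
print(expected_ibd_sharing(true_q), expected_ibd_sharing(le_q))
```

Under these assumptions the expression simplifies to 1 / (1 + q), so the sharing inferred for this pair rises from 2/3 with the true haplotype frequency to 8/9 with the linkage-equilibrium frequency. Treating a common haplotype as rare makes identical genotypes look like evidence of IBD sharing. Parental genotypes remove the need to rely on q in this way, which is consistent with the bias disappearing when parents were included.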

Our results indicate that LD among tightly linked markers should be examined, especially in the fine-mapping stage, where strong LD is likely to exist between the markers. To avoid possible false positives, markers that are in strong LD should not be used together for linkage analysis. An alternative approach is to modify current linkage programs to allow for LD so that all marker information can be used in the search for a disease-related region.

Abbreviations

GAW14: Genetic Analysis Workshop 14

IBD: Identity by descent

KPD: Kofendrerd Personality Disorder

LD: Linkage disequilibrium

SNP: Single-nucleotide polymorphism

References

  1. Eichenbaum-Voline S, Genin E, Babron MC, Margaritte-Jeannin P, Prum B, Clerget-Darpoux F: Caution in the interpretation of MLS. Genet Epidemiol. 1997, 14: 1079-1083. 10.1002/(SICI)1098-2272(1997)14:6<1079::AID-GEPI86>3.0.CO;2-E.

  2. Ott J: Strategies for characterizing highly polymorphic markers in human gene mapping. Am J Hum Genet. 1992, 51: 283-290.

  3. Freimer NB, Sandkuijl LA, Blower SM: Incorrect specification of marker allele frequencies: effects on linkage analysis. Am J Hum Genet. 1993, 52: 1102-1110.

  4. Williamson JA, Amos CI: Guess LOD approach: sufficient conditions for robustness. Genet Epidemiol. 1995, 12: 163-176. 10.1002/gepi.1370120205.

  5. Liu W, Weir BS: Affected sib pair tests in inbred populations. Ann Hum Genet. 2004, 68: 606-619. 10.1046/j.1529-8817.2004.00121.x.

  6. Huang Q, Shete S, Amos CI: Ignoring linkage disequilibrium between markers induces false positive evidence of linkage for affected sib-pair analysis. Am J Hum Genet. 2004, 75: 1106-1112. 10.1086/426000.

  7. Excoffier L, Slatkin M: Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol. 1995, 12: 921-927.

  8. Devlin B, Risch N: A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics. 1995, 29: 311-322. 10.1006/geno.1995.9003.

  9. Gudbjartsson DF, Jonasson K, Frigge ML, Kong A: Allegro, a new computer program for multipoint linkage analysis. Nat Genet. 2000, 25: 12-13. 10.1038/75514.

  10. Kong A, Cox NJ: Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet. 1997, 61: 1179-1188. 10.1086/301592.

  11. Whittemore AS, Halpern J: A class of tests for linkage using affected pedigree members. Biometrics. 1994, 50: 118-127. 10.2307/2533202.

  12. Ott J: Analysis of Human Genetic Linkage. 1999, Baltimore: Johns Hopkins University Press, 3rd edition.

Corresponding author

Correspondence to Christopher I Amos.

Additional information

Authors' contributions

QH did the analysis and prepared the manuscript. MS assisted in the development of data for this project and performed analysis of the LD patterns of simulated data, and also presented the results at the Genetic Analysis Workshop. SS provided guidance in concept development. CIA directed the project and revised the manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Huang, Q., Shete, S., Swartz, M. et al. Examining the effect of linkage disequilibrium on multipoint linkage analysis. BMC Genet 6 (Suppl 1), S83 (2005). https://doi.org/10.1186/1471-2156-6-S1-S83

  • DOI: https://doi.org/10.1186/1471-2156-6-S1-S83
