Linkage analysis of GAW14 simulated data: comparison of multimarker, multipoint, and conditional approaches

Abstract

The purposes of this study were 1) to examine the performance of a new multimarker regression approach for model-free linkage analysis in comparison to a conventional multipoint approach, and 2) to determine whether a conditioning strategy would improve the performance of the conventional multipoint method when applied to data from two interacting loci. Linkage analysis of the Kofendrerd Personality Disorder phenotype to chromosomes 1 and 3 was performed in three populations for all 100 replicates of the Genetic Analysis Workshop 14 simulated data. Three approaches were used: a conventional multipoint analysis using the Zlr statistic as calculated in the program ALLEGRO; a conditioning approach in which the per-family contribution on one chromosome was weighted according to evidence for linkage on the other chromosome; and a novel multimarker regression approach. The multipoint and multimarker approaches were generally successful in localizing known susceptibility loci on chromosomes 1 and 3, and were found to give broadly similar results. No advantage was found with the per-family conditioning approach. The effect on power and type I error of different choices of weighting scheme (to account for different numbers of affected siblings) in the multimarker approach was also examined.

Methods

Linkage analysis of the Kofendrerd Personality Disorder (KPD) phenotype to chromosomes 1 and 3 was performed in the Danacaa, Karangar, and Aipotu populations, with knowledge of the "answers". An important aim of our investigation was to compare the results from an affected sib-pair (ASP) multimarker approach with those from a conventional multipoint approach, and these populations were chosen because of their ascertainment via nuclear families rather than via multi-generational pedigrees. Multipoint linkage analysis was performed using the allele-sharing Zlr statistic [1] as calculated in the program ALLEGRO [2] under an exponential model. Since it was known from the "answers" that the disease loci on chromosomes 1 and 3 interact in an epistatic manner, we also performed a weighted conditional analysis in which the per-family contribution to the Zlr on one chromosome was weighted according to evidence for linkage on the other chromosome, as previously suggested [3].

The results from the multipoint approach were compared with those from a multimarker regression approach that models the observed identity-by-descent (IBD) states for ASPs at a series of genetic markers in terms of the IBD state at a presumed disease locus in the region. The expected IBD state at the disease locus, and hence the expected IBD state at the marker loci, are considered parameters to be estimated in the regression procedure. For a given marker and parent type (mother or father), the expected IBD state can be written as $p_M = x_1 + p_D x_2$, where $p_M$ and $p_D$ correspond to the probability of sharing an allele IBD at the marker and disease locus, respectively, and the $x$ variables correspond to conditional probabilities of marker IBD state given disease locus IBD state: $x_1 = P(M|d)$ and $x_2 = P(M|D) - P(M|d)$. Here $M$ and $m$ denote the events that the observed marker IBD state is 1 and 0, and $D$ and $d$ the events that the disease IBD state is 1 and 0, respectively. These may be written $P(M|D) = \theta^2 + (1 - \theta)^2$ and $P(M|d) = 1 - P(m|d) = 1 - P(M|D)$. Thus, the expected IBD states at each of the markers are modelled in terms of $p_D$, the expected IBD state at the disease locus (which will be estimated as a regression coefficient), and $x$ variables that are functions of the recombination fractions $\theta$ between the markers and disease locus. The IBD states for mothers and fathers are modelled separately (assuming independence), which allows the possibility of using different values of $\theta$ for the two types of parent, i.e., incorporating sex-specific recombination fractions if desired.
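The relationship between the marker and disease-locus IBD probabilities can be illustrated with a short sketch. The function names are hypothetical, and the Haldane map function used to convert map distance to a recombination fraction is an assumption made for illustration; the text does not state which map function was used.

```python
import math


def haldane_theta(distance_cm):
    """Recombination fraction for a map distance in cM.

    Assumption: Haldane map function (not specified in the text).
    """
    return 0.5 * (1.0 - math.exp(-2.0 * abs(distance_cm) / 100.0))


def ibd_covariates(theta):
    """Return (x1, x2), where x1 = P(M|d) and x2 = P(M|D) - P(M|d)."""
    p_m_given_big_d = theta ** 2 + (1.0 - theta) ** 2  # P(M|D)
    p_m_given_small_d = 1.0 - p_m_given_big_d          # P(M|d)
    return p_m_given_small_d, p_m_given_big_d - p_m_given_small_d


def expected_marker_ibd(p_d, theta):
    """Expected marker IBD state p_M = x1 + p_D * x2."""
    x1, x2 = ibd_covariates(theta)
    return x1 + p_d * x2
```

Two limiting cases follow directly from the formula: with no recombination ($\theta = 0$) the marker shares the disease-locus expectation, $p_M = p_D$, whereas for an unlinked marker ($\theta = 0.5$) the expectation is $p_M = 0.5$ whatever the value of $p_D$.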

The model specifying the expected IBD states is fitted to the observed marker IBD states via a generalized estimating equation (GEE) approach. Because the IBD state is considered for each parent separately, the observed IBD events are Bernoulli random variables with known functional relationship between the mean and variance, and correlation between IBD states (at different markers for a given parent type) that depends on $p_D$ and the known recombination fractions between the markers. The data may be analyzed via standard GEE software that allows specification of the correlation structure (specified under the null hypothesis that $p_D = 0.5$). At a given putative disease locus location, this procedure provides an estimate $\hat{p}_D$ of $p_D$ together with its estimated standard error $SE(\hat{p}_D)$, which may be used to produce a z-score $(\hat{p}_D - 0.5)/SE(\hat{p}_D)$ that is approximately standard normal under the null hypothesis that $p_D = 0.5$. The whole procedure is repeated with the disease locus allowed to take a variety of putative positions along the marker map, and the position where the z-score is most significant is taken as the estimate of the disease locus location. An example of the fitted regression line, using the disease locus location that gives maximal evidence against the null hypothesis, is shown in Figure 1 for chromosomes 1 and 3 of the Danacaa data, replicate 100.
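A deliberately simplified sketch of this estimation step is given below. It is not the analysis used here: it assumes an independence working correlation (rather than the correlation structure specified under the null hypothesis) and a cluster-robust sandwich standard error, with each parent's set of marker observations forming one cluster. All names and the input layout are hypothetical.

```python
import math
from collections import defaultdict


def estimate_p_d(rows):
    """Estimate p_D at one putative disease location.

    rows: iterable of (cluster, y, x1, x2), where y is the observed marker
    IBD state (0 or 1) and x1, x2 are the covariates in p_M = x1 + p_D * x2.
    Assumption: independence working correlation with a sandwich variance;
    at least one cluster must have a non-zero score, otherwise SE is zero.
    Returns (p_hat, se, z) with z = (p_hat - 0.5) / se.
    """
    rows = list(rows)
    bread = sum(x2 * x2 for _, _, _, x2 in rows)
    p_hat = sum(x2 * (y - x1) for _, y, x1, x2 in rows) / bread

    # Sum the score contributions within each cluster before squaring,
    # so that correlated observations are not treated as independent.
    cluster_scores = defaultdict(float)
    for cluster, y, x1, x2 in rows:
        cluster_scores[cluster] += x2 * (y - x1 - p_hat * x2)
    meat = sum(score * score for score in cluster_scores.values())

    se = math.sqrt(meat) / bread
    return p_hat, se, (p_hat - 0.5) / se


def scan(rows_by_position):
    """Return (position, z) for the putative location with the largest z."""
    return max(
        ((pos, estimate_p_d(rows)[2]) for pos, rows in rows_by_position.items()),
        key=lambda item: item[1],
    )
```

In this sketch the covariates would be recomputed for each putative location from the recombination fractions between that location and the markers, and `scan` then picks the location giving the largest z-score.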

Figure 1

Example of the multimarker approach for the Danacaa population, replicate 100. Results are shown for the fitted regression line at the disease locus location that maximizes the test statistic for $p_D = 0.5$ on each chromosome.

The multimarker approach is both conceptually and analytically very similar to a previously proposed GEE approach [4]. The multimarker approach differs from the previously proposed method mainly with regard to the test statistic, which is calculated at a variety of increments (putative positions of the disease locus) across the region, in an approach akin to standard multipoint analysis. The multimarker approach also differs from the previously proposed approach by considering the contribution of each parent separately, which could potentially allow the use of different marker maps in males and females (although sex-specific maps were not provided for these data). From Figure 1, it is clear that the greatest contribution to the test statistic at a given disease locus location will come from the observed IBD states at the two flanking markers. The speed of the multimarker procedure can therefore be considerably increased by using data only from the two flanking markers, in an interval mapping type approach, when testing a putative disease location. For each parent, we used data from the two flanking markers (when informative) or the closest informative flanking markers otherwise. In practice, this appeared to make very little difference to the multimarker results (data not shown) and so results presented here will all assume the flanking marker approximation.
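The flanking-marker selection can be sketched as follows. The function name and input layout are hypothetical: marker positions are assumed to be sorted, and informativeness is assumed to be available as a per-parent boolean for each marker.

```python
from bisect import bisect_left


def flanking_informative(positions, informative, location):
    """Closest informative markers on either side of a putative location.

    positions: marker map positions in cM, sorted in ascending order.
    informative: booleans, True where the marker is informative for this parent.
    Returns (left_index, right_index); either is None when no informative
    marker exists on that side. A marker lying exactly at `location` is
    treated as the right-hand flank.
    """
    split = bisect_left(positions, location)
    left = next((i for i in range(split - 1, -1, -1) if informative[i]), None)
    right = next((i for i in range(split, len(positions)) if informative[i]), None)
    return left, right
```

For example, with markers at 0, 10, 20 and 30 cM of which the second is uninformative, a putative location at 12 cM would use the markers at 0 and 20 cM.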

An issue not investigated in the previously proposed approach [4] was the choice of different possible weighting schemes for ASPs derived from sibships with more than two affected individuals. Several different weighting schemes have been proposed to adjust for the non-independence of such affected pairs, but the optimal scheme will depend both on the analysis method used and on whether the goal is merely to maintain type I error or also increase power [5]. With regard to power, the optimal weighting scheme may depend on the unknown underlying genetic model [5]. We investigated the performance of four different weighting schemes for the multimarker approach. The schemes investigated were 1) the Hodge scheme [6], in which the contribution of each ASP from a sibship with $a$ affected individuals is scaled by a factor of $(4/3)(2a - 3 + 0.5^{a-1})/[a(a-1)]$; 2) the Suarez and Hodge scheme [7], in which each ASP's contribution is scaled by a factor of $2/a$; 3) the scheme used by Liang et al. [4], in which each family contributes equally, achieved by scaling the contribution of each pair by a factor of $2/[a(a-1)]$; 4) a scheme where each pair contributes equally regardless of the number of affected sibs in the family. These weighting schemes were incorporated into the multimarker analysis via use of importance weights in the statistical analysis package STATA.
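The four per-pair scaling factors can be compared with a small sketch. The function name and dictionary keys are hypothetical, and the Hodge factor is read here as (4/3)(2a - 3 + 0.5^(a-1))/[a(a-1)], a reading under which all four schemes give a weight of 1 for a sibship with exactly two affected individuals.

```python
def asp_weights(a):
    """Per-pair weights for a sibship with `a` affected individuals (a >= 2)."""
    if a < 2:
        raise ValueError("an affected sib pair needs at least two affected sibs")
    n_ordered_pairs = a * (a - 1)  # twice the number of distinct pairs
    return {
        # Assumed reading of the Hodge factor; see the lead-in above.
        "hodge": (4.0 / 3.0) * (2 * a - 3 + 0.5 ** (a - 1)) / n_ordered_pairs,
        "suarez_hodge": 2.0 / a,
        "liang": 2.0 / n_ordered_pairs,  # each family contributes equally
        "equal_pairs": 1.0,              # each pair contributes equally
    }
```

For a sibship with three affected individuals, the three pairs each receive a weight of 13/18 under the Hodge scheme, 2/3 under Suarez and Hodge, 1/3 under the Liang et al. scheme, and 1 under equal pair weighting, so the schemes differ mainly in how strongly they down-weight large sibships.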

Results

Figure 2 shows the results from the multipoint and multimarker (with Hodge weights) approaches applied to a single replicate, replicate 100. Results are very similar for both methods. The Danacaa study appears to provide good evidence for the disease locus on chromosome 1, but the results on chromosome 3 are less convincing. The Karangar and Aipotu studies show little evidence of linkage on chromosome 1 but provide good evidence of linkage for the disease locus on chromosome 3. The results for the multipoint analysis of all 100 replicates are shown in Table 1. The average maximum Zlr on each chromosome is slightly higher than the average Zlr at the true disease locus location, as expected, owing to the upward bias incurred by choosing the maximum on a chromosome. The Danacaa study generally provides good evidence for the disease loci on chromosomes 1 (mean Zlr = 4.52, $p = 3 \times 10^{-6}$) and 3 (mean Zlr = 3.92, $p = 4 \times 10^{-5}$). The Karangar study provides reasonable evidence for the disease locus on chromosome 3 (mean Zlr = 2.80, p = 0.002) but little evidence on chromosome 1 (mean Zlr = 1.32, p = 0.09), while the Aipotu study provides good evidence for the disease locus on chromosome 3 (mean Zlr = 3.20, p = 0.0007) and some evidence for the disease locus on chromosome 1 (mean Zlr = 2.08, p = 0.02). The Zlr scores from the conditional weighted analyses are lower than those from the unweighted analysis, indicating no advantage from using conditioning weights.
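The quoted p-values are broadly in line with a one-sided standard normal tail probability evaluated at the mean Zlr. The sketch below shows that conversion; treating the p-values this way is an assumption made for illustration, since the text does not state how they were computed, and the function name is hypothetical.

```python
import math


def one_sided_p(z):
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Mean Zlr values quoted in the text for each population and chromosome.
for label, z in [("Danacaa chr 1", 4.52), ("Danacaa chr 3", 3.92),
                 ("Karangar chr 3", 2.80), ("Karangar chr 1", 1.32),
                 ("Aipotu chr 3", 3.20), ("Aipotu chr 1", 2.08)]:
    print(f"{label}: Zlr = {z:.2f}, one-sided p = {one_sided_p(z):.2g}")
```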

Figure 2

Comparison of multipoint results (shown with solid lines) and multimarker results (shown with dashed lines) for replicate 100.

Table 1 Average Zlr z-score (over 100 replicates) using multipoint and weighted conditional analysis

The z-score results from the multimarker approach are given in Table 2, and are found to be broadly comparable with the multipoint results, particularly when using the Hodge or Suarez and Hodge weighting schemes. Type I error is acceptable for all four weighting schemes, as shown in Table 2 by the analysis of chromosome 4, on which no disease locus exists. The mean z-score on chromosome 4 is close to 0 with variance close to 1 and approximate normality (and therefore correct type I error, data not shown) for all four weighting schemes. The positions of the maximum Zlr from the multipoint approach and the maximum z-score from the multimarker approach are shown in Figure 3. Localization of the disease loci (at true positions approximately 173 cM on chromosome 1 and 314 cM on chromosome 3) is generally good for both methods, although there is some suggestion that the localization on chromosome 1 in the Danacaa population is slightly more precise under the multipoint approach.

Table 2 Average z-score (over 100 replicates) using multimarker analysis with various sibship weighting schemes

Figure 3

Histograms showing location of maximum over 100 replicates for multipoint and multimarker methods.

Discussion

Overall, the multimarker and multipoint approaches appear to provide quite similar results, particularly when using the Hodge or Suarez and Hodge weighting schemes. Slightly greater power for the multimarker approach is obtained using the 'Equal pairs' weighting scheme, which is consistent with the results of Sham et al. [5]. The generally stronger results from the Danacaa study in comparison to the Karangar and Aipotu studies are perhaps not surprising, given that the ascertainment of the Danacaa families is via phenotype 1, which is influenced solely by the disease loci on chromosomes 1 and 3.

The Zlr scores from the conditional weighted analyses are lower than those from the unweighted analysis, indicating no improvement in power from using conditioning weights, and no power to detect an interaction. The exact form of the proposed interaction is not specified in the "answers" and could potentially correspond to a number of different underlying scenarios [8]. Only those scenarios that result in departure from a multiplicative penetrance model might in fact be expected to be detectable using the approach described here.

Conclusion

The multipoint and multimarker approaches were generally successful in localizing known susceptibility loci on chromosomes 1 and 3, and were found to give broadly similar results. No advantage was found with a per-family conditioning approach. For the multimarker approach, the greatest power, together with acceptable type I error, was seen with the 'Equal pairs' weighting scheme.

Abbreviations

ASP: Affected sib pair

GEE: Generalized estimating equation

KPD: Kofendrerd Personality Disorder

IBD: Identity by descent

References

  1. Kong A, Cox NJ: Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet. 1997, 61: 1179-1188. 10.1086/301592.

  2. Gudbjartsson DF, Jonasson K, Frigge ML, Kong A: Allegro, a new computer program for multipoint linkage analysis. Nat Genet. 2000, 25: 12-13. 10.1038/75514.

  3. Cox NJ, Frigge M, Nicolae DL, Concannon P, Hanis CL, Bell GI, Kong A: Loci on chromosomes 2 (NIDDM1) and 15 interact to increase susceptibility to diabetes in Mexican Americans. Nat Genet. 1999, 21: 213-215. 10.1038/6002.

  4. Liang KY, Chiu YF, Beaty TH: A robust identity by descent procedure using affected sib pairs: a multipoint approach for complex diseases. Hum Hered. 2001, 51: 64-78. 10.1159/000022961.

  5. Sham PC, Zhao JH, Curtis D: Optimal weighting scheme for affected sib-pair analysis of sibship data. Ann Hum Genet. 1997, 61: 61-69. 10.1017/S0003480096005933.

  6. Hodge SE: The information contained in multiple sibling pairs. Genet Epidemiol. 1984, 1: 109-122. 10.1002/gepi.1370010203.

  7. Suarez BK, Hodge SE: A simple method to detect linkage for rare recessive diseases: an application to juvenile diabetes. Clin Genet. 1979, 15: 126-136.

  8. Cordell HJ: Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans. Hum Mol Genet. 2002, 11: 2463-2468. 10.1093/hmg/11.20.2463.

Acknowledgements

Funding for this work was provided by the Medical Research Council (UK), the Juvenile Diabetes Foundation and the Wellcome Trust.

Author information

Corresponding author

Correspondence to Heather J Cordell.

Additional information

Authors' contributions

MJB developed and applied the multimarker regression approach. EW applied the per-family weighted multipoint approach and generated the figures. HJC directed the project and drafted the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Barber, M.J., Wheeler, E. & Cordell, H.J. Linkage analysis of GAW14 simulated data: comparison of multimarker, multipoint, and conditional approaches. BMC Genet 6 (Suppl 1), S40 (2005). https://doi.org/10.1186/1471-2156-6-S1-S40
