 Proceedings
 Open access
Covariate linkage analysis of GAW14 simulated data incorporating subclinical phenotype, sex, population, parent-of-origin, and interaction
BMC Genetics volume 6, Article number: S45 (2005)
Abstract
Background
We evaluate a method for incorporating covariates into linkage analysis using the Genetic Analysis Workshop 14 simulated data. Focusing on a randomly chosen replicate (42), we investigated the effect of the 12 subclinical phenotypes, sex, population, and parent-of-origin on the linkage signal from a model-free linkage analysis of Kofendrerd Personality Disorder.
Results
We detected a linkage peak on chromosome 1, at about 175 cM, which varied depending upon individuals' status for subclinical phenotype b. A linkage peak on chromosome 3 (310 cM) was found not to depend upon subclinical phenotype status. Further peaks were found on chromosomes 5 (12 cM), 9 (4 cM), and 10 (95 cM), depending on the status of subclinical phenotypes a, k, and c/d/g, respectively.
Conclusion
Retrospective comparison of our results with the simulation model showed correct identification of disease loci D1-D5 on chromosomes 1, 3, 5, 9, and 10, respectively.
Background
We chose to analyze all four populations of replicate 42 from the simulated data set. All analyses were performed without knowledge of the simulation model. The aim of the analysis was to utilize the information on the subclinical phenotypes of Kofendrerd Personality Disorder (KPD), sex, population, and parent-of-origin in a linkage analysis. Including covariates in the analysis allowed us to investigate models, such as locus heterogeneity, that give rise to different subclinical phenotypes within KPD. We present the results of our analyses and a retrospective comparison with the simulation model.
Methods
We began by screening the genome for linkage to KPD. We performed separate scans of the microsatellite and the single-nucleotide polymorphism (SNP) data using the Zlr test statistic from ALLEGRO [1], with the "pairs" option and exponential model. Pedigrees with more than 17 individuals were trimmed to permit analysis with the software. We then examined the effect of the covariates on the linkage peaks. To do this, we fitted subclinical phenotype, sex, population, and parent-of-origin status as covariates in a model-free linkage analysis of the microsatellite marker data. We also looked for interactions between linkage peaks using this approach.
Linkage analysis using covariates
Likelihood construction
The multipoint likelihood of the marker data of an affected relative pair i at any point in the genome is given by
L_{i} = \sum_{j=0}^{2} z_{j} \hat{f}_{ij} / f_{ij},
where z_{j} is the (unknown) probability that an affected relative pair shares j alleles identically by descent (IBD), and f_{ij} and \hat{f}_{ij} are the prior and posterior (conditional on the observed marker data) probabilities that pair i shares j alleles IBD [2, 3]. These were obtained for each pair at 1-cM intervals, with and without parent-specific allele-sharing estimates, using MERLIN [4] and ALLEGRO [1], respectively. Let p_{FS} be the probability that a pair of affected full siblings share a given parental allele IBD. Following the suggestion of Rice [5, 6], in the absence of a parent-of-origin effect the probabilities of sharing paternal and maternal alleles IBD were assumed to be equal and independent. Then z_{0} = (1 - p_{FS})^{2}, z_{1} = 2p_{FS}(1 - p_{FS}), and z_{2} = p_{FS}^{2}. Similar formulae apply for double-first-cousin pairs.
Other types of relative pair, R, can only share 0 or 1 allele IBD. For these, z_{0} = 1 - P_{R}, z_{1} = P_{R}, and z_{2} = 0 (where P_{R} is the IBD probability for affected relative pairs of type R).
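The construction above can be illustrated with a minimal Python sketch. It assumes the pair likelihood takes the form of a sum over j of z_{j} times the ratio of posterior to prior IBD probability; the function names and the example posterior probabilities are hypothetical and are not taken from the analysis itself.

```python
def ibd_probs_full_sibs(p_fs):
    """Return (z0, z1, z2) for full sibs, assuming paternal and maternal
    sharing probabilities are equal (p_fs) and independent."""
    return ((1 - p_fs) ** 2, 2 * p_fs * (1 - p_fs), p_fs ** 2)


def ibd_probs_other(p_r):
    """Return (z0, z1, z2) for a relative pair that can share at most one allele."""
    return (1 - p_r, p_r, 0.0)


def pair_likelihood(z, prior, posterior):
    """Assumed likelihood form: sum over j of z_j * posterior_j / prior_j.
    IBD states with zero prior probability are skipped."""
    return sum(zj * post / pri
               for zj, pri, post in zip(z, prior, posterior) if pri > 0)


prior = (0.25, 0.5, 0.25)      # null IBD distribution for full sibs
posterior = (0.1, 0.4, 0.5)    # hypothetical posterior IBD probabilities for one pair

lik_null = pair_likelihood(ibd_probs_full_sibs(0.5), prior, posterior)
lik_alt = pair_likelihood(ibd_probs_full_sibs(0.6), prior, posterior)
```

With p_fs = 0.5 the z values equal the prior, so the likelihood of any pair reduces to the sum of its posterior probabilities; a pair with excess sharing, as in this toy example, has a larger likelihood when p_fs exceeds 0.5.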
Inclusion of categorical covariates
The effect of a binary covariate on the IBD sharing probabilities may be investigated by modelling P_{R} in a logistic regression framework including a 3-level factor β with levels corresponding to the status of the pair with respect to the covariate (-/-, -/+ or +/+, where - denotes absence and + presence of the covariate in an individual). That is,
\log [ P_{R} / (1 - P_{R}) ] = O_{R} + \alpha + \beta_{k},
where O_{R} is a fixed offset, ensuring that P_{R} takes the correct value for a relative pair of type R in the absence of linkage (i.e., all coefficients in the regression = 0). Under the null hypothesis of no covariate effect, α is a measure of the divergence of IBD from the null in the sample as a whole. The subscript k indexes the status of the particular relative pair with respect to the covariate. Multiple pairs from the same pedigree were analysed as if they were independent, with parameters α and β in common. To ensure identifiability of the parameters, β_{-/-} was set to zero (making α a measure of IBD divergence from the null in -/- pairs). The degree of IBD sharing for the discordant (-/+) pairs was constrained to be less than or equal to the maximum IBD in the concordant pairs, to ensure that the model makes sense biologically. Each of the subclinical phenotypes (a-l) was modelled in this way, as was sex (male denoted by -, female denoted by +). Population membership was modelled as a four-level factor, with one level for each population (the coefficient for the first was set to zero). The total number of affected relative pairs in each category is shown in Table 1. One might expect a gene that modifies the expression of a binary covariate (e.g., subclinical phenotype outcome) in individuals affected with KPD (but not KPD risk itself) to present increased sharing in -/- or +/+ pairs (or both), with -/+ pairs showing reduced sharing. A gene that acts to cause KPD with a particular covariate value (- or +) would cause increased sharing in either -/- or +/+ pairs, with the effects on IBD in the pairs of other types being unclear (dependent on penetrances, gene frequencies, etc.). The allele-sharing estimates should be interpreted with caution, because differences could arise for a number of reasons.
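As an illustration of the covariate model, the sketch below assumes the regression is linear on the logit scale, with the offset chosen so that the sharing probability equals its null value when all coefficients are zero. The function names and the coefficient values are hypothetical.

```python
import math


def null_offset(p_null):
    """Offset O_R: the logit of the null sharing probability for pair type R."""
    return math.log(p_null / (1.0 - p_null))


def sharing_probability(offset, alpha, beta_k=0.0):
    """Assumed model: logit(P_R) = O_R + alpha + beta_k."""
    return 1.0 / (1.0 + math.exp(-(offset + alpha + beta_k)))


def satisfies_constraint(beta):
    """Discordant sharing must not exceed the larger concordant sharing.
    The logistic function is monotone, so comparing coefficients suffices."""
    return beta["-/+"] <= max(beta["-/-"], beta["+/+"])


# Hypothetical coefficients; beta for -/- pairs is fixed at zero for identifiability.
BETA = {"-/-": 0.0, "-/+": -0.2, "+/+": 0.7}

offset_sibs = null_offset(0.5)       # full sibs: null sharing of a parental allele is 0.5
offset_cousins = null_offset(0.25)   # first cousins: null probability of sharing one allele is 0.25

p_concordant = sharing_probability(offset_sibs, 0.0, BETA["+/+"])
p_discordant = sharing_probability(offset_sibs, 0.0, BETA["-/+"])
```

Setting alpha and beta_k to zero recovers the null sharing probability for each pair type, which is the role the offset plays in the text.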
Inclusion of quantitative covariates
Locus × locus interactions between the peaks were investigated by including the estimated IBD sharing value for each pair at one location on a different chromosome (having subtracted the expected value in the absence of linkage) as a quantitative covariate in the logistic regression for IBD at the peak of interest [7]. This was then repeated for a number of locations in the region surrounding the locus being conditioned on, to allow for the fact that linkage peaks are often some distance from disease loci [8]. The test statistic was taken to be the increase in maximum LOD score over the whole region investigated (covering both linkage peaks). For completeness, the hypothesis of an interaction between two peaks was investigated with two tests: i) peak 1 conditional on peak 2, and ii) peak 2 conditional on peak 1. In general, these give similar results.
Inclusion of parent-of-origin covariate
Finally, the parent-of-origin effect was modelled in affected sibling pairs only, by splitting the prior and posterior probabilities, f_{i1} and \hat{f}_{i1}, of sharing 1 allele IBD into components reflecting whether the paternal or maternal allele was shared. The IBD probabilities for affected pairs were expressed in terms of IBD probabilities for the paternal (p_{pat}) and maternal (p_{mat}) alleles (e.g., z_{2} = p_{pat}p_{mat}), with the test statistic for a parent-of-origin effect given by a likelihood-ratio test of p_{pat} = p_{mat}.
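A rough sketch of how a centred IBD value at a second locus could enter the regression as a quantitative covariate is given below. The summary of the posterior IBD distribution as a proportion of alleles shared, the coefficient name gamma, and the function names are all assumptions made for illustration.

```python
import math


def centred_sharing(posterior, expected=0.5):
    """Estimated proportion of alleles shared IBD at the conditioning locus,
    minus its expectation in the absence of linkage (0.5 for full sibs)."""
    return 0.5 * posterior[1] + posterior[2] - expected


def conditional_sharing_probability(offset, alpha, gamma, covariate):
    """Assumed model: logit(P) = offset + alpha + gamma * covariate,
    where covariate is the centred sharing at the other locus."""
    return 1.0 / (1.0 + math.exp(-(offset + alpha + gamma * covariate)))


# Hypothetical posterior IBD probabilities at the conditioning locus for one sib pair.
x = centred_sharing((0.1, 0.4, 0.5))

p_no_interaction = conditional_sharing_probability(0.0, 0.3, 0.0, x)
p_positive = conditional_sharing_probability(0.0, 0.3, 1.5, x)
```

Under this parameterisation, gamma = 0 corresponds to no interaction, and a positive gamma means that pairs with excess sharing at the conditioning locus also tend to share more at the peak of interest.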
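The decomposition can be sketched as follows, assuming that paternal and maternal sharing remain independent under the parent-of-origin model, so that only z_{2} = p_{pat}p_{mat} is taken directly from the text. The function name and dictionary keys are hypothetical.

```python
def parent_of_origin_probs(p_pat, p_mat):
    """IBD state probabilities for an affected sib pair, split by which
    parental allele is shared (assumes independent paternal/maternal sharing)."""
    return {
        "none": (1 - p_pat) * (1 - p_mat),
        "paternal_only": p_pat * (1 - p_mat),
        "maternal_only": (1 - p_pat) * p_mat,
        "both": p_pat * p_mat,
    }


# With p_pat == p_mat the model collapses to the full-sib formulae.
equal = parent_of_origin_probs(0.6, 0.6)
z1_equal = equal["paternal_only"] + equal["maternal_only"]

# Hypothetical excess paternal sharing only.
unequal = parent_of_origin_probs(0.7, 0.5)
```

When the two probabilities are equal, the two single-allele components sum to 2p(1 - p), which is the constraint tested by the likelihood-ratio test of p_{pat} = p_{mat}.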
Test statistic and significance levels
To test for effects of categorical or quantitative covariates, the likelihood was maximized with respect to α alone at each position x, to give L(\hat{\alpha}, x), and with respect to both α and β, giving L(\hat{\alpha}, \hat{\beta}, x). The ratio of the maximum likelihoods on the chromosome, with and without the covariate of interest, gives a LOD score, which was used as the test statistic:
LOD = \log_{10} [ \max_{x} L(\hat{\alpha}, \hat{\beta}, x) / \max_{x} L(\hat{\alpha}, x) ].
We allowed the location of the maximum likelihood to change when the covariate was added. This reflects the fact that linkage peaks from standard analyses are often some distance from the true disease locus [8]. Incorporating the covariate may thus give a more accurate estimate of the disease locus location. Other test statistics are possible, for example the maximum pointwise likelihood ratio. However, the relative performance of these test statistics is unclear at present. Chromosome-wide significance levels were obtained by keeping the genotypes fixed and randomly permuting individual covariate values among the affected individuals. Pairwise covariate values were then calculated and the analysis repeated; the significance levels thus reflect the dependency of pairs within a pedigree. To test for a parent-of-origin effect, the designations of paternal and maternal alleles were randomly swapped for all affected siblings in a sibship. If n replicates are generated in this manner, of which r give a test statistic greater than that in the actual data, the chromosome-wide p-value is estimated by (r + 0.5)/(n + 0.5).
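The test statistic and the permutation procedure can be sketched in Python as follows. This is an illustration only: the `statistic` callable stands in for the full re-analysis of the chromosome with permuted covariates, and the function names, the default number of permutations, and the example profiles are hypothetical.

```python
import random


def lod_increase(log10_lik_without, log10_lik_with):
    """Difference of chromosome-wide maxima of the log10 likelihood profiles.
    Each profile is maximised separately, so the peak location may move."""
    return max(log10_lik_with) - max(log10_lik_without)


def permutation_p_value(observed, covariates, statistic, n_perm=1000, seed=0):
    """Permute individual covariate values among affected individuals and
    return (r + 0.5) / (n + 0.5), where r counts permuted statistics
    strictly greater than the observed one."""
    rng = random.Random(seed)
    values = list(covariates)
    r = 0
    for _ in range(n_perm):
        rng.shuffle(values)            # genotypes stay fixed; only covariates move
        if statistic(values) > observed:
            r += 1
    return (r + 0.5) / (n_perm + 0.5)


# Hypothetical log10 likelihood profiles at three positions on one chromosome.
observed_lod = lod_increase([0.4, 2.1, 1.3], [0.9, 2.0, 3.5])

# Toy statistic that never exceeds the observed value, giving the smallest possible p-value.
p_min = permutation_p_value(observed_lod, [0, 1, 0, 1, 1, 0], lambda v: 0.0, n_perm=100)
```

With this estimator the smallest attainable p-value is 0.5/(n + 0.5), so the number of permutations bounds the significance level that can be reported.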
For the test statistic chosen for this analysis, it was not possible to obtain a genome-wide significance level for covariate effects because this depends not only on the increase in LOD score given by the covariate, but also on the linkage evidence present without allowing for the covariate, i.e., based on the maximum likelihood without the covariate, \max_{x} L(\hat{\alpha}, x). For example, an increase in LOD score from 2 to 3 is more significant than one from 0 to 1 because the former is likely to occur by chance (in the absence of covariate effects) only in a linkage peak region, whereas the latter could occur anywhere on the chromosome. An estimate of genome-wide significance for a given chromosome, allowing for multiple testing, involves a joint Bonferroni-type adjustment for the relative length of the chromosome and the number of covariate tests conducted.
The subclinical phenotypes c, d, and g were indistinguishable in the affected individuals, and e, f, and h were present in all affected individuals and hence provided no useful information for analysis. Therefore, we carried out 10 covariate analyses on each chromosome (subclinical phenotypes a, b, c, i, j, k, and l, sex, population, and parent-of-origin). The interaction analyses were carried out between identified peak regions and hence were treated separately.
Results
We found genome-wide significant linkage peaks on chromosomes 1 (max Zlr = 4.97 at 177 cM), 3 (max Zlr = 5.58 at 310 cM), 5 (max Zlr = 5.11 at 12 cM) and 9 (max Zlr = 6.04 at 4 cM). On chromosome 1 the peak was narrower with the 3-cM SNP map than with the microsatellite map, but this effect was not seen for the other peaks.
The linkage signal on chromosome 1 was found to increase substantially when the subclinical phenotype b was fitted as a covariate in the relative pair covariate linkage analysis, a LOD of 7.07 being increased to 14.29 (chromosome-wide p < 0.0001, genome-wide p = 0.0097). The linkage evidence appeared to come entirely from the +/+ pairs (IBD = 0.66, compared to 0.49, 0.48 from the -/-, -/+ pairs). A similar effect was found on chromosome 5 with the subclinical phenotype a (LOD increased from 4.90 to 10.05, chromosome-wide p < 0.0001, genome-wide p = 0.0096), with the linkage coming from the -/- pairs (IBD_{-/-} = 0.62, IBD_{-/+} = 0.44, IBD_{+/+} = 0.51), and chromosome 10 (at 95 cM) with the subclinical phenotype c (LOD increased from 1.04 to 5.31, IBD_{-/-} = 0.63, IBD_{-/+} = 0.43, IBD_{+/+} = 0.53, chromosome-wide p = 0.0004, genome-wide p = 0.063). On chromosome 9, the LOD increased from 7.65 to 18.13 with subclinical phenotype k (chromosome-wide p < 0.0001, genome-wide p = 0.0096), with increased sharing in both the -/- and +/+ pairs (IBD_{-/-} = 0.63, IBD_{-/+} = 0.42, IBD_{+/+} = 0.67). No genome-wide significant effect of subclinical phenotype was observed on chromosome 3. No significant results were obtained for the analyses considering differences in IBD owing to sex, population, parent-of-origin, or interactions between the four identified linkage peaks. For each analysis, the maximum LOD score is presented in Table 2.
Discussion
Retrospective comparison of our results with the simulation model showed correct identification of disease loci D1-D5 on chromosomes 1, 3, 5, 9, and 10, respectively. D1 influences phenotypes P1 and P3, which both have subclinical phenotype b, confirmed by the increased sharing we observed in the b +/+ affected pairs. D2 influences all three phenotypes, P1-P3, with one or two of the subclinical phenotypes b and c in a somewhat complicated manner. D2 also influences subclinical phenotype k. We observed increased IBD in the k +/+ pairs (chromosome-wide p = 0.016), but this was not significant at the genome-wide level.
D2 and D3 together help to produce P2 and P3, with D3 also influencing subclinical phenotype a. We detected the association of subclinical phenotype a with D3, finding elevated sharing in the -/- pairs and decreased sharing in the -/+ pairs. D4 is related to P2 through subclinical phenotype c and P3 through b and c. D4 also influences subclinical phenotype k, which we observed through increased IBD sharing in pairs concordant for k.
No interactions were found between loci D1-D4 when examining relative pairs concordantly affected for KPD in general, or even the relevant phenotype (P1-P3). This is because the penetrances of the low-risk genotype combinations were set to zero, giving a multiplicative model for interactions. Under such models, IBD sharing at one locus is independent of that at the other [2]. The D1–D4 interaction could be detected by analyzing affected pairs in which exactly one member had P3 (a negative correlation in IBD was observed at the two loci). However, the D2–D3 interaction in P3 and the D1–D2 interaction in P1 were not detected by this method, owing to the reduced penetrance of the relevant genotypes. Likewise, no linkage evidence was obtained at D6 (a modifying locus that affects the penetrance of phenotype P2), even when affected pairs discordant for P2 were analyzed. These results are consistent with the observation that affected relative-pair analysis has low power to detect locus-locus linkage interactions [7].
Conclusion
From analyzing the data blind to the simulation model, there appear to be five susceptibility genes for KPD, located on chromosomes 1, 3, 5, 9, and 10. Those on chromosomes 5 and 10 appear to influence disease only in the absence of subclinical phenotypes a and c/d/g, respectively. The locus on chromosome 1 influences disease only in individuals with subclinical phenotype b, whereas that on chromosome 9 appears to have two variants, one giving rise to the presence of subclinical phenotype k in affected individuals, the other to its absence. No subclinical phenotype was found to have a significant genome-wide effect on the linkage of KPD to chromosome 3, although k reached chromosome-wide significance. Even with knowledge of the simulation model, it was difficult to detect the locus-locus interactions, suggesting that affected relative pairs give little power for such analyses.
Abbreviations
IBD: Identical by descent
KPD: Kofendrerd Personality Disorder
SNP: Single-nucleotide polymorphism
References
1. Gudbjartsson DF, Jonasson K, Frigge ML, Kong A: Allegro, a new computer program for multipoint linkage analysis. Nat Genet. 2000, 25: 12-13. 10.1038/75514.
2. Risch N: Linkage strategies for genetically complex traits. III. The effect of marker polymorphism on analysis of affected relative pairs. Am J Hum Genet. 1990, 46: 242-253.
3. Olson JM: A general conditional-logistic model for affected-relative-pair linkage studies. Am J Hum Genet. 1999, 65: 1760-1769. 10.1086/302662.
4. Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
5. Rice JP: The role of meta-analysis in linkage studies of complex traits. Am J Med Genet. 1997, 74: 112-114. 10.1002/(SICI)1096-8628(19970221)74:1<112::AID-AJMG22>3.0.CO;2-T.
6. Rice JP: Diagnosis as a covariate in sib-pair linkage analysis. Am J Med Genet. 2001, 105: 55-56. 10.1002/1096-8628(20010108)105:1<55::AID-AJMG1060>3.0.CO;2-D.
7. Holmans P: Detecting gene-gene interactions using affected sib pair analysis with covariates. Hum Hered. 2002, 53: 92-102. 10.1159/000057987.
8. Cordell HJ: Sample size requirements to control for stochastic variation in magnitude and location of allele-sharing linkage statistics in affected sibling pairs. Ann Hum Genet. 2001, 65: 491-502. 10.1046/j.1469-1809.2001.6550491.x.
Acknowledgements
We gratefully acknowledge funding support from the MRC and the Higher Education Funding Council for Wales.
Authors' contributions
All authors contributed to the statistical analysis and interpretation of the data, and to the drafting of this article.
Rights and permissions
Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Hamshere, M.L., MacGregor, S., Moskvina, V. et al. Covariate linkage analysis of GAW14 simulated data incorporating subclinical phenotype, sex, population, parent-of-origin, and interaction. BMC Genet 6 (Suppl 1), S45 (2005). https://doi.org/10.1186/1471-2156-6-S1-S45
DOI: https://doi.org/10.1186/1471-2156-6-S1-S45