
Analysis of alcohol dependence phenotype in the COGA families using covariates to detect linkage


Linkage analysis methods that incorporate etiological heterogeneity of complex diseases are likely to demonstrate greater power than traditional linkage analysis methods. Several such methods use covariates to discriminate between linked and unlinked pedigrees with respect to a certain disease locus. Here we apply several such methods, including two mixture models, ordered subset analysis, and a conditional logistic model, to genome scan data on the DSM-IV alcohol dependence phenotype in the Collaborative Study on the Genetics of Alcoholism families, and compare the results to traditional nonparametric linkage analysis. In general, there was little agreement among the various covariate-based linkage statistics. Linkage signals with empirical p-values less than 0.001 were detected on chromosomes 3, 4, 7, 10, and 12, with the highest peak occurring at the GABRB1 gene using the ecb21 covariate.


Etiological heterogeneity is inevitable when large sets of pedigree data are analyzed for complex diseases, where the susceptibility loci may vary from one pedigree to another. Such heterogeneity, if unrecognized, tends to reduce the power to detect linkage. Covariate-based methods attempt to adjust for heterogeneity by using covariate data to discriminate between pedigrees with different disease etiologies; however, since these methods are relatively new, few studies have applied them to real datasets [1–3]. The most comprehensive investigation comparing these methods is an extensive simulation under different gene × environment interaction models performed by Tsai [4]. The Collaborative Study on the Genetics of Alcoholism (COGA) [5] family dataset provides an opportunity to apply covariate-based methods because it contains several biologically meaningful covariates of the alcoholism phenotype.

In this study, we applied four covariate-based methods to the COGA families from the Genetic Analysis Workshop 14 dataset. Our aim was to identify new genes responsible for alcoholism, as well as to study whether previously detected regions of linkage were also detected using these new methods. The methods included the pre-cluster and covariate-identity by descent (cov-IBD) models of Devlin et al. [6], ordered subset analysis (OSA) of Hauser et al. [7], and the conditional logistic regression model of Olson [8] implemented within the LODPAL program of S.A.G.E. [9]. The results were compared to traditional nonparametric linkage analysis using GENEHUNTER-PLUS [10]. We used simulation to estimate the significance of our linkage signals empirically.


Covariate-based linkage analysis methods

One class of models assumes that a proportion of the pedigrees are linked to the disease gene, while the remaining pedigrees are affected due to some other reason. Membership in the linked group is predicted using one or more covariates assumed to be related to the disease. The pre-cluster, cov-IBD, and OSA models fall into this category. Regression-based models that condition on the covariate values are a second category of heterogeneity-based methods, Olson's method being an example.

Pre-cluster and cov-IBD by Devlin et al. are mixture models that analyze affected sib-pair (ASP) data [6]. Each ASP is assigned a pair-specific covariate value. Linkage at a marker is detected by maximizing the likelihood as a function of the probability, α, of each sib pair being in the linked group and its IBD proportions. Pre-cluster determines α by clustering on the covariates prior to testing for excess IBD sharing, while cov-IBD uses both the covariates and IBD information to determine α while simultaneously testing for linkage.
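As a rough illustration of the mixture idea, the sketch below computes a LOD score for a set of ASPs when each pair has a probability α of belonging to the linked group. It is a simplified form written for this example, not the exact parameterization of Devlin et al.; the function name `mixture_lod` and the inputs (posterior IBD probabilities per pair, fixed per-pair α values, and linked-group sharing probabilities z) are assumptions. In pre-cluster the α values would come from the clustering step, whereas cov-IBD would estimate them jointly with z.

```python
import math

# Prior probabilities of sharing 0, 1, 2 alleles IBD for a sib pair under no linkage.
NULL_IBD = (0.25, 0.5, 0.25)

def mixture_lod(post_ibd, alpha, z):
    """LOD score of a two-component (linked / unlinked) mixture for sib pairs.

    post_ibd: list of (f0, f1, f2) posterior IBD probabilities, one per pair
    alpha:    list of per-pair probabilities of being in the linked group
    z:        (z0, z1, z2) IBD-sharing probabilities within the linked group
    """
    lod = 0.0
    for f_hat, a in zip(post_ibd, alpha):
        # Likelihood ratio of a linked pair: sum_k z_k * f_hat_k / f_k.
        lr_linked = sum(zk * fk / pk for zk, fk, pk in zip(z, f_hat, NULL_IBD))
        # An unlinked pair contributes a likelihood ratio of 1.
        lod += math.log10(a * lr_linked + (1.0 - a))
    return lod
```

In practice the statistic would be maximized over z (and, for cov-IBD, over the parameters that map covariates to α); the sketch only evaluates it at given values.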

OSA determines the ordered subset of the pedigrees that provides maximal evidence for linkage [11]. Each pedigree is assigned an overall pedigree-level covariate value, and pedigrees are then ranked in increasing or decreasing order of their covariate values. The OSA statistic is the maximum of the LOD scores over the ordered subsets. The advantage of OSA is that a priori specification of the linked and unlinked subsets is not required; however, it ignores the magnitude of the covariate values, considering only the rank.
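The ranking-and-maximizing step can be sketched as follows. This is a minimal illustration, assuming per-family LOD scores that simply add across families at each map position; the function name `osa_max_lod` and its arguments are hypothetical and do not correspond to the interface of the OSA software.

```python
def osa_max_lod(covariates, lod_by_family, descending=False):
    """Return (max LOD, number of families in the best subset, position index).

    covariates:    one covariate value per family
    lod_by_family: one list of per-position LOD scores per family
    descending:    rank families from high to low covariate if True
    """
    order = sorted(range(len(covariates)),
                   key=lambda i: covariates[i], reverse=descending)
    running = [0.0] * len(lod_by_family[0])
    best = (float("-inf"), 0, -1)
    for n_used, fam in enumerate(order, start=1):
        # Add the next-ranked family to the subset.
        running = [r + l for r, l in zip(running, lod_by_family[fam])]
        peak = max(running)
        if peak > best[0]:
            best = (peak, n_used, running.index(peak))
    return best
```

Only the order of the covariate values enters the calculation, which reflects the point made above that OSA ignores their magnitude.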

Olson's method uses a conditional-logistic representation of an affected relative pair (ARP) likelihood ratio that includes the effects of covariates as additional parameters in a test for linkage [8]. This model allows for the inclusion of pair-wise covariates and is valid for any type of ARP. The model assumes a multiplicative effect of the covariate on the genetic relative risk, and can be used to test whether the covariate contributes significant information about linkage in a region where linkage is known to exist.
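To make the multiplicative covariate effect concrete, the sketch below evaluates the contribution of one affected sib pair to the LOD score, assuming relative risks of the form λ_i = exp(β_i + δ_i x) for pairs sharing i = 1 or 2 alleles IBD, with λ_0 fixed at 1. The function name `pair_lod`, the argument layout, and the restriction to sib-pair prior IBD probabilities are choices made for this illustration; it is not LODPAL's interface.

```python
import math

def pair_lod(f_hat, x, beta, delta, f_prior=(0.25, 0.5, 0.25)):
    """log10 likelihood ratio for one affected pair.

    f_hat:   (f0, f1, f2) estimated IBD probabilities at the test position
    x:       pair-level covariate value
    beta:    (beta1, beta2) baseline log relative risks
    delta:   (delta1, delta2) covariate effects on the log relative risks
    f_prior: IBD probabilities expected for this relative type under no linkage
    """
    # lam[i] is the relative risk for a pair sharing i alleles IBD; lam[0] = 1.
    lam = [1.0,
           math.exp(beta[0] + delta[0] * x),
           math.exp(beta[1] + delta[1] * x)]
    num = sum(l * f for l, f in zip(lam, f_hat))
    den = sum(l * f for l, f in zip(lam, f_prior))
    return math.log10(num / den)
```

Setting both δ values to zero gives the model without covariates, so comparing the maximized LOD scores with and without δ is one way to ask whether the covariate adds information about linkage.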

Phenotypes and covariates

The DSM-IV alcohol dependence phenotype (ALDX2) [12] was recoded into a binary disease phenotype. Subjects having the affected phenotype were maintained as affected, those having no information were recoded as unknown, and everyone else with a known phenotype was coded as unaffected.
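The recoding rule can be written as a small function. The raw ALDX2 codes are not given here, so the label `"affected"` and the use of `None` for missing values are placeholders assumed for this sketch; the output uses the common LINKAGE-style convention of 2 = affected, 1 = unaffected, 0 = unknown.

```python
AFFECTED_CODE = "affected"  # hypothetical label for the affected ALDX2 category

def recode_aldx2(value, affected_code=AFFECTED_CODE):
    """Map a raw ALDX2 value to a binary affection status (2, 1, or 0)."""
    if value is None:
        return 0  # no information: unknown
    # Any other known phenotype is treated as unaffected.
    return 2 if value == affected_code else 1
```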

We selected four quantitative phenotypes, including two electrophysiological measurements, as possible covariates: 1) age of onset for alcohol dependence; 2) number of packs of cigarettes per day for a year, CIGPKYRS; 3) Visual Oddball experiment data for the target case from the far frontal left side channel, ttth1; and 4) data from the Eyes Closed Resting electroencephalogram experiment, ecb21. Age of onset and ecb21 were selected because they divided the affected sib pairs into noticeable clusters (data not shown), which is necessary for the cov-IBD and pre-cluster methods to work well [4]. Clustering was performed using the mclust [13, 14] function of R. The ttth1 phenotype was selected because it has been linked to known regions of the genome [15, 16] in the COGA families. The CIGPKYRS phenotype was selected as evidence of a tendency toward substance abuse.

For pre-cluster, mclust was used to cluster the set of affected sib pairs simultaneously on two dimensions: the minimum and the maximum of the affecteds' phenotype values, taken over the entire pedigree containing that pair. Before clustering, we standardized each set of covariate values by subtracting the mean and dividing by the standard deviation of the sample in order to enhance numerical stability. Under our clustering scheme, the membership of each pedigree in either cluster is determined by Xped, where:

$$X_{\mathrm{ped}} = \sqrt{(\text{min of affecteds' phenotypes})^2 + (\text{max of affecteds' phenotypes})^2}.$$
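A minimal sketch of the standardization and of the pedigree-level covariate follows. It assumes the mean and sample standard deviation are taken over the affected individuals passed in, which is one reading of the description above; the function names and the input layout are hypothetical.

```python
import math
import statistics

def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def pedigree_covariates(ped_ids, values):
    """Return {pedigree: (min, max, Xped)} from affected individuals' values.

    ped_ids[i] is the pedigree of affected individual i and values[i] is
    that individual's raw covariate value.
    """
    by_ped = {}
    for ped, v in zip(ped_ids, standardize(values)):
        by_ped.setdefault(ped, []).append(v)
    # Xped is the Euclidean distance of the (min, max) point from the origin.
    return {ped: (min(v), max(v), math.hypot(min(v), max(v)))
            for ped, v in by_ped.items()}
```

The (min, max) pair is the two-dimensional point that would be handed to the clustering step, and the third element is the single value per pedigree.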

The two clusters were designated G1 and G2 based on the Euclidean distance of their centres from the origin, with G1 representing the nearer cluster. Because OSA allows for only one covariate per pedigree, we assigned Xped values as pedigree-level covariates prior to running OSA. We ran LODPAL on affected sib pairs using both the sum and the difference of each pair's covariate values, reporting the best score. The multiple-testing issue arising in this case was accounted for by the empirical p-value calculation.
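The cluster labelling and the two pair-level covariates can be sketched as below. The function names are hypothetical, and taking the absolute value of the difference is an assumption, since the sign convention is not stated.

```python
import math

def label_clusters(centre_a, centre_b):
    """Return (G1, G2), where G1 is the cluster centre nearer the origin."""
    return tuple(sorted((centre_a, centre_b), key=lambda c: math.hypot(*c)))

def pair_covariates(x1, x2):
    """Pair-level covariates built from two sibs' individual values."""
    # The absolute difference is an assumption made for this sketch.
    return {"sum": x1 + x2, "difference": abs(x1 - x2)}
```

Running the linkage model once per pair-level covariate and keeping the larger LOD score corresponds to "reporting the best score", which is why the null distribution has to be built from the same best-of-two statistic.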

Linkage analysis

We used all 143 multiplex pedigrees and the 315 microsatellite markers located on chromosomes 1 through 22. Our analysis was not appropriate for X-linked data. Due to software limitations, the seven largest pedigrees were broken into smaller components or trimmed of uninformative individuals, resulting in 156 pedigrees overall. Multipoint IBD probabilities were obtained using MERLIN version 0.10.2 [5] for use within LODPAL and pre-cluster at each marker and at four equally spaced intermediate positions. Multipoint nonparametric linkage analysis was performed using GENEHUNTER-PLUS based on the S_all statistic. MEGA2 [17] was used to set up files for MERLIN, GENEHUNTER-PLUS, and LODPAL.
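The grid of analysis positions described above (every marker plus four equally spaced points between adjacent markers) can be generated as follows. This is an illustrative sketch with a hypothetical function name; it assumes a non-empty list of marker positions sorted along the chromosome.

```python
def analysis_positions(marker_cm, n_between=4):
    """Marker positions plus n_between equally spaced points per interval."""
    positions = []
    for left, right in zip(marker_cm, marker_cm[1:]):
        step = (right - left) / (n_between + 1)
        # The left marker itself, then the intermediate points.
        positions.extend(left + k * step for k in range(n_between + 1))
    positions.append(marker_cm[-1])
    return positions
```

For two markers 10 cM apart this yields positions every 2 cM, including both markers.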

For pre-cluster, we computed two likelihoods: one with G1 as the linked cluster and one with G2 as the linked cluster. Similarly, for OSA, we used both orderings of the Xped values: L2H, ordered small to large, so that linked pedigrees have smaller covariate values; and H2L, ordered large to small, so that linked pedigrees have larger covariate values. Marker positions are reported using the Haldane map function. The chromosomal locations of the genes that were not included in the COGA marker map were obtained from the Marshfield web site and converted to Haldane map distances.
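For reference, the Haldane map function relates a map distance d (in Morgans) to a recombination fraction θ by θ = ½(1 − e^(−2d)). The sketch below implements it together with a Kosambi-to-Haldane conversion for a single interval. That the source map is expressed in Kosambi centimorgans is an assumption made for this example, and the function names are hypothetical.

```python
import math

def haldane_theta(d_cm):
    """Recombination fraction for a Haldane map distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def haldane_cm(theta):
    """Haldane map distance in cM for a recombination fraction below 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * theta)

def kosambi_theta(d_cm):
    """Recombination fraction for a Kosambi map distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)

def kosambi_to_haldane_cm(d_cm):
    """Convert one interval length from Kosambi cM to Haldane cM."""
    return haldane_cm(kosambi_theta(d_cm))
```

Because the conversion is nonlinear, it would be applied to each inter-marker interval and the converted lengths summed, rather than applied once to a cumulative position.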

Empirical significance

A small region on chromosome 7 spanning 27–61 cM was selected for determining the empirical significance of LOD scores obtained from the various covariate methods. We simulated 1,000 replicates of the genotype data using SIMULATE [18] under the hypothesis of no linkage while keeping the pedigrees and covariates constant. The genotype data were then analyzed by each method, with each of the four covariates. The simulated LOD scores for all of the markers were pooled to create the empirical null distributions for each covariate and method. The validity of pooling markers is discussed in [4].
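The pooling and the empirical p-value can be sketched as follows, assuming each replicate yields one LOD score per marker for a given method and covariate. The function names and the simple order-statistic definition of the quantile are choices made for this illustration.

```python
def pool_null(replicate_lods):
    """Pool LOD scores across replicates and markers into one null sample."""
    return [score for rep in replicate_lods for score in rep]

def empirical_p(observed_lod, null_lods):
    """Proportion of null LOD scores at least as large as the observed one."""
    exceed = sum(1 for s in null_lods if s >= observed_lod)
    return exceed / len(null_lods)

def null_quantile(null_lods, q=0.99):
    """Order-statistic estimate of the q-th quantile of the null sample."""
    ordered = sorted(null_lods)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]
```

A separate null sample would be built for each method and covariate combination, since the thresholds differ between them.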


The NPL analysis produced a single peak with LOD score 2.68 at D10S544 on chromosome 10. We have not reported cov-IBD results because these were not significantly different from the pre-cluster results. Table 1 contains the top three significant results for pre-cluster, OSA, and LODPAL. OSA produced elevated LOD scores for all covariates in the region found by nonparametric linkage analysis, as did pre-cluster using the ttth1 covariate (results not shown). The highest peak for LODPAL is at the GABRB1 gene, which has been identified previously as being linked to alcoholism [15]. The OSA peak at D7S2846 is within 22 cM of the NPY2 gene, and the peak on chromosome 11 for the pre-cluster model lies within 20 cM of the DRD2 gene. Although association between specific variants of the DRD2 gene and alcoholism has been noted previously, no linkage study has detected alcoholism genes in this region. LODPAL found a suggestive linkage peak on chromosome 6 at 142 cM with LOD score 3.09 using age of onset as the covariate, which is close to the ALDH8A1 gene as well as the GRK1 gene.

Table 1 Most significant peaks for LODPAL, OSA, and pre-cluster

Except for one region on chromosome 21 (Figure 1), which showed consistently elevated LOD scores for all methods using the ecb21 covariate, there were no peaks in common across methods. Using the ttth1 covariate, chromosome 10 showed elevated LOD scores for all three methods, but in different regions (Figure 1). There was little commonality between the subsets produced by OSA and the linked clusters produced by pre-cluster, either for the six peaks listed in Table 1 for these two methods or for the chromosome 10 peak (comparisons not shown). The 99th percentiles of the empirical null distribution of LOD scores for pre-cluster range between 1.17 and 1.34 for the four covariates; the 99th percentiles for OSA are between 1.86 and 2.09; and the 99th percentiles for LODPAL range from 1.99 to 2.24.

Figure 1 Plot of -log10(p-value) showing examples of agreement and disagreement between the three covariate-based methods.


Our covariate selection was rather heuristic, based on evidence from clustering rather than on biological reasons. Ideal candidates for covariate statistics would be risk factors with a gene × environment interaction effect, and identifying such factors requires prior biological knowledge. A purely environmental risk factor would act as a confounder, reducing the power of the mixture model because it cannot cluster families into linked and unlinked groups. However, it is challenging to determine which of the above classes a covariate falls into, and this bears further investigation within a systematic framework. We would also expect the choice of the function for creating pedigree-level covariates from individual values to have an effect on the analysis. Indeed, when we used the mean value of the affecteds instead of our Xped values, LOD scores were noticeably lower (results not shown). The lack of agreement among the results may also be due to the sensitivity of the covariate-based methods to the relationship between the covariate and the trait under study.

Tsai [4] observed previously that the thresholds for significance tend to be greater for the conditional-logistic model than for the mixture model (1.7 vs. 1.2 at the 0.01 level). Our investigation supports her observations, although the conditional-logistic model threshold appeared to be higher than in her findings. Because the theoretical distributions for the test statistics of the conditional logistic model, OSA, and cov-IBD are approximations, we recommend using an empirical distribution of the LOD scores in order to make direct comparisons between the methods.



ARP: Affected relative pair
ASP: Affected sib pair
COGA: Collaborative Study on the Genetics of Alcoholism
cov-IBD: Covariate identity by descent
IBD: Identity by descent
OSA: Ordered subset analysis


  1. Devlin B, Bacanu SA, Klump KL, Bulik CM, Fichter MM, Halmi KA, Kaplan AS, Strober M, Treasure J, Woodside DB, Berrettini WH, Kaye WH: Linkage analysis of anorexia nervosa incorporating behavioral covariates. Hum Mol Genet. 2002, 11: 689-696. 10.1093/hmg/11.6.689.

  2. Goddard KAB, Witte JS, Suarez BK, Catalona WJ, Olson JM: Model-free linkage analysis with covariates confirms linkage of prostate cancer to chromosomes 1 and 4. Am J Hum Genet. 2001, 68: 1197-1206. 10.1086/320103.

  3. Hill S, Shen S, Zezza N, Hoffman E, Perlin M, Allan W: A genome wide search for alcoholism susceptibility genes. Am J Med Genet. 2004, 128B: 102-113. 10.1002/ajmg.b.30013.

  4. Tsai H-J, Weeks DE: Comparison of methods incorporating quantitative covariates into affected sib-pair linkage analysis. Genetic Epidemiology.

  5. Begleiter H, Reich T, Hesselbrock V, Porjesz B, Li T-K, Schuckit M, Edenberg H, Rice J: The Collaborative Study on the Genetics of Alcoholism. Alcohol Health Res World. 1995, 19: 228-236.

  6. Devlin B, Jones BL, Bacanu S-A, Roeder K: Mixture models for linkage analysis of affected sibling pairs and covariates. Genet Epidemiol. 2002, 22: 52-65. 10.1002/gepi.1043.

  7. Hauser ER, Watanabe RM, Duren WL, Bass MP, Langefeld CD, Boehnke M: Ordered subset analysis in genetic linkage mapping of complex traits. Genet Epidemiol. 2004, 27: 53-63. 10.1002/gepi.20000.

  8. Olson JM: A general conditional-logistic model for affected-relative pair linkage studies. Am J Hum Genet. 1999, 65: 1760-1769. 10.1086/302662.

  9. Elston R, Bailey-Wilson J, Bonney G, Tran L, Keats B, Wilson A: Statistical analysis for genetic epidemiology (S.A.G.E.). Release 5.0. 2002, Cleveland, OH: Rammelkamp Center for Education and Research, Metro Health Campus

  10. Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES: Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet. 1996, 58: 1347-1363.

  11. Morton NE: Sequential tests for the detection of linkage. Am J Hum Genet. 1955, 7: 277-318.

  12. American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders (DSM-IV). 1994, Washington, DC: American Psychiatric Association, 4th edition

  13. Fraley C, Raftery AE: MCLUST: Software for model-based cluster analysis. J Classification. 1999, 16: 297-306. 10.1007/s003579900058.

  14. Fraley C, Raftery AE: Model-based clustering, discriminant analysis and density estimation. JASA. 2002, 97: 611-631.

  15. Dick DM, Foroud T: Candidate genes for alcohol dependence: a review of genetic evidence from human studies. Alcohol Clin Exp Res. 2003, 27: 868-879. 10.1097/01.ALC.0000065436.24221.63.

  16. Porjesz B, Begleiter H, Reich T, Van Eerdewegh P, Edenberg HJ, Foroud T, Goate A, Litke A, Chorlian DB, Stimus A, Rice J, Blangero J, Almasy L, Sorbell J, Bauer LO, Kuperman S, O'Connor SJ, Rohrbaugh J: Amplitude of visual P3 event-related potential as a phenotypic marker for a predisposition to alcoholism: preliminary results from the COGA project. Alcohol Clin Exp Res. 1998, 22: 1317-1323. 10.1111/j.1530-0277.1998.tb03914.x.

  17. Mukhopadhyay N, Almasy L, Schroeder M, Mulvihill WP, Weeks DE: Mega2, a data-handling program for facilitating genetic linkage and association analyses [abstract]. Am J Hum Genet. 1999, 65: A436-

  18. Terwilliger J, Speer M, Ott J: Chromosome-based method for rapid computer simulation in human genetic linkage analysis. Genet Epidemiol. 1993, 10: 217-224. 10.1002/gepi.1370100402.



BHR is supported by the NIMH training grant "Discovering Genes for Mental Health" (5T32MH020053-05), NM is supported by NIMH grant 5R01MH064205-07. H-JT is supported by the Sandler Centre for Basic Research in Asthma. The S.A.G.E. software is supported by U.S. Public Health Resource grant RR03655 from the National Centre for Research Resources. The OSA software is available at, the pre-cluster software is available at, and MEGA2 is available at

Author information

Authors and Affiliations


Corresponding author

Correspondence to Brian H Reck.

Additional information

Authors' contributions

BHR, NM, and H-JT contributed equally to the data processing, analysis, and the writing of the manuscript. DEW contributed to the design of the study and writing of the manuscript.

Rights and permissions

Open Access. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Reck, B.H., Mukhopadhyay, N., Tsai, HJ. et al. Analysis of alcohol dependence phenotype in the COGA families using covariates to detect linkage. BMC Genet 6 (Suppl 1), S143 (2005).
