In this paper, we propose a one degree of freedom test for association between a candidate gene and a binary trait. This method is a generalization of Terwilliger's likelihood ratio statistic and is especially powerful for the situation of one associated haplotype. As an alternative to the likelihood ratio statistic, we derive a score statistic, which has a tractable expression. For haplotype analysis, we assume that phase is known.
By means of a simulation study, we compare the performance of the score statistic to Pearson's chi-square statistic and the likelihood ratio statistic proposed by Terwilliger. We illustrate the method on three candidate genes studied in the Leiden Thrombophilia Study.
We conclude that the statistic follows a chi square distribution under the null hypothesis and that the score statistic is more powerful than Terwilliger's likelihood ratio statistic when the associated haplotype has frequency between 0.1 and 0.4 and has a small impact on the studied disorder. With regard to Pearson's chi-square statistic, the score statistic has more power when the associated haplotype has frequency above 0.2 and the number of variants is above five.
Haplotype analysis is a popular strategy to analyze multiple single nucleotide polymorphisms (SNPs) typed within one candidate gene [1, 2]. Most of the genetic variation of a gene is described by 4 to 8 relatively common haplotypes (frequencies > 0.05). To compare the distribution of observed haplotype frequencies between cases and controls, the global test statistic of Schaid et al.  can be used when haplotype ambiguity exists. When no uncertainty in phase exists, for example in regions with little recombinations, the simple Pearson's chi square statistic is used to test for association. Both statistics test for any difference in haplotype frequencies between cases and controls and the degrees of freedom depend on the number of observed haplotypes. However, researchers often aim to identify one ancestral haplotype . For this situation one may consider a test which summarizes the difference between the null hypothesis and the alternative hypothesis in one parameter, yielding a more powerful one degree of freedom test. This paper aims to derive such a method.
For multi-allelic markers, Terwilliger  proposed a model with one parameter (λ) for the excess frequency of the associated allele in the cases. To test for association he proposed a likelihood ratio test, i.e. comparing the likelihood under the alternative to the likelihood under no association, for testing. Since it is unknown which allele is associated, the likelihood is a weighted sum of conditional likelihoods of the data given that the allele is associated over the observed marker alleles. As for weights Terwilliger  proposed to use the allele frequencies in the population from which the cases and controls were sampled. Thus the most common allele is assigned the highest weight in the analysis. However, for haplotype analysis it is usually not the most common haplotype in the population which is associated to the disease. Therefore in this paper, we propose a model which assigns constant weight to each haplotype. We derive a score statistic from the likelihood of this model.
The log likelihood function of Terwilliger's model also appears to have some unusual statistical features. For example, the allele frequencies that are used as weights are also unknown parameters in the conditional likelihood functions. Maximizing the log likelihood function appears not always straightforward. Further its distribution under the null hypothesis is unknown. Terwilliger proposed to approximate the distribution of the likelihood ratio (TLR) statistic under the null hypothesis with the 50 : 50 mixture of chi square distributions of null and one degrees of freedom. This distribution appeared to yield conservative p-values . Another property of the likelihood function is that the score function, the first derivative of the log likelihood function with respect to λ evaluated at λ = 0 is a constant zero for any observed data for the weights that Terwilliger proposed. Hence the corresponding score statistic does not exist. Score statistics are often preferred over likelihood ratio statistics, because they have often a tractable expression which provides insight in the data and are robust against model deviations . When constant weights instead of allele frequencies as weights are used the score statistic exists. Under the null hypothesis, the statistic follows a normal distribution.
In this paper we derive a method for the extreme situation of phase unambiguity, which means that we study genetic regions with almost no recombination . This assumption is reasonable for many genes, for example fibrinogen alpha, beta and gamma  and CRP . When some uncertainty about phase exists, the p-values should be adapted. Our program  is able to compute valid p-values for the situation of phase ambiguity by permutation of case and control status. The program calls the publicly available software TAGSNPS  to derive the haplotypes in each permutation step (see also Becker et al.  for similar methods).
The methods derived in this paper are meant to detect a common haplotype. In general genetic association studies have power to detect common variants. Thus these methods should detect associations under the common disease-common variant hypothesis. However, it is also possible that a rare causal mutation is sometimes present on the identified haplotype. For example Hylckama Vlieg et al.  detected a common haplotype that was overrepresented in patients with deep venous thrombosis compared to healthy controls. Many homozygous carriers of this haplotype carried the factor V Leiden mutation, the most important genetic risk factor for deep venous thrombosis, which had been discovered before by other methods . Thus if homozygous carriers of the associated haplotype were sequenced the factor V Leiden mutation would have been detected.
We illustrate the method by an association analysis of three candidate genes, fibrinogen alpha (FGA), beta (FGB) and gamma (FGA)in the Leiden Thrombophilia Study (LETS) . In these data, no phase ambiguity was present and the pair of haplotypes could be derived for each individual. The number of common haplotypes varied from 3 to 5. The frequency of the associated haplotype varied from 0.2 to 0.3. In this paper, we consider the score statistic as an alternative to the classical chi-square and the original TLR statistic of  in the context of haplotypes. We also include the likelihood ratio statistic corresponding to the log likelihood using equal weights. We carried out a simulation study to examine the performance of the four statistics under the null hypothesis and to compare the power of the four statistics under various alternatives. Finally we illustrate the proposed statistics by analysis of the haplotype distributions of the FGG, FGB and FGA genes.
By means of a simulation study we compared the performance of the new score statistic with Pearson chi squared χ2 and the Terwilliger's likelihood ratio with weights equal to pj's (TLR). We also considered a likelihood ratio test with equal weights (LR). We first evaluated the type I error rates of the test statistics. For the score statistic we used the chi square distribution with one degree of freedom to approximate the distribution under the null hypothesis. For the LR and TLR statistics we used the 50:50 mixture of two chi squares with zero and one degree of freedom. We generated 10,000 samples of 200 case chromosomes and 200 control chromosomes from the multinomial distributions with probabilities p1⋯pmfor m equal to 4, 5, 8, 10, 15 and 20 haplotypes. Similar to the simulation described by Terwilliger, the frequency of the most common haplotype, p1, was set to 0.5, whereas the remaining haplotypes were equally frequent (0.5/(m - 1)). The results are shown in the left columns of Table 1.
For all m, the type I error rates of the score statistic were maintained at the nominal error rate. For m < 10, the type I error rates of Pearson's chi square corresponded to the nominal level, but became conservative for larger m due to sparse data. For all m considered, the type I error rates for the TLR statistic were conservative (< 0.03). The type I error rates for the LR statistic were also somewhat small (≈ 0.04), but were better than that of the TLR statistic.
To study the power of the statistics, we generated 10,000 samples of 100 case chromosomes and 100 control chromosomes from the multinomial distributions with probabilities p1(1 - λ) + λ and pj(1 - λ) for j = 2⋯m for cases and pjj = 1⋯m for controls, respectively. First, we considered the model used by Terwilliger . The most common haplotype frequency p1 in controls was again set to 0.5 and this haplotype was more frequent in cases. The parameter λ was fixed to 0.5 which corresponds to a haplotype frequency of 0.75 in the cases. The number of haplotypes m was again set to 4, 5, 8, 10, 15 and 20. The results are shown in the right columns of Table 1.
For m < 8, the power of Pearson chi square statistic was good. The performed better than the LR statistic with equal weights. The power of the score statistic was similar to the power of TLR statistic, for m ≤ 8. For m > 8, the TLR statistic had higher power than . For m > 8, the power of the score statistic is smaller, because the weight 1/m is small and the frequencies of the non associated haplotypes are too small, yielding a large variance of the score statistic (see formula 3). Because the LR statistic appeared to perform worse than both and TLR, we did not consider this statistic in the following simulations.
Second, we studied the power of the Pearson's chi square, , and TLR as a function of the excess frequency λ for various values of the frequency of the associated haplotype p1 = 0.1, 0.2, 0.3, 0.4 and 0.5. The remaining haplotypes were again equally frequent. We restricted ourselves to the number of observed haplotypes m of 5 and 8, because most of the genes can be characterized by up to 8 common variants. The parameter λ was varied between 0 and 0.5. The number of chromosomes n1 and n2 were 200. We used a nominal significance level of 0.05. The results are depicted in Figure 1.
For p1 = 0.5, the score statistic and likelihood ratio TLR performed similarly, both better than Pearson's chi square. For m = 5 and p1 = 0.4 or 0.3 and for m = 8 and p1 = 0.4, 0.3 or 0.2, the score statistic performed better than TLR especially for small λ. For p1 = 0.2 and m = 5 all three statistics had similar power. For p1 = 0.1 and m = 5 or m = 8, Pearson's chi-square performed similar to TLR and both statistics performed better than except for λ ≤ 0.1 and p1 = 0.1 and m = 5. Note that for p1 = 0.1, the pooled frequency of the associated haplotype is around and hence the expectation of under the alternative becomes small (see formula 4). For m = 5, the pooled frequency is equal to = 0.2 at λ = 0.2 and for m = 8 the pooled frequency is equal to = 0.125 at λ = 0.06. Thus when the associated haplotype has frequency around , the score statistic loses power.
Especially for common variants with frequency p1 of 0.3 or 0.2 and a small impact on the disease (λ ≤ 0.1), the score statistic performed better than TLR statistic. For m = 5 and m = 8 and p1 = 0.3, the gain in power of the score statistic compared to TLR statistic was about 4% and 8% for λ of 0.05 and 0.1 respectively. For m = 5 and p1 = 0.2, both statistics performed similar. For m = 8 and p1 = 0.2 the gain in power of the score statistic was large, namely 7% and 12% for λ of 0.05 and 0.1 respectively.
We applied the three statistics to a study on haplotype association of fibrinogen alpha (FGA), beta (FGB) and gamma (FGG) with risk of deep venous thrombosis in LETS [15, 16]. Fifteen haplotype tagging SNPs were typed in 474 cases and 474 controls . Within the three genes, the SNPs were in high linkage disequilibrium ( > 0.95). The number of common haplotypes (frequency greater than 5%) describing FGG, FGA and FGB were three, five and five respectively. Since we focus on common haplotypes, we pooled the rare haplotypes with similar haplotypes by dropping rare tagging SNPs. For FGG, we pooled H4 with H1 and for FGB, we pooled H5 with H4 and H7 with H3 (see Uitte de Willige et al.  for more details on the tagging SNPs). In this analysis we considered p-values below 0.05 to be significant. In Table 2 the data are described and the results are given.
For all genes, haplotype H2 appeared to be more frequent in the cases than in the controls. For FGG, FGA and FGB the allelic odds ratios of presence of H2 versus the rest was 1.34, 1.29 and 1.28 respectively. Note that these odds ratios were rather similar while the p-values of the corresponding chi square statistics were different namely, 0.005, 0.034 and 0.090 respectively. The difference in p-values was caused by the difference in degrees of freedom of the chi square statistics and the frequencies of other haplotypes. From the results of the standard chi-square statistics we concluded that only FGG was significantly associated to thrombophilia.
The p-values of the TLR were respectively 0.003, 0.012 and 0.072. These p-values were in line with the estimates of λ, namely 0.08, 0.07 and 0.05 respectively. Since FGA and FGB both had 5 variants, the frequency of the associated haplotype H2 was 0.3 and 0.2 respectively and the λ's were rather small, the score statistic should have more power than TLR for these genes. On the other hand TLR may perform better on the FGG gene, because of the larger difference in frequency between cases and controls and because the frequencies in the pooled set are around 1/m. For this situation, the score statistic has little power.
The p-values of the score statistic were 0.515, 0.003 and 0.049 for FGG, FGA and FGB respectively. Indeed the p-values for FGA and FGB were smaller than the corresponding p-values of TLR statistic and the score statistic had no power for FGA. Based on the results of the score statistic, FGA and FGB were significantly associated with thrombophilia risk.
Although for each subject we were able to derive the pair of haplotypes, we also estimated the p-values of the statistics by 1000 permutations using TAGSNPS to estimate the haplotype counts in each permutation step. The results were similar.
In this report we have derived a new score statistic to test for association between a candidate gene and a binary trait. Score statistics are easy to compute, locally most powerful, and robust to small model deviations under the alternative . From our simulations it appears that under the null hypothesis the statistic follows a chi-square distribution. For candidate genes with a small impact on the disease (λ small), five to eight observed variants, and a frequency of the associated variant in the range of 0.2 to 0.4, the new statistic performs better than the Terwilliger's LR statistic. For larger frequencies (around 0.5) or for a large number of variants (m > 8), Terwilliger's LR statistic appears to perform better, because the weight corresponding to the associated haplotype is larger. Compared to Pearson's chi-square statistic, the score statistic has more power when the associated haplotype has frequency larger than 0.2 and the number of observed variants m is larger than 5. Especially when the frequency of the associated haplotype is small (< 0.2), Pearson chi-squared statistic performs similar or better than the score statistic.
We used a mixture of two chi-square distributions to approximate the distribution of Terwilliger's LR statistic under the null hypothesis. It is known that this distribution appears to give conservative p-values . Permutations of the case-control status may result in better p-values. However, permutations are not attractive when no simple formulae for the statistic exists.
When the sampled population is not in Hardy Weinberg equilibrium, the considered statistics are not valid since they assume that the haplotypes of a subject are independent. For Pearson's chi-square statistic on haplotypes, a multi-allelic version of Armitage's trend test on pairs of haplotypes is available [17, 18]. Under Hardy Weinberg equilibrium these statistics are asymptotically equivalent. To derive a one degree of freedom score statistic for pairs of haplotypes, a logistic regression model can be used with the number of associated haplotypes as covariate. Then the total likelihood is the average of the likelihoods over these logistic regression models of possible associated haplotypes. Here more research is needed. In this paper we used a smoothing approach to deal with the fact that the associated haplotype is unknown. As alternative, one may consider taking the maximum over each haplotype. A disadvantage of the latter approach is that permutations are needed to derive a p-value .
In this paper we assumed that the haplotypes could be derived from the typed tagging SNPs. When uncertainty in phase exists the haplotype frequencies have to be estimated using for example an EM algorithm  and the uncertainty has to be taken into account when the p-values are derived. This can easily be done by using permutations. Our program  is able to compute valid p-values in case of ambiguous phase by calling TAGSNPS  to derive the haplotype counts in each permutation step. It may be more efficient to incorporate phase unambiguity in the likelihood function. The gain in efficiency will probably be small. Analyzing genes with little information on phase yields many relatively rare haplotypes and interpretation of the results is often hard. Using permutations to deal with haplotype uncertainty is in the line with recent developments in weighted regression analysis  and in imputation of haplotypes . The new score statistic has no power when the frequency of the associated haplotype in the pooled set of cases and controls is around . This raises the question if a better summary statistic of Pearson's chi square statistic into an one degree of freedom statistic exists. Here more research is needed. As an alternative for the score function, the second derivative of the log likelihood may be used . In another paper we study the performance of this second derivative of the log likelihood function of Terwilliger .
We conclude that by choosing alternative weights, in particular constant weights, in the likelihood of Terwilliger, a set of new powerful and robust statistical tests was derived. For genetic association studies aiming to identify common associated variants, the new statistic should be used in addition to the standard Pearson's chi square statistic. By using both statistics more insight in the data can be obtained. A program  which computes the statistics and corresponding p-values is freely available.
Let m be the number of haplotypes describing most of the genetic variation in a gene. Assume that the haplotype frequencies are in Hardy-Weinberg equilibrium proportions. Let p = (p1,⋯, pm) be the vector of haplotype frequencies in controls. Assume that only one haplotype denoted with index i is over-represented in the cases, then the haplotype frequencies in the cases can be modelled as qi= pi+ λ(1 - pi) and qj= pj- λpjfor j∈ (1,⋯,i - 1, i + 1,⋯,m). Let x = (x1,⋯,xm) and y = (y1,⋯,ym) be vectors of haplotype counts in the cases and the controls, respectively, and let n1 and n2 be the total number of case chromosomes and of control chromosomes, respectively, and n = n1 + n2. Then the conditional likelihood Ligiven that haplotype i carries the mutation is given by
and the likelihood proposed by Terwilliger is equal to
with Ljgiven in formula (1).
It is easy to see that likelihood function (2) can be generalized to the following likelihood function:
with Ljgiven in formula (1) and w = (w1,⋯,wm) a vector of known positive weights restricted by . The first derivative of the log likelihood l(λ, p|x, y, w) = log(L(λ, p|x, y, w)) to λ evaluated in λ = 0 is equal to
For known allele frequencies pj, the distribution of the Uwunder H0 can be approximated by the normal distribution with zero mean and variance . Note that Uw= 0 when for all j wj= pj.
Often the haplotype frequencies are unknown and have to be estimated from the data. Under the null hypothesis we can estimate the frequencies from the combined sample of cases and controls by and an estimate of the score statistic Uwis given by
under H0, has approximately mean equal to zero and variance
Note that the variance is increased by n2/n fold compared to the variance VAR(Uw) because the allele frequencies are estimated from the data. Now the score statistic is defined by
where is obtained by replacing pjby its estimate in formula (3). When all haplotypes are common, a natural choice of weights is .
Under the alternative hypothesis of the presence of one positively associated haplotype i, the expectation of is
with qi= pi+ λ(1 - pi). Note that is the frequency of the associated haplotype in the combined sample. When , the expectation of the score statistic is equal to 0. Hence the score statistic has little power. When is larger than the expectation becomes negative. Therefore we propose a chi-square distribution with one degree of freedom to approximate the distribution of this statistic under the null hypothesis.
Likelihood ratio with equal weights
Terwilliger's likelihood ratio
single nucleotide polymorphism
the Leiden thrombophilia study
Clark AG: The role of haplotypes in candidate gene studies. Genet Epidemiol. 2004, 27: 321-33. 10.1002/gepi.20025.
Stram DO, Haiman CA, Hirschhorn JN, Altshuler D, Kolonel LN, Henderson BE, Pike MC: Choosing haplotype-tagging SNPS based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the Multiethnic Cohort Study. Hum Hered. 2003, 55: 27-36. 10.1159/000071807.
Uitte de Willige S, de Visser MC, Houwing-Duistermaat JJ, Rosendaal FR, Vos HL, Bertina RM: Genetic variation in the fibrinogen gamma gene increases the risk of deep venous thrombosis by reducing plasma fibrinogen gamma' levels. Blood. 2005, 106: 4176-83. 10.1182/blood-2005-05-2180.
Carlson CS, Aldred SF, Lee PK, Tracy RP, Schwartz SM, Rieder M, Liu KA, Williams OD, Iribarren C, Lewis EC, Fornage M, Boerwinkle E, Gross M, Jaquish C, Nickerson DA, Myers RM, Siscovick DS, Reiner AP: Polymorphisms within the C-reactive protein (CRP) promoter region are associated with plasma CRP levels. Am J Hum Genet. 2005, 77: 64-77. 10.1086/431366.
Becker T, Cichon S, Jonson E, Knapp M: Multiple testing in the context of haplotype analysis revisited: application to case-control data. Ann Hum Genet. 2005, 69: 747-56. 10.1111/j.1529-8817.2005.00198.x.
Van Hylckama Vlieg A, Sandkuijl LA, Rosendaal FR, Bertina RM, Vos HL: Candidate gene approach in association studies: would the factor V Leiden mutation have been found by this approach?. Eur J Hum Genet. 2004, 12: 478-82. 10.1038/sj.ejhg.5201183.
Bertina RM, Koeleman BP, Koster T, Rosendaal FR, Dirven RJ, de Ronde H, van der Velden PA, Reitsma PH: Mutation in blood coagulation factor V associated with resistance to activated protein C. Nature. 1994, 369: 64-67. 10.1038/369064a0.
Cordell HJ: Estimation and testing of genotype and haplotype effects in case-control studies: comparison of weighted regression and multiple imputation procedures. Genet Epidemiol. 2006, 30: 259-75. 10.1002/gepi.20142.
El Galta R, Hsu L, Houwing-Duistermaat JJ: Methods to test for association between a disease and a multi-alleleic marker applied to a candidate region. BMC Genet. 2005, 6S: 101-10.1186/1471-2156-6-S1-S101.
REG participated in method development, wrote a C code, carried out the simulation study and data analysis, participated in interpreting results, and in drafting of the manuscript. SUdeW and MCHdeV provided the data and participated in interpreting the results. QH designed the web interface and extended the C code to account for phase uncertainty. LH participated in method development and in interpreting results. JJH-D participated in method development and in interpreting results, and participate in drafting of the manuscript. All authors read and approved the final version.
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License (
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
el Galta, R., Uitte de Willige, S., de Visser, M.C. et al. Generalizing Terwilliger's likelihood approach: a new score statistic to test for genetic association.
BMC Genet8, 63 (2007). https://doi.org/10.1186/1471-2156-8-63