Intersection tests for single marker QTL analysis can be more powerful than two marker QTL analysis

Background It has been reported in the quantitative trait locus (QTL) literature that when testing for QTL location and effect, the statistical power supporting methodologies based on two markers and their estimated genetic map is higher than for the genetic map independent methodologies known as single marker analyses. Close examination of these reports reveals that the two marker approaches are more powerful than single marker analyses only in certain cases. Simulation studies are a commonly used tool to determine the behavior of test statistics under known conditions. We conducted a simulation study to assess the general behavior of an intersection test and a two marker test under a variety of conditions. The study was designed to reveal whether two marker tests are always more powerful than intersection tests, or whether there are cases when an intersection test may outperform the two marker approach. We present a reanalysis of a data set from a QTL study of ovariole number in Drosophila melanogaster. Results Our simulation study results show that there are situations where the single marker intersection test equals or outperforms the two marker test. The intersection test and the two marker test identify overlapping regions in the reanalysis of the Drosophila melanogaster data. The region identified is consistent with a regression based interval mapping analysis. Conclusion We find that the intersection test is appropriate for analysis of QTL data. This approach has the advantage of simplicity and for certain situations supplies equivalent or more powerful results than a comparable two marker test.


Background
Many authors [1][2][3][4] have compared the statistical power of different methods for quantitative trait locus (QTL) mapping. These comparisons have shown that the additional information supplied by the genetic distance between identifiable DNA markers unconfounds the QTL effect from the QTL location, thus making two marker models more powerful than single marker models of detection (for review see Doerge et al. [5]).
With the goal of detecting and/or locating QTL, there are two common statistical approaches that can be taken. The first approach is based on ANOVA, or simple linear regression, and performs statistical tests based solely on single DNA marker information. No genetic map is required for single marker analysis, and the calculations are based on phenotypic means and variances within each of the genotypic classes. The second, and more involved approach, is based on two DNA markers, the estimated recombination fraction between them (or the estimated genetic distance), and either a maximum likelihood based calculation or a regression model including multiple (two or more) markers as independent variables. The linear ordering of multiple DNA markers based upon their estimated relationships (i.e., recombination fraction (or genetic distance)) supplies the framework or genetic map for (composite) interval mapping [6,7] and as such unconfounds the QTL effect and the QTL location, thus providing a more precise means for detecting and locating QTL with respect to the estimated genetic map for the organism under investigation.
Rebaï et al. [4] present a comprehensive comparison of the statistical power for many of the commonly used flanking marker or two marker methods employed, and conclude that two marker mapping provides a relatively small gain (5%) in power over single marker methods when the two markers define an interval of width less than 20 cM, but a substantial increase (greater than 30%) in power for intervals upwards of 70 cM, indicating that the gain in power may come from the addition of the second marker to the analysis, or the addition of information from that marker, rather than the map.
Using the findings of Rebaï et al. [4], and others as our motivation, we hypothesize that the power increase between single marker and two marker regression methods is due to additional genotypic information in the second marker. In order to assess this, similar test statistics should be compared. A comparison of maximum likelihood interval mapping to single marker ANOVA is complicated because of the differences that may be observed due to differences between regression and maximum likelihood [3], as well as differences in marker information. In order to avoid this complication, we consider two regression based approaches that differ only in the number of markers included in the initial model (i.e. the statistical methodology is the same, the models are different). First, a set of compound hypotheses are defined for use with a single marker analysis and used to define an intersection test. We then state the equivalent hypotheses for a regression based two marker model for a test of the interval. Last we compare, via simulation, the power of the intersection and the two marker test under the stated hypotheses for each model. We do not consider the case of multiple QTL in a single interval, as this case was not considered by Rebaï et al. [4]. These two approaches are then applied to a 'backcross' population of Drosophila with 76 informative markers [8] to detect QTL associated with ovariole number.

Simulations
An overview of the simulations performed is given in Table 1 and 2. For the purpose of evaluating the relative difference in statistical power between the intersection test and the two marker test, the estimated statistical power of the two marker test was subtracted from that of the intersection test for each of the parameter combinations investigated, and a t-test [9] was performed to test the null hypothesis that the mean difference in power was zero.
For the binomial phenotype with a backcross design, sample size of 100, the 100 parameter combinations examined resulted in 34 showing no difference in statistical power between the intersection test and the two marker test (i.e, the value of the difference was exactly zero), 39 favoring the intersection test, and 27 favoring the two marker test. The intersection test was more powerful with a mean difference in power of 0.010, and the t-test of the null hypothesis that the difference in power was zero was rejected (p = 0.020). When n was equal to 500, the 139 parameter combinations examined yielded 116 showing there was no difference in statistical power, 18 parameter combinations indicated the intersection test as more powerful, and 5 indicated the two marker test as more powerful. The mean difference in power was 0.020, and the t-test of the null hypothesis that the difference in power was zero was rejected (p = 0.010). The test of the null hypothesis that the mean difference between the intersection and two marker was zero was rejected for both sets of simulations. The estimated difference between the two approaches was positive, indicating that the intersection test has slightly higher power than the two marker test in these cases.
We also investigated F 2 experimental populations for a binomial phenotype, using a sample size of 500. From the 25 parameter combinations investigated, 5 failed to converge consistently for the two marker model due to singularity in the design matrix. The remaining 20 parameter combinations showed 10 as having no difference in statistical power, while the remaining 10 favored the intersection test. Results similar to those found in the initial simulations indicate that the intersection test performs as well as, or better than the two marker test. Comparisons with smaller sample sizes (n = 100) were not conducted because of convergence problems using the two marker model.
For a normally distributed phenotype, 75 parameter combinations for the backcross were examined. From these, 12 showed no difference in statistical power, 34 scenarios favored the intersection test, while 29 indicated the two marker test as more powerful. The mean difference in power was 0.015, and we failed to reject the test of the null hypothesis that the difference was zero, p = 0.064.
Upon investigation of the parameter combinations that showed some difference in power, specifically, for the scenario highlighted by Rebaï et al. [4] (QTL in the middle of the interval with a large distance between markers), we also find the two marker approach to be slightly more powerful than the intersection test. However, when the distance between the QTL and one marker is much smaller than the distance between the QTL and the second marker, the intersection test is more powerful. Although we can point to these cases, it is important to realize that for most of the scenarios no difference in power was observed (see Figure 1).

Drosophila Analysis
Ovariole number is related to reproductive success in Drosophila melanogaster and positively correlated with maximum daily female fecundity [10,11]. The 98 RILs (recombinant inbred lines) for this study were scored for the trait ovariole number and genotyped as described in Wayne et al. [8].
Histogram and density plot of difference in power between intersection test and two marker test for 334 simulations Figure 1 Histogram and density plot of difference in power between intersection test and two marker test for 334 simulations. For the 71 marker pairs considered, 36 markers were found to be significant using both the intersection test and the two marker test, and 26 were found to be non-significant with both tests. The intersection and two marker tests were concordant in 62 of the 71 pairs of markers ( Table 3). The estimated chance corrected agreement (Kappa coefficient) was 0.75 with a 95% confidence inter-val of (0.58,0.90). McNemar's test showed no systematic difference in the two approaches (S = 1.00, p = 0.32).
Overlapping regions on chromosome 3 were identified by the intersection test, the two marker test, and the interval mapping test (Table 4). On chromosome 1, the two marker test identified a region between 3E and 7D while  The application of the intersection test to these data can be further expanded to include an analysis with all 76 markers. We conducted 76 single marker regression tests at a Bonferroni adjusted α of 6.6 × 10 -4 . Markers 68B, 71E, 73D, 76A, 82D, 99A, 99B, 99E and 100A were significant using this intersection test (see Figure 2). The regions identified are consistent with a regression-based interval mapping analysis.

Discussion
The findings of Rebaï et al. [4] show differences in the statistical power of the two marker methods (i.e., interval mapping) over single marker tests (e.g., ANOVA, t-test) only when the markers are more than 50cM apart, suggesting that these differences may be due to the addition of information in the second marker. Our simulation study supports this hypothesis.
The application of an intersection test uses information from both markers, and tests the same null hypotheses as the two marker test. The use of the intersection test takes advantage of the additional genotypic information provided in the second marker.
While compound hypotheses are common in statistical theory, and typically seen in the use of union/intersection tests, their use in the quantitative genetics arena and QTL application is relatively novel. Furthermore, the intersection test is simple to implement, the expansion to multiple markers is straightforward, and uses all available marker information. In a framework map, where markers are unlinked, the intersection test is simply the single marker analysis with a Bonferroni correction for the significance level. In cases where markers are correlated, the application of the Bonferroni correction will be overly conservative. This correction guarantees that the nominal α is not exceeded, but is well known to be overly conservative in cases where tests are not independent. In this case, the application of the intersection test will require an Drosophila melanogaster intersection test alternative correction in order to achieve maximum power.
We demonstrate situations for a pair of adjacent markers where the power of the intersection test is equal to or greater than the power of the two marker test. In the case highlighted by Rebaï et al. [4], markers more than 50 cM apart and large effect size, we also find that the two marker test has higher power than the intersection test. A counter example is when one marker is much closer to the QTL than the other marker, in this case the intersection test is more powerful. Overall, the power of the two approaches is nearly identical and differences between them small.
In the Drosophila reanalysis both methods identify the same general regions. However, six marker pairs were found to be significant using the two marker tests that were not identified using the intersection test. Using the map and notation defined by Nuzhdin et al. [12] they were: Chromosome 1 pairs 3E-4F, 4F-5D, 6E-7D; Chromosome 2 pairs 38E-43A, 43A-43E, and 43E-46C. The pvalues for the 6 marker pairs from the intersection tests were small but did not exceed the Bonferroni corrected significance level (see Table 4). The above markers that contribute to these marker pairs are linked indicating that the Bonferroni correction may be overly conservative.
In contrast, three marker pairs on Chromosome 3 were significant using the intersection tests, but were not significant using the two marker tests: 63A-65D, 87F-88E, and 96A-96F (Table 4). In these cases the "internal" marker of the pair is giving signal while the "outer" marker does not. This provides an interesting point of discussion. We could say that marker 65D appears to be associated with ovariole number, but we do not know if the QTL lies to the left or right of this marker. Just because marker 63A does not appear to be significantly associated with ovariole number, we can not infer that the region to the "left" of 65D does not contain the gene of interest.
In some cases, the two marker test results in a larger region than the intersection test, while in others the reverse is true. QTL mapping is usually a first attempt to locate genes, which the biologist uses to identify all possible regions of interest, e.g. is willing to accept type I error. We have discussed different ways to detect underlying QTL and an approach for maximizing or minimizing the potential region containing the QTL. It is also possible to estimate the QTL position directly. Estimates can be obtained using a variety of techniques and the different possible approaches to estimation are reviewed in Doerge et al. [5] and Kao [3]. However, even when the position is estimated, a confidence interval will exist defining the size of the region to be included for further study. Different approaches will result in regions of different sizes with more, or fewer markers included. The differences in the size of the regions are potentially important to a biologist, who relies on QTL mapping analyses to determine regions for further study. Most biologists accept that current QTL mapping methods are best used for identifying broad regions, which subsequently can be dissected with more precise genetical techniques. The question is then: what region should be advanced to fine mapping experiments?
The investigator may choose to take only the regions which are significant in both intersection and multiple marker approaches, or s/he may choose to carry forward any marker that shows a positive result according to at least one analysis. We recommend that experimentalists perform both a single marker analysis with an intersection test and a multiple marker analysis and use the information available in both analyses to guide their decisions about what regions to carry forward for further study.

Conclusion
We find that the intersection test has equal or greater power compared to the two marker equivalent. Our analyses were conducted using the Bonferroni correction for the intersection test. When markers are linked, as in many of our simulations, this correction is overly conservative. If the intersection test is used in conjunction with a more appropriate correction, the performance of the intersection test would improve perhaps even surpassing the two marker equivalent in more cases. Thus, our motivation and hope in presenting this investigation of the statistical power of intersection tests versus two marker tests is to make clear the compound framework and resulting evidence under which intersection tests are indeed equal to and/or more powerful than the complicated procedures based on two marker models.

Statistical Framework
As the framework for our comparison, and in conjunction with the previous simulations and conclusions provided by the work of Rebaï et al. [4], we consider a backcross experimental design originating from a cross of two homozygous inbred lines, differing in the trait of interest, and producing heterozygous lines that are backcrossed to one of the initial homozygous parental lines. We examine both normal and binomial phenotypic distributions. In general, we denote each marker as

Single Marker Model and Hypotheses
A simple linear regression backcross model is employed for single marker QTL detection where Y j is the quantitative trait value, X *j is an indicator variable that denotes the state of a particular marker, β 0 is the overall mean, and β * is the effect of an allelic substitution at the marker. Ideally, if the marker and QTL are completely linked, the effect of an allelic substitution is the effect of the QTL. If k markers are considered independently, k linear regression models can be considered (i.e., one for each marker, A compound hypothesis testing the effect of an allelic substitution at either or both of these two independent markers is, Rejection of this compound null hypothesis indicates an association between a QTL and either or both of the markers, M 1 and M 2 , hence the term intersection test. From a statistical perspective the relative position of the two markers is irrelevant. However, to compare this to a two marker model there is an implicit assumption that the markers considered form an interval, or are adjacent to one another. This marks a departure from the traditional single marker analysis where no consideration to marker order is given. To define an overall level α test, the significance level α must be adjusted for the individual tests to account for multiple testing. There are many ways to account for multiple testing. Assuming the markers are independent, the Bonferroni correction can be applied [9]. The Bonferroni correction is conservative for the intersection test and the lack of independence between markers would tend to make it more difficult for the intersection test to reject.
More generally, for k markers, the compound hypothesis testing the effect of an allelic substitution at any of the independent markers, M 1 ...M k is Rejection of this compound null hypothesis indicates an association between a QTL and at least one of the markers, M 1 ...M k . To define an overall level α test, using a Bonferroni correction [9], each β * is tested at an adjusted significance level of . An association between a QTL and a marker is then indicated when the individual single marker test rejects the null at the adjusted α level.
The practical result of the application of an intersection test, is the simplicity of calculation of the single marker test statistic, with a correction for multiple testing.

Two Marker Regression Model and Test of the Corresponding Interval
Extending the (backcross) notation defined previously, a multiple linear regression model (based on two markers) can be employed for QTL detection purposes. The model is defined as where X 1j and X 2j are the genotypic states of the respective markers M 1 and M 2 , along with their respective allelic substitution effects (β 1 , β 2 ), and X 3j is the combined genotypic states of markers M 1 and M 2 with allelic substitutions at both markers M 1 and M 2 having effect β 3 . Interestingly to note, when one is selectively genotyping, the information in β 3 is maximized.
In other words, association between a trait and the marker loci M 1 and M 2 is the test of β 3 where, The null hypothesis for this test is that there is no association between either marker (M 1 or M 2 ) and the trait. A similar set of hypotheses follow for an F 2 experimental design.
This model parameterization differs from the least squares interval mapping approach first introduced by Knott and Haley [2]. In the parameterization proposed here, only one test is performed for the pair of markers. In contrast, the regression based interval mapping approach [2], recalculates the value of the independent variables for each putative position in the interval. Our two marker regression has a different parameterization from Knott and Haley [2]. We chose the alternate parameterization in order to directly compare the two marker model and the single marker model. In the Knott and Haley [2] parameterization, flanking markers are used to define the coefficients of the regression as mean, additive or dominance effects. For s steps along the interval between two markers M 1 and M 2 values of X are calculated according to the conditional probability of a QTL in that location.
The regression based interval mapping parameterization thus provides a mechanism to test for additive and dominance effects using tests of the regression parameters. In our parameterization, the regression coefficients are tests for detection. Thus, the two parameterizations have different null hypotheses for the tests of the regression coefficients and are not directly comparable in terms of power. We use the alternative parameterization so that the interpretation of the tests is comparable in the single marker and two marker regression models and we can directly compare the power of the two tests.

Simulations
Data were simulated for two marker backcross and F 2 populations with binomial trait distributions and two marker backcross populations with normal trait distributions. A total of 339 parameter combinations were examined (Table 1). For each combination of parameters, 1000 data sets were simulated. Traits were simulated from a binomial distribution Bin(n,p) where sample sizes n = 100 and n = 500 were utilized, and from a normal distribution N( , 1 . 0 ) w i t h n = 500. The effect of the binary trait [13] varied based on µ = np i ( Table 1). The binomial probabilities p 1 , p 2 , and p 3 represent the probability that a binary trait is present given a specific BTL genotype (GT), or the penetrance of the trait for the specific genotypes Q 1 / Q 1 , Q 1 /Q 2 , and Q 2 /Q 2 , respectively. The location of the locus relative to marker loci M 1 and M 2 also varied. Similarly, the effect under the normally distributed phenotype was allowed to vary (Table 2) under seventy five parameter combinations. The effect size is the difference in the penetrances (for binary traits) and between the means (for normally distributed traits). For each phenotypic trait distribution and each parameter combination (Table 1 and 2) we analyzed, via least squares, 1000 simulated data sets using both the single marker regression model and the two marker regression model.
For the intersection test, the null hypothesis was rejected when the empirical p-value for either single marker regression test statistic was less than = 0.025 (Bonferroni adjustment). For the comparable two marker test (i.e., β 3 = 0), the null hypothesis was rejected when the empirical p-value was less than α = 0.05. Under each parameter combination, the cumulative assessment of statistical power was evaluated from the 1000 simulated data sets as the proportion of times the empirical (permutation) pvalues were less than the specified α level.

Drosophila Analysis
The population of Drosophila melanogaster used in our analysis was a set of 98 RILs (recombinant in lines) derived from a cross of two isogenic lines as described in Wayne et al. [8], for the trait ovariole number. There were 76 informative markers on 4 chromosomes. Markers used were the cytological map positions of the insertion sites of roo transposable element markers, with the exception of the fourth chromosome, where a visible mutation was used as a marker (spa) [12]. A complete linkage map was obtained for chromosome 1 (the X) and chromosome 3, with 15 adjacent marker pairs (16 markers) on 1 and 36 adjacent marker pairs (37 markers) on 3. There was a centromeric break in the genetic map for chromosome 2, such that there were 18 adjacent pairs (19 markers) on the left arm and 2 adjacent pairs (3 markers) on the right arm.
To compare the intersection test to the two marker test, the 71 pairs of markers identified above were examined. For each pair, the two marker regression with the test of the β 3 parameter was conducted at α = 0.05. The two individual markers were then separately modeled in a linear regression model (see Equation 1), and the intersection test was conducted. For the 71 unique pairs of markers, concordance between the intersection test and two marker test was estimated using the Kappa coefficient, and McNemar's test [14] was conducted to determine whether systematic differences existed between the two methods.