Initially the analyses were performed for each replicate separately, but this did not succeed in localizing any of the known genetic effects to their correct locations. The five replicates were therefore pooled and analyzed together to see whether the resulting larger sample size (9230 sib pairs) would help in the detection of the trait loci. Figure 1 shows the results for the baseline (α) and slope (γ) measures using the unified Haseman-Elston method [3]; similar results were produced by the other methods (data not shown). Even with such a large sample size, there is little evidence for, or localization of, any of the known trait loci: the results on chromosomes 1, 3, 9, and 21 are essentially indistinguishable from those on chromosome 2, on which it is known that no trait loci reside.
These disappointing results could be due to the relatively small proportion of the overall variance that is accounted for by each trait locus, and/or to our phenotype measures α and γ being very poor measures of the true baseline and slope effects, because the generating model (equation (2)) differs from the model assumed (equation (1)) when deriving α and γ. To assess the performance of our procedure in a situation where the true generating model does in fact take the form of equation (1), we used the provided GAW13 simulated data as the basis of a new simulation. We simulated data in which the genetic contribution to baseline phenotype was determined solely by genotype at marker locus 2 on chromosome 21: if an individual had any copies of alleles 1–3 at this locus, the mean baseline phenotype was set to 5; otherwise it was set to 3. Similarly, the genetic contribution to slope phenotype was determined solely by genotype at marker locus 5 on chromosome 21: any copies of alleles 1–3 at this locus gave the slope a mean of 0.002; otherwise the mean was 0.001. The final simulated glucose phenotype for individual i at time t was determined as
glucose = baseline + ε1 + weight/150 + (slope + ε2)(age - 20) + ε3,
where at each i and t, ε1, ε2, and ε3 were sampled from normal distributions with mean 0 and standard deviations 0.4, 0.0001, and 0.0003, respectively. These data were analyzed in the same way as the original data, and the results using replicate 1 only (1899 sib pairs) are shown in rows 1 and 2 of Figure 2. All four of the regression procedures succeeded in detecting and accurately locating both the locus controlling the baseline effect (which should be located at 13.84 cM) and the locus controlling the slope effect (which should be located at 43.89 cM). It is interesting that the IBD regression method [4] gives higher significance than the other methods when detecting the baseline effect: further investigation of the properties of this method through theoretical calculations and simulation will be required to determine the cause of this elevated significance.
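For readers who wish to reproduce the form of this simulation, the generating model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function name `simulate_glucose`, its arguments, and the representation of genotype as a simple carrier flag (any copy of alleles 1–3 at the relevant marker) are hypothetical, and the weights and ages would in practice be taken from the GAW13 data rather than supplied by hand.

```python
import random


def simulate_glucose(carrier_baseline, carrier_slope, weights, ages, rng):
    """Simulate glucose for one individual at several time points.

    carrier_baseline: True if the individual carries any copy of alleles 1-3
        at the baseline locus (hypothetical encoding of the genotype).
    carrier_slope: the same, for the slope locus.
    weights, ages: sequences of equal length, one entry per time point.
    rng: a random.Random instance supplying the normal deviates.
    """
    # Genotype-determined means, as stated in the text.
    baseline = 5.0 if carrier_baseline else 3.0
    slope = 0.002 if carrier_slope else 0.001

    values = []
    for weight, age in zip(weights, ages):
        # Fresh error terms are drawn at each individual and time point.
        eps1 = rng.gauss(0.0, 0.4)
        eps2 = rng.gauss(0.0, 0.0001)
        eps3 = rng.gauss(0.0, 0.0003)
        glucose = (baseline + eps1 + weight / 150.0
                   + (slope + eps2) * (age - 20.0) + eps3)
        values.append(glucose)
    return values


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed, for reproducibility only
    # Illustrative weights and ages; not values from the GAW13 data.
    print(simulate_glucose(True, False, [150.0, 155.0, 160.0],
                           [30.0, 40.0, 50.0], rng))
```

Under this sketch, an individual carrying the baseline allele, observed at age 20 with weight 150, has expected glucose 5 + 150/150 = 6, since the slope term vanishes at age 20.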
The success here could be due to the fact that the effects we simulated were relatively extreme, or to the fact that the generating model (equation (1)) corresponded to that assumed in the analysis. We therefore repeated the simulation using identical baseline and slope effects, but with the final glucose phenotype generating model altered to
(i.e., taking the same form as equation (2)) with ε1, ε2, and ε3 as described previously. The results are shown in rows 3 and 4 of Figure 2. Again we find that the loci controlling the baseline and slope effects are both accurately detected. It would therefore appear that our procedure works well even when the true biological model does not exactly correspond to that assumed in the derivation of the trait measures α and γ. The poor performance in the GAW13 simulated data is thus more likely to be due to the very many contributing genetic and environmental factors in those data, which result in a much smaller relative contribution of any given locus to the overall variation in fasting glucose. Unfortunately, for many complex traits the true situation is likely to be closer to the GAW13 simulation than to our simulation, indicating that there may often be very low power to detect effects of this magnitude, even with very large sample sizes.