Estimating the age of founder mutations will always be an inexact endeavour. The true recombination and mutation history of the relevant chromosomal segments is unknown, and all models make strong assumptions that cannot be verified. Nevertheless, multi-marker approaches tend to give more consistent estimates than single-marker approaches. For the latter, accurate recombination fraction estimates are crucial; hence, estimates are particularly unreliable for markers very close to the mutation, where the recombination fraction cannot be well estimated. Since the single-marker estimates are based simply on the expected decay of one haplotype, no estimate can be obtained when the allele frequency on control chromosomes exceeds the allele frequency on the mutation-carrying chromosomes.
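To make the last point concrete, the following is a minimal sketch of a single-marker age estimate. It assumes the commonly used decay form g = ln(delta) / ln(1 - theta), with delta = (p_carrier - p_control) / (1 - p_control); this section does not state the exact formula used, so the estimator, the function name and the parameter names are illustrative assumptions rather than the implementation used here.

```python
import math


def single_marker_age(p_carrier, p_control, theta):
    """Sketch of a single-marker age estimate, in generations.

    ASSUMPTION: uses g = ln(delta) / ln(1 - theta), where delta is the
    excess frequency of the associated allele on mutation-carrying
    chromosomes relative to control chromosomes. All names are hypothetical.
    """
    if not 0.0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    # No estimate is possible when the control frequency is at least as
    # large as the frequency on mutation-carrying chromosomes (delta <= 0).
    if p_carrier <= p_control:
        return None
    delta = (p_carrier - p_control) / (1.0 - p_control)
    return math.log(delta) / math.log(1.0 - theta)


# Illustrative values only (not taken from the data sets in this study).
print(single_marker_age(0.6, 0.2, 0.01))  # an age in generations
print(single_marker_age(0.5, 0.6, 0.01))  # None: no estimate obtainable
```

Under this form the estimate also becomes very sensitive to theta as theta approaches zero, which is consistent with the unreliability noted above for markers very close to the mutation.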
To implement the likelihood and the single-marker methods, it is necessary to know which allele lies on the ancestral haplotype containing the mutation (the mutation-associated allele at each marker). To obtain this information, we estimated haplotypes using PHASE [18–20] and used the haplotype predictions to identify the associated allele at each marker. For the single-marker method, we used the most probable mutation-associated allele, but for our modification of Goldgar's likelihood method, we included all predicted mutation-carrying haplotypes, weighted by their probabilities.
The multi-marker likelihood for one haplotype is calculated by proceeding outwards from the known disease locus, calculating at each new marker the probabilities needed to explain the data, assuming either recombination or mutation events as required; these are then combined in a weighted sum over the possible haplotypes. Because the result can be expected to underestimate the true mutation ages when population growth is not taken into account, we corrected the estimates for growth using the median of the single-marker Labuda growth rate corrections. Unlike the DMLE model, this method incorporates parameters for mutation at each marker; however, over the time frames estimated for these three data sets, the probability of marker mutation or disease locus mutation is small. For older events and tightly linked markers, this method should have an advantage over methods based solely on recombination.
The DMLE method estimates the time to the origin of the mutation rather than to the most recent common ancestor. Furthermore, this method considers the possible variability in genealogies when calculating the posterior credible intervals. These two factors would lead one to expect wider intervals and older age estimates.
Based on the DMLE estimate, the MSH2 mutation is approximately 17 generations old, or about 425 years old assuming 25 years per generation, and the likelihood-based estimate is very similar. These estimates place the origin of the mutation at about the year 1575, during a period when the community was reasonably large and undergoing fast population growth. It is very plausible that any new mutation would become rapidly and widely disseminated under these conditions.
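The conversion from generations to an approximate calendar date can be sketched as follows. The 25 years per generation comes from the text; the reference year of 2000 is an assumption inferred from the arithmetic (17 × 25 = 425 years, giving 1575), and the function names are hypothetical.

```python
def generations_to_calendar_year(generations, years_per_generation=25,
                                 reference_year=2000):
    """Convert an age in generations to an approximate calendar year.

    ASSUMPTION: reference_year=2000 is inferred from the text, not stated.
    Returns an astronomical year (0 corresponds to 1 BCE, -1 to 2 BCE).
    """
    return reference_year - generations * years_per_generation


def format_year(year):
    """Render an astronomical year as a CE/BCE label."""
    rounded = round(year)
    if rounded <= 0:
        return f"{1 - rounded} BCE"
    return f"{rounded} CE"


# The MSH2 estimate from the text: 17 generations.
print(format_year(generations_to_calendar_year(17)))  # 1575 CE
```

Because the generation time and the reference year are both rough, dates obtained this way are best read to the nearest century at most.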
In contrast, we see substantial disparities between the two multi-marker methods for the other two mutations. The I1307K mutation is estimated by DMLE to have arisen in perhaps 650 BCE (assuming r = 1.125), yet at 350 CE by the likelihood approach. Despite the large difference, both origin estimates are consistent with the fact that the mutation has been seen in several Jewish populations that are not Ashkenazi but share a common haplotype around I1307K. In an earlier report, the BRCA2 6174delT mutation was estimated to have arisen 29 generations ago, using the original version of the likelihood method. Here, our estimates, using the modified likelihood method on new marker data and without correction for population growth, were considerably smaller (9–17 generations), and the DMLE estimates (including corrections for growth) were substantially larger at 60 or 90 generations. In this case, the choice of method and, to a lesser extent, the choice of markers make a substantial difference to the result. It is worth noting, however, that the confidence intervals for the likelihood-based approach are extremely wide and include the DMLE estimates and most of their credible intervals. Using DMLE, the 6174delT mutation may have arisen at approximately 200 BCE (r = 1.125), but the most recent common ancestor is estimated at 1000 CE by the Goldgar likelihood (allowing for population growth). The earlier estimate perhaps agrees better with the rare finding of the BRCA2 6174delT mutation in Sephardi Jews. How can these differences be reconciled?
The DMLE method attempts to identify the time in history at which the mutation in question arose, rather than the most recent common ancestor. It is therefore possible that a mutation arising in a relatively isolated Jewish community could be transmitted within that community for a substantial number of generations before branching out into ostensibly unrelated Jewish individuals. In this situation, the most recent common ancestor could be considerably closer to the current time than is the origin of the mutation. This phenomenon, while entirely speculative, could explain the data obtained here, wherein DMLE estimates for both the APC and BRCA2 mutations place the origin of these mutations considerably further back in time than do the other methods. While it seems unlikely that the origin and wide dissemination of a mutation could be separated by more than 1000 years, we do not know in what place or Jewish sub-group this mutation arose, and prolonged geographical or cultural isolation could potentially lead to such an effect.
In addition, the DMLE results change considerably when the growth rate is altered; it appears that a slower growth rate leads to much more potential variability in genealogies in this data set. We used simple estimates of recombination rates based on physical distance, and hence results based on the 6 closest markers may be less accurate than those based on markers spanning a longer distance.
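One simple way to derive a recombination fraction from physical distance is sketched below. It assumes a uniform rate of 1 cM per Mb and the Haldane map function; neither is stated in this section, so both, along with the function name, are illustrative assumptions and not necessarily the conversion used here.

```python
import math


def recombination_fraction(distance_bp, cm_per_mb=1.0):
    """Approximate recombination fraction from physical distance.

    ASSUMPTIONS: a uniform genetic map (default 1 cM per Mb) and the
    Haldane map function theta = 0.5 * (1 - exp(-2 * d)), d in Morgans.
    """
    morgans = (distance_bp / 1e6) * cm_per_mb / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


# Illustrative distances only: a tightly linked marker and a distant one.
for distance in (50_000, 1_000_000):
    print(distance, recombination_fraction(distance))
```

Local recombination rates can depart substantially from any genome-wide average, so the relative error of such a conversion tends to matter most for the closest markers, where theta is very small.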
The accuracy of any method will be only as good as the accuracy of the required parameters. Over short time scales (in evolutionary terms) such as those observed in all three of our data sets, mutation rates will be hard to estimate, and over short physical distances, recombination rates will be imprecise. Assuming that the chromosomal region of interest is well-behaved, the optimal set of markers to use when estimating age would include markers far enough away that the recombination fraction can be adequately estimated, yet not so far that the associated haplotype is no longer identifiable. Furthermore, the growth rate of the population has a very large impact on the estimated age of the mutations. Over these time periods and for the Jewish population of these three data sets, some estimates of population size are available. However, the assumption of a constant growth rate over time, which is used in both the Labuda correction and in DMLE, is known to be untrue. The Ashkenazi population experienced distinct periods of fast population growth and subsequent population bottlenecks, and these events are not taken into account in the estimation of mutation age. Nevertheless, since both DMLE and the Labuda correction assume constant growth rates, our comparison of methods is still interpretable.
Some other approaches for estimating mutation age have been proposed. One method is based primarily on the rate of occurrence of mutations over time near the locus of interest. This approach, which is implemented in the software BATWING, assumes no recombination, and hence is ideally suited to very small regions of the genome, or to mitochondrial DNA, where there is no recombination. Since mutations occur quite rarely, this approach is particularly suited to estimating the age of events that occurred a long time ago. A more recent approach examines haplotype sharing between individuals and constructs a phylogenetic tree from the similarity matrix. This approach is less dependent on linkage disequilibrium, but does not take the possible variability in the genealogies into account.