Fine-scale mapping of meiotic recombination in Asians
BMC Genetics volume 14, Article number: 19 (2013)
Meiotic recombination causes a shuffling of homologous chromosomes as they are passed from parents to children. Finding the genomic locations where these crossovers occur is important for genetic association studies, understanding population genetic variation, and predicting disease-causing structural rearrangements. There have been several reports that recombination hotspot usage differs between human populations. But while fine-scale genetic maps exist for European and African populations, none have been constructed for Asians.
Here we present the first Asian genetic map with resolution high enough to reveal hotspot usage. We constructed this map by applying a hidden Markov model to genotype data for over 500,000 single nucleotide polymorphism markers from Korean and Mongolian pedigrees which include 980 meioses. We identified 32,922 crossovers with a precision rate of 99%, 97% sensitivity, and a median resolution of 105,949 bp. For direct comparison of genetic maps between ethnic groups, we also constructed a map for CEPH families using identical methods. We found high levels of concordance with known hotspots, with approximately 72% of recombination occurring in these regions. We investigated the hypothesized contribution of recombination problems to age-related aneuploidy. Our large sample size allowed us to detect a weak but significant negative effect of maternal age on recombination rate.
We have constructed the first fine-scale Asian genetic map. This fills an important gap in the understanding of recombination pattern variation and will be a valuable resource for future research in population genetics. Our map will improve the accuracy of linkage studies and inform the design of genome-wide association studies in the Asian population.
Homologous recombination during meiosis results from the resolution of programmed DNA double-strand breaks by crossover between non-sister chromatids. Recombination is concentrated into hotspots roughly 2 kb in length scattered throughout the genome . Recombination rates between markers are used to construct genetic maps. These are important for linkage studies and also genome-wide association studies, since tagging single nucleotide polymorphisms (SNPs) depend on linkage-disequilibrium structures. Early human genetic maps were constructed using sparse microsatellite markers in Icelandic and CEPH (Center d’Etude du Polymorphisme Humain) families [2–4]. More recent studies have used SNP markers to reveal the locations of recombination with much greater resolution in Hutterites and other European-American cohorts (Framingham Heart Cohort Study and Autism Genetic Resource Exchange) [5, 6] and African-Americans . These studies used algorithms based on the ancestral origin of alleles , and parent-sibling tracing approaches [5, 6, 8]. In contrast to family methods which can directly observe recombination, coalescent methods use population genetic data to estimate the genetic map as a parameter of a probabilistic model. This captures the recombination that shuffled genomes through the generations leading to the current population. Applying these methods to HapMap data achieved very high resolution and allowed the identification of hotspots using likelihood ratio tests [1, 9].
For all applications, it is important for a genetic map to correspond well to the group to which it is applied. For example, the use of a map which accurately reflects the average recombination rate of a population will increase the power of gene identification studies. Some evidence suggests, however, that recombination patterns differ between populations. It is known that the DNA-binding protein PRDM9 plays a key role in specifying recombination sites . The binding motif that PRDM9 recognises must change rapidly under a constant Red Queen dynamic because of the ‘hotspot paradox’ [11–13]. Studies of human populations of African and European descent have shown variation in alleles of PRDM9 and local patterns of recombination [6, 7]. Three minor alleles, in addition to the more common A and B haplotypes, were previously identified in Han Chinese coding for either 12 or 13 zinc fingers . Interestingly, our preliminary genetic data from the Mongolian population suggest a larger diversity of PRDM9 alleles, including novel SNPs predicted to affect DNA binding (Additional file 1). Given this, the genetic map of Asians may be different to those from other populations. Furthermore, the hotspots found by coalescent models receding many generations into the past may not be in use today.
In a previous work, we constructed a map for Mongolians using 1,039 microsatellite markers . However, the resolution of the map was not sufficient to identify hotspot usage. There are currently no other Asian genetic maps available. Here we present a new genetic map, constructed using dense SNP markers in Mongolian and Korean pedigrees. The resolution of this map is now sufficient to reveal fine-scale patterns of recombination and Asian hotspot characteristics for the first time.
Results and discussion
We developed a novel algorithm to detect crossovers using family genotype data. This algorithm was based on a hidden Markov model as described in Methods. We tested the accuracy of our algorithm by applying it to simulated data with known crossover positions (see Additional file 1). This showed the sensitivity to be 97% and the precision rate to be 99%, which compares very well with previously published methods . We used our algorithm to investigate recombination in 169 Mongolian and Korean nuclear families (Table 1). By inputting genotype data for these families, we were able to deduce recombination events in a total of 980 meioses. We applied consistency checks to the raw output, and removed from analysis 18 meioses which failed our criteria. Our final data-set contained 32,922 crossovers with a median resolution of 105,949 bp (Figure 1). The complete list of these recombination events (Additional file 2) and sex-averaged genetic maps for each chromosome (Additional file 3) are available for download as a resource for future research. Our data is comprehensive, including maternal recombinations on the X chromosome that have previously been neglected [5, 6, 17]. These reveal an irregular pattern of peaks and troughs, with X chromosome recombination largely clustered near the telomeres, at an average distance of 28 Mb from the two ends. Thus, by detecting the locations of recombination in Mongolian and Korean pedigrees, we have constructed a high-quality, fine-scale Asian genetic map.
At the level of chromosomes, rate patterns are as expected with deserts at centromeres and spikes corresponding to increased paternal recombination near the end of chromosome arms as previously reported  (Figure 2). Asian males had an average of 26.3 (sample standard deviation = 3.3; 25.8-26.8 95% CI) and females had an average of 41.1 (s = 6.3; 40.1-42.0 95% CI) crossovers per meiosis (Table 2). These rates are comparable to previous studies of other populations [5, 6, 17]. In order to make a more direct comparison, we further applied our hidden Markov model to genotype data from CEPH families. A detailed comparison of the features of this map and the Mongolian and Korean maps is given in Table 3. Though we reported small but significant ethnic differences in genetic map length in our previous study , this direct comparison showed similar map lengths between Asians and CEPH families. We believe that the results of this study may be more accurate for several reasons as follows. First, this study made a direct comparison with identical software parameters and data handling. Second, cutting pedigrees into family quartets removes bias caused by different cohort family structures. For example, heuristic methods that consider all offspring together have been found to suffer downward bias in smaller families . Third, SNP markers are dense enough that interference rules out double crossovers. In contrast, the previous study relied on estimation of double crossover rates using the Kosambi map function .
We investigated the usage of the HapMap  set of hotspots by individuals. We call this the ‘historical hotspot’ concordance, because it represents divergence from recombination patterns established under a coalescent model. In addition to the hidden Markov model, our program also uses a deterministic scan to form a ‘prediction interval’ which defines the region in which a recombination event may have occurred. For this analysis, only 4,799 crossovers which could be pinpointed to within a prediction interval smaller than 30 kb were used, following the same criteria as previous studies . We found that 82% of such intervals overlapped with a historical hotspot. Naturally, some of these overlaps were due to weak resolution, rather than because the recombination event actually happened in a hotspot. Discounting for this possibility by calculating the rate of random overlap (see Additional file 1) we estimate that about 72% of Asian recombination occurs in a historical hotspot. We searched for evidence of novel Asian hotspots using the set of non-concordant well-resolved crossovers, but only observed one cluster of more than two such crossovers. This was located on chromosome 18 from NCBI build 36 physical position 2,871,839 to 2,883,413. We repeated the above analysis for the CEPH families dataset. We found that 82% of well-resolved prediction intervals overlapped a hotspot in both Asians and CEPH families, discounted to 72% when taking overlaps occurring by chance into account (Figure 3). These measurements are in line with results from coalescent methods in which hotspots cover 71-79% of the genetic map . Two family-based algorithms have previously been reported yielding lower hotspot overlap (72% for intervals resolved within 30 kb  and 74% for intervals resolved within 20 kb ) which we investigate further in Additional file 1.
Maternal age effects
It is known that reduced recombination on chromosome 21 is associated with Down syndrome . There is also a clear effect of maternal age on aneuploidy. It has thus been hypothesized that age-related aneuploidy could be caused by a reduction in recombination rates with maternal ageing. Early results, however, showed an opposite effect, with more crossovers detected in older mothers [5, 20]. It was suggested that this was due to selection favouring oocytes with more crossovers protecting them against other deterioration in the meiotic system . More recently, Hussin et al.  found a negative effect, and argued that more powerful effects on particular chromosomes (5 to 10, 15, 18 and 20) could reduce protection against non-disjunction. Our large sample size allowed us to address this problem by identifying the number of maternal recombinations in the meioses which gave rise to 338 children. This cohort was formed from all Asian families in our sample containing at least three children. Of these, 9 failed consistency checks and so were excluded from the analysis, and one was removed due to errors on the X chromosome. Recombination counts in these children were matched to their mother’s age at childbirth, rounded down to the nearest year. We performed regression analysis in R  with maternal ID included as a factor, so that we had a first-order model in maternal age at childbirth and maternal ID without interaction terms. We found a small unique effect of age on recombination rate of -0.29 recombinations for each year (p = 0.03; multiple R2 = 0.28; adjusted R2 = -0.003). At all ages, there is considerable variation in the recombination rate phenotype, and the linear model accounts for this variation poorly. We did not find a significant effect following a similar analysis for paternal age. We further tested for significant reductions on each chromosome between women younger and older than 30 (Figure 4). Chromosomes 12 and 17 showed a significant difference between the two age groups (p = 0.04 and p = 0.02; one-sided T-test), but this may be an artefact of multiple testing on 23 chromosomes. These chromosomes are also different to those identified as candidates previously . This makes it difficult to argue that effects on particular chromosomes are responsible for age-related aneuploidy.
Given that the effects of age on recombination are small, they may be susceptible to discrepancies in methodology. In the study by Kong et al. , ages were rounded up to the nearest 5 years and only 1000 microsatellite markers of unknown position were used. Interestingly, Kong et al. found a slight decrease in recombination rate specifically between the ages of 25 and 30, and a higher proportion of the mothers in the Asian cohort fall within this age range. To investigate this, we sampled Asian mothers with an age distribution corresponding to the previous study and binned individuals into 5-year windows. However, these changes only resulted in a reduced effect of -0.23 recombinations per year. Our findings therefore do not fit with either the positive effects reported previously, or the much stronger and chromosome-specific negative effects reported more recently. The frequency of aneuploidy increases powerfully and exponentially with maternal age, whereas our regression analysis found a poor coefficient of determination and weak effect. Our results therefore contradict the hypothesis of a very simple, direct relationship between reduced recombination and increased aneuploidy with maternal ageing, although there may be more subtle links between the two.
We have constructed the first high-resolution genetic map for Asians. Our map will be a useful resource for future population genetics research, improve the accuracy of linkage studies, help to predict disease-causing structural rearrangements and inform the design of genome-wide association studies in the Asian population. In general, Asian recombination patterns were similar to those in Europeans. A higher proportion of Asian and European recombination was mapped to hotspot regions than in previous studies. These results validate the application of combined genetic maps to Asians. Our data show that maternal age has a weak, negative effect on recombination rate, and there are no strong chromosome-specific effects.
Overview of methods
We used nuclear family genotype data to find Asian recombination patterns. By directly observing recent recombination events, this approach avoids the problem of coalescent methods finding recombination receding many generations into the past. Several deterministic programs using parent-sibling tracing methods have been described in the literature [5, 8, 16, 18], although few are available or easily applicable to new pedigree data. These methods have been shown to suffer from bias against calling recombination in smaller families . Furthermore, they have demonstrated fairly low accuracy where validation by simulation was performed . Therefore, we developed a new probabilistic method using a hidden Markov model. In general, our method was based on the idea that recombination events correspond to a parent switching between transmitting the same or different alleles to their children . We classified combinations of family genotypes as consistent or inconsistent with these two modes of transmission, and used that classification to define a hidden Markov model (Figure 5). Given actual observed genotypes for families, we found the most likely sequence of transmission states and deduced that recombination occurred where states changed. Our algorithm is available as Python code (Additional file 4) for application to new datasets. The program is accurate, portable, quick and easy to use, and can handle standard PLINK format data.
At the core of our algorithm was a hidden Markov model which could find recombination events in family quartets consisting of a father, mother and two children. The input for this model was genotype data for SNPs on a single chromosome, encoded at each position as the alleles A and B. For any genomic locus in a parent, two possible modes of transmission are possible: both children may receive the same allele, or each child may receive a different allele. These paternal and maternal transmission states combine to yield four possible states. Where the transmission state differs at adjacent genomic loci, we can infer that a crossover occurred between them during a paternal or maternal meiosis. We model the sequence of states along the markers on a chromosome as a Markov process. Observed genotypes at a given locus in all four family members may be consistent with a subset of these states. For example, if the father, mother and child genotypes are AA, AB, AA and AB respectively at some SNP, then we can infer that the mother is transmitting different alleles to the two children, but can make no inference regarding paternal transmission. Homozygous loci are uninformative because they are consistent with all four states. A hidden Markov model was constructed with a state space consisting of the four transmission states, and transition weights between these states corresponding roughly to previously reported genome-wide recombination rates . A hidden Markov model is an extension of a simple Markov chain, with observations also modelled with a probability distribution dependent on the state. This allows, for example, genotyping errors to be included in the model, rather than requiring post-hoc filtering. The set of emissions from each state was designed to encode all possible genotypes in all family members at a given locus, assuming a maximum of two alleles. The weights of emissions consistent with a transmission state were set to sum to 0.9995. Inconsistent emissions were allocated to the small remaining probability to allow for errors in DNA replication or genotyping. These small emission probabilities are balanced with the state transition probabilities to ensure that a few genotyping errors will not result in an erroneous double state transition. Family genotype data was fed into this model, and the Viterbi algorithm applied to find the most likely sequence of states. At each state change, an algorithm scanned forward and back to find the region of uncertainty, or ‘prediction interval’, where either neighbouring state was consistent with observations. For example, the Viterbi algorithm is able to determine a state change from both children inheriting the same allele from both parents, to the children inheriting different paternal alleles. However, the data will normally be ambiguous about the exact location of this change-point. To solve this problem, the set of possible states at each SNP in the region of the state change is recorded and a maximal interval formed by starting at the state change SNP and continuing in the 5′ and 3′ directions as far as more than one state remains possible. This non-probabilistic, strict delimitation of boundaries protects the algorithm from sensitivity to changes in state transition or emission probabilities. The SNPs marking the prediction interval were mapped to NCBI build 36 physical position. Alternative hidden Markov models were used to find maternal recombinations on the X chromosome (see Additional file 1). For families with an odd number of children, two overlapping quartets formed a quintet. Where a recombination prediction interval overlapped in the two quartets, we assume that the recombination occurred in the child present in both quartets and so only count it once. The collection of all prediction intervals was returned for further analysis.
Our work uses a novel algorithm, although similar approaches have been applied to phase next-generation sequencing data [22, 23]. We validated our algorithm by running it on simulated family data for which crossover locations were known. To simulate data as realistically as possible, we based our families on real haplotypes for chromosome 15 from the phased JPT+CHB HapMap dataset . Parents were constructed by selecting two haplotypes each at random out of the 340 available. We then simulated four meioses, selecting 3 crossover locations at random in each, to generate two children. We constructed 1,000 family quartets in this way, containing 12,000 recombination events. Running our hidden Markov model on this data yielded 11,643 prediction intervals that correctly localised 97% of the synthetic crossovers. Out of these 11,643 prediction intervals, only one was incorrect. This means that the precision rate of our algorithm is greater than 99.99%. To check the robustness of our approach on real data, we compared our results to crossovers inferred from sparse microsatellite markers. We tested results in 6 family quartets on chromosome 19. This chromosome had 32 microsatellite markers typed in those family quartets, as well as in their grandparents, allowing discovery of crossovers using identity-by-descent. Out of 33 crossovers detected by our algorithm, 32 were consistent with the microsatellite data, and one corresponded to a 6 Mb distant crossover.
Where families contained more than two children, the children were split into mutually overlapping groups of three. Each such triple was then split into three overlapping pairs, which were input in turn together with parent genotype data into the hidden Markov model. The recombination rates in each child were derived from the rates in the overlapping quartets containing that child. This allowed consistency checks to identify results which contradicted with those of overlapping quartets. Before constructing our genetic map, we checked for such inconsistent triples, and filtered out all of the meioses implicated in these errors. There were four (partially overlapping) triples in three families where checks found mathematical inconsistency. We were able to rescue data for three children in one such family where they formed a consistent trio. This left 9 children in three families where recombinations could not be distinguished consistently. These were removed from the analysis.
This study used genotype data from Mongolian pedigrees, generated by the Illumina Human 610-Quad Beadchip yielding 569,132 markers. The data was originally collected for association studies in the GENDISCAN project. This was approved by the Institutional Review Boards of Seoul National University, approval number H-0307-105-002. Mendelian errors were filtered using PedCheck and double recombination errors were filtered using Merlin. Markers with a low call rate (<99%), high error rate (>1%) or low minor allele frequency (<0.01) were excluded as described previously . We also used genotype data from the Healthy Twin Study, a project that genotyped adult twins and other family members willing to participate through two Korean hospitals. The Affymetrix Genome-Wide Human SNP Array 6.0 was used to type 906,600 SNPs overall, which was reduced to 516,452 after cleaning Mendelian and non-Mendelian errors. The criteria for exclusion were: Mendelian error in more than 3 families, minor allele frequency below 0.01, Hardy-Weinberg equilibrium test <0.001, missing genotype data >0.05 or non-Mendelian error in more than 3 families. The study protocol was approved by the ethics committees at Samsung Medical Center and Busan Paik Hospital. Results were compared to those obtained by applying the same method to CEPH pedigree data. This data was generated under the study entitled ‘Genotyping NIGMS CEPH Samples from the United States, Venezuela and France’, and provided to us under a dbGaP access request. There were 931,633 SNP markers in this dataset. From the complete pedigrees, families of at least four members were selected using a Python script. In all, 182 families were gathered in this way. The set of 32,991 (34,136 including the X chromosome) hotspots from HapMap Phase II  was used for historical hotspot concordance analysis after lifting to NCBI build 36. The software utility liftOver from UCSC (http://genome.ucsc.edu/cgi-bin/hgLiftOver) was used to perform this transfer, during which 6 hotspots were partially deleted.
Myers S, Freeman C, Auton A, Donnelly P, McVean G: A common sequence motif associated with recombination hot spots and genome instability in humans. Nat Genet. 2008, 40: 1124-1129. 10.1038/ng.213.
Matise TC, Chen F, Chen W, De La Vega FM, Hansen M, He C, Hyland FC, Kennedy GC, Kong X, Murray SS, Ziegle JS, Stewart WC, Buyske S: A second-generation combined linkage physical map of the human genome. Genome Res. 2007, 17: 1783-1786. 10.1101/gr.7156307.
Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, Shlien A, Palsson ST, Frigge ML, Thorgeirsson TE, Gulcher JR, Stefansson K: A high-resolution recombination map of the human genome. Nat Genet. 2002, 31: 241-247.
Dausset J, Cann H, Cohen D, Lathrop M, Lalouel JM, White R: Centre d’etude du polymorphisme humain (CEPH): collaborative genetic mapping of the human genome. Genomics. 1990, 6: 575-577. 10.1016/0888-7543(90)90491-C.
Coop G, Wen X, Ober C, Pritchard JK, Przeworski M: High-resolution mapping of crossovers reveals extensive variation in fine-scale recombination patterns among humans. Science (New York, N.Y.). 2008, 319: 1395-1398. 10.1126/science.1151851.
Fledel-Alon A, Leffler EMM, Guan Y, Stephens M, Coop G, Przeworski M: Variation in human recombination rates and its genetic determinants. PLoS One. 2011, 6: e20321+-
Hinch AG, Tandon A, Patterson N, Song Y, Rohland N, Palmer CD, Chen GK, Wang K, Buxbaum SG, Akylbekova EL, Aldrich MC, Ambrosone CB, Amos C, Bandera EV, Berndt SI, Bernstein L, Blot WJ, Bock CH, Boerwinkle E, Cai Q, Caporaso N, Casey G, Cupples LA, Deming SL, Diver WR, Divers J, Fornage M, Gillanders EM, Glessner J, Harris CC, Hu JJ, Ingles SA, Isaacs W, John EM, Kao LH, Keating B, Kittles RA, Kolonel LN, Larkin E, Le Marchand L, McNeill LH, Millikan RC, Murphy A, Musani S, Neslund-Dudas C, Nyante S, Papanicolaou GJ, Press MF, Psaty BM, Reiner AP, Rich SS, Rodriguez-Gil JL, Rotter JI, Rybicki BA, Schwartz AG, Signorello LB, Spitz M, Strom SS, Thun MJ, Tucker MA, Wang Z, Wiencke JK, Witte JS, Wrensch M, Wu X, Yamamura Y, Zanetti KA, Zheng W, Ziegler RG, Zhu X, Redline S, Hirschhorn JN, Henderson BE, Taylor HA, Price AL, Hakonarson H, Chanock SJ, Haiman CA, Wilson JG, Reich D, Myers SR: The landscape of recombination in African Americans. Nature. 2011, 476: 170-175. 10.1038/nature10336.
Lee Y-S, Chao A, Chen C-H, Chou T, Wang S-YM, Wang T-H: Analysis of human meiotic recombination events with a parent-sibling tracing approach. BMC Genomics. 2011, 12: 434-10.1186/1471-2164-12-434.
Frazer KA, Ballinger DG, International HapMap Consortium: A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007, 449: 851-861. 10.1038/nature06258.
Baudat F, Buard J, Grey C, Fledel-Alon A, Ober C, Przeworski M, Coop G, De Massy B: PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science (New York, N.Y.). 2010, 327: 836-840. 10.1126/science.1183439.
Jeffreys AJ, Neumann R: Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet. 2002, 31: 267-271. 10.1038/ng910.
Myers S, Bowden R, Tumian A, Bontrop RE, Freeman C, MacFie TS, McVean G, Donnelly P: Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science (New York, N.Y.). 2010, 327: 876-879. 10.1126/science.1182363.
Ubeda F, Wilkins JF: The Red Queen theory of recombination hotspots. J Evol Biol. 2011, 24: 541-553. 10.1111/j.1420-9101.2010.02187.x.
Parvanov ED, Petkov PM, Paigen K: Prdm9 controls activation of mammalian recombination hotspots. Science (New York, N.Y.). 2010, 327: 835-10.1126/science.1181495.
Ju YS, Park H, Lee M-KK, Kim J-II, Sung J, Cho S-II, Seo J-SS: A genome-wide Asian genetic map and ethnic comparison: the GENDISCAN study. BMC Genomics. 2008, 9: 554+-10.1186/1471-2164-9-554.
Ting JC, Roberson ED, Currier DG, Pevsner J: Locations and patterns of meiotic recombination in two-generation pedigrees. BMC Med Genet. 2009, 10: 93+-10.1186/1471-2350-10-93.
Hussin J, Roy-Gagnon M-HH, Gendron R, Andelfinger G, Awadalla P: Age-dependent recombination rates in human pedigrees. PLoS Genet. 2011, 7: e1002251+-
Khil PP, Camerini-Otero RD: Genetic crossovers are predicted accurately by the computed human recombination map. PLoS Genet. 2010, 6: e1000831+-
Warren AC, Chakravarti A, Wong C, Slaugenhaupt SA, Halloran SL, Watkins PC, Metaxotou C, Antonarakis SE: Evidence for reduced recombination on the nondisjoined chromosomes 21 in Down syndrome. Science (New York, N.Y.). 1987, 237: 652-654. 10.1126/science.2955519.
Kong A, Barnard J, Gudbjartsson DF, Thorleifsson G, Jonsdottir G, Sigurdardottir S, Richardsson B, Jonsdottir J, Thorgeirsson T, Frigge ML, Lamb NE, Sherman S, Gulcher JR, Stefansson K: Recombination rate and reproductive success in humans. Nat Genet. 2004, 36: 1203-1206. 10.1038/ng1445.
R Development Core Team: R: A language and environment for statistical computing. 2008, Vienna, Austria: R Foundation for Statistical Computing, http://www.R-project.org, 3-900051-07-0,
Roach JC, Glusman G, Hubley R, Montsaroff SZ, Holloway AK, Mauldin DE, Srivastava D, Garg V, Pollard KS, Galas DJ, Hood L, Smit AF: Chromosomal haplotypes by genetic phasing of human families. Am J Hum Genet. 2011, 89: 382-397. 10.1016/j.ajhg.2011.07.023.
Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, Shendure J, Drmanac R, Jorde LB, Hood L, Galas DJ: Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science (New York, N.Y.). 2010, 328: 636-639. 10.1126/science.1186802.
Paik SHH, Kim H-JJ, Son H-YY, Lee S, Im S-WW JYSS, Yeon JHH, Jo SJJ, Eun HCC, Seo J-SS, Kwon OSS, Kim J-II: Gene mapping study for constitutive skin color in an isolated Mongolian population. Exp Mol Med. 2012, 44: 241-249. 10.3858/emm.2012.44.3.020.
We thank the subjects who provided their DNA and family information in all three cohorts. We thank the researchers who performed the study ‘Genotyping NIGMS CEPH Samples from the United States, Venezuela and France’, and generously made their data available. Genotyping data for the CEPH Family Panels was obtained from the NIGMS Human Genetic Cell Repository (http://ccr.coriell.org/Sections/Collections/NIGMS/?SsId=8) through dbGaP accession number phs000268.v1.p1. We thank researchers in the GENDISCAN project and the Healthy Twin Study whose efforts provided us with valuable, high-quality genotype data with which to work. The GENDISCAN project was supported by the Korean Ministry of Education, Science and Technology (Grant Number 2003-2001558).
The authors declare that they have no competing interests.
JSS conceived of this study. JSS and JHS managed the GENDISCAN project and the Healthy Twin Study respectively. TB constructed the hidden Markov model, analysed the data and drafted the manuscript. YSJ helped the analysis. All authors participated in manuscript writing. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1: Supplementary Notes. Details on X chromosome hidden Markov models, concordance analysis, alternative simulations, algorithm comparison and Mongolian exome sequencing results. (PDF 264 KB)
Additional file 2: List of Crossover Prediction Intervals. The complete unordered list of recombination events detected in Mongolian and Korean families. The first column is the sex of the individual in which recombination was observed, the second column is the chromosome on which it was located, and the third and fourth mark the physical position of the start and end point of the range in which the algorithm determined the recombination could have occurred. Positions are given in terms of NCBI build 36. (CSV 870 KB)
Additional file 3: Genetic Maps. Archive containing combined genetic maps for all chromosomes. The SNP list text files contain the rs numbers of all of the SNPs shared by the Mongolian and Korean cohorts, ordered by physical position. The map files give the genetic distance between each of these markers in order in centimorgans. All observed Asian recombination events are mixed to form this map. (ZIP 1 MB)
Additional file 4: Quartet Analysis Algorithm. Archive containing a script to be opened using Python (http:////python.org). The Python source code for the algorithm used in this study (script.py) is available to run on new datasets. Given the identities of the father, mother and two children in a family quartet, and genotype data in PLINK format, it returns a list of recombination prediction intervals. For instructions and input file requirements, please see the file README.txt. This package also includes example input files (*.syn) generated during simulations. (ZIP 122 KB)