Pedigree error detection and correction
The program PREST [4] was used to detect pedigree errors. PREST estimates the probabilities p0, p1, and p2 that two individuals share 0, 1, and 2 alleles identical by descent (IBD), respectively. We estimated these probabilities for every pair in the relationship classes known to PREST (parent-offspring, full-sibs, half-sibs, avuncular, first-cousins, grandparent-grandchild, half-avuncular, half-first cousin, half-sib plus first-cousin, monozygotic twins, and unrelated), both within and between pedigrees. Pairs were first screened for pedigree errors with PREST's analytical tests, conditional estimated identity by descent (EIBD), adjusted identity by state (AIBS), and identity by state (IBS), applied in that order where applicable at α = 0.0001 so as to focus on the more significant problems. Each flagged index pair and its relatives were then examined more thoroughly using PREST's accompanying program ALTERTEST, which can test two individuals against each of the 11 relationship classes.
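As a rough illustration of why the estimated (p0, p1, p2) are informative about relationship errors, the sketch below tabulates the standard theoretical IBD-sharing probabilities for the 11 relationship classes listed above, assuming no inbreeding, and finds the classes whose expected values lie closest to an estimate. The names `EXPECTED_IBD`, `kinship`, and `closest_classes` are ours, and the Euclidean-distance comparison is only a stand-in for illustration; it is not the EIBD, AIBS, or IBS test that PREST performs.

```python
from fractions import Fraction as F

# Theoretical (p0, p1, p2) for non-inbred pairs in each relationship class.
EXPECTED_IBD = {
    "parent-offspring":            (F(0),    F(1),    F(0)),
    "full-sibs":                   (F(1, 4), F(1, 2), F(1, 4)),
    "half-sibs":                   (F(1, 2), F(1, 2), F(0)),
    "avuncular":                   (F(1, 2), F(1, 2), F(0)),
    "grandparent-grandchild":      (F(1, 2), F(1, 2), F(0)),
    "first-cousins":               (F(3, 4), F(1, 4), F(0)),
    "half-avuncular":              (F(3, 4), F(1, 4), F(0)),
    "half-first cousin":           (F(7, 8), F(1, 8), F(0)),
    "half-sib plus first-cousin":  (F(3, 8), F(1, 2), F(1, 8)),
    "monozygotic twins":           (F(0),    F(0),    F(1)),
    "unrelated":                   (F(1),    F(0),    F(0)),
}


def kinship(p0, p1, p2):
    """Kinship coefficient implied by the IBD-sharing probabilities."""
    return p1 / 4 + p2 / 2


def closest_classes(p0, p1, p2):
    """Return the classes whose expected IBD vector is nearest the estimate.

    Illustrative only: a plain squared-distance comparison, not a PREST test.
    """
    def dist2(name):
        e0, e1, e2 = (float(x) for x in EXPECTED_IBD[name])
        return (e0 - p0) ** 2 + (e1 - p1) ** 2 + (e2 - p2) ** 2

    best = min(dist2(name) for name in EXPECTED_IBD)
    return sorted(n for n in EXPECTED_IBD if abs(dist2(n) - best) < 1e-12)
```

Half-sibs, avuncular pairs, and grandparent-grandchild pairs share the same expected values, as do first-cousins and half-avuncular pairs, so genome-wide averages alone cannot separate the classes within each group.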
PREST comes with an R script written by Dan Weeks that plots the IBD estimates for a single relative pair on a relationship triangle [5]. We modified this script to produce a scatter diagram of the IBD estimates on the triangle. The result is an informative graphical summary of the pedigree errors in the sample. Pedigrees were drawn with PEDIGREE/DRAW [6].
Genotyping error detection and correction
Genotyping errors were detected using SIMWALK2 [7, 8]. SIMWALK2 applies a Markov-chain Monte Carlo method to the pedigree data, population allele frequencies, and a genetic map to assign a probability of mistyping to each genotype. Because this is computationally intensive, we examined genotyping errors only on chromosome 7.
We ran SIMWALK2 in two phases. In the first phase, Mendelian errors were detected and corrected independently for each marker. Marker genotypes were blanked (changed to a missing value) whenever their probability of mistyping exceeded a given threshold. The threshold was chosen conservatively, i.e., so as to blank no more genotypes for a marker than necessary to calculate a likelihood for that marker: it was decremented from 1.0 until a calculable likelihood was reached.
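The first-phase threshold search can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the procedure as actually scripted: the function name, the step size of 0.05, and the callable `likelihood_is_calculable` (a hypothetical stand-in for rerunning the likelihood calculation on one marker with the given genotypes blanked) are all ours.

```python
def choose_blanking_threshold(mistyping_prob, likelihood_is_calculable, step=0.05):
    """Lower the threshold from 1.0 until the marker likelihood is calculable.

    mistyping_prob: dict mapping a genotype identifier to its mistyping
        probability for a single marker.
    likelihood_is_calculable: hypothetical callable taking the set of blanked
        genotype identifiers and returning True or False.
    step: assumed decrement; the source does not state the value used.
    """
    n_steps = round(1.0 / step)
    for k in range(n_steps + 1):
        # Compute each threshold from k to avoid accumulating float error.
        threshold = round(1.0 - k * step, 10)
        blanked = {g for g, p in mistyping_prob.items() if p > threshold}
        if likelihood_is_calculable(blanked):
            return threshold, blanked
    raise ValueError("no threshold yields a calculable likelihood")
```

Because the search starts at 1.0, where nothing is blanked, and stops at the first threshold that works, it returns the highest threshold on the grid, and hence the smallest blanked set, that makes the likelihood calculable.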
In the second phase, genotypes that suggested improbable double recombination events were blanked. Mistyping probabilities were assigned using the genetic maps supplied with the GAW13 data. In this phase, the proportion of genotypes that would be blanked at each of a series of thresholds was plotted to provide a visual guide for choosing a threshold.
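The quantity plotted in the second phase can be computed as follows. This is a minimal sketch; the function name and the example thresholds are assumptions, and the plotting itself is omitted.

```python
def proportion_blanked(mistyping_probs, thresholds):
    """For each threshold, return the fraction of genotypes that would be blanked.

    mistyping_probs: sequence of mistyping probabilities, one per genotype.
    thresholds: candidate thresholds; a genotype is blanked when its
        mistyping probability exceeds the threshold.
    """
    n = len(mistyping_probs)
    if n == 0:
        raise ValueError("no genotypes supplied")
    return [(t, sum(p > t for p in mistyping_probs) / n) for t in thresholds]
```

Plotting the second element of each pair against the first gives a curve whose flat stretches indicate thresholds at which small changes blank few additional genotypes.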
Following correction of the pedigree and genotype data, we recalculated the chromosome 7 genetic map using MULTIMAP/CRIMAP [9, 10] for comparison.