Overall, PEDCHECK found 1104 Mendelian errors, or 40% (range 38-42%) of the generated errors, in the four data sets containing errors. In all but two of the 57 level 4 errors, the true misgenotyped person was listed as a possible error. Thirty-eight of these individuals were indicated as the most likely misgenotyped person when several closely related individuals were suggested. In the two remaining cases, the individuals for whom errors were simulated were not listed in the level 4 output, but others in their nuclear family were. Except in these two cases, the simulated error was untyped (even when PEDCHECK did not indicate it as the most probable error).
When no errors were present, SIMWALK2 reported 9, 58, 124, 190, and 238 errors (p > 0.25) and 6, 2, 7, 7, and 5 errors (p > 0.95) using, respectively, AER values of 0, 0.025, 0.050, 0.075, and 0.100. Figure 1 plots "true-positive" rates and indicates that both the AER and the GER affect the probability that a purported error (p > 0.25) is a true (i.e., generated) error. The "false-positive" rate equals 1 minus the plotted value for any GER-AER combination. Assuming no errors (ae000) in the data sets containing errors, the ratios of true errors to errors found were 7/10 (70%), 6/11 (54%), 19/26 (73%), and 12/18 (67%), respectively. At ae100, these ratios dropped to 88/337 (26%), 155/413 (37%), 260/524 (50%), and 287/604 (48%). Except at error rates of 0.0, about 50% of the purported errors were true (generated) errors when the AER equaled the GER.
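The percentages quoted above can be recomputed directly from the reported counts. The following is a minimal sketch, assuming each ratio is read as (generated errors among those flagged) / (all errors flagged at p > 0.25); the dictionary keys and the function name are illustrative and are not part of PEDCHECK or SIMWALK2.

```python
# Counts as quoted in the text, in the order the four error-containing
# data sets are listed there. Assumption: each pair is
# (true generated errors among those flagged, all errors flagged at p > 0.25).
flagged_counts = {
    "ae000": [(7, 10), (6, 11), (19, 26), (12, 18)],
    "ae100": [(88, 337), (155, 413), (260, 524), (287, 604)],
}


def true_positive_rate(true_flagged, total_flagged):
    """Fraction of purported errors that are true (generated) errors."""
    return true_flagged / total_flagged


# "True-positive" rate per data set for each assumed error rate (AER).
rates = {
    aer: [true_positive_rate(t, n) for t, n in pairs]
    for aer, pairs in flagged_counts.items()
}

for aer, values in rates.items():
    # The "false-positive" rate is 1 minus each of these values.
    print(aer, [f"{100 * v:.1f}%" for v in values])
```

Printing one decimal place makes it easy to compare against the rounded percentages in the text.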
Expanding the ae050 results from Figure 1, Figure 2 reports the effect of increasing stringency in accepting "true-positives". Only ge100 shows a steady decline in the cumulative "true-positive" rate as more errors are accepted, from p = 1 (100%) to p > 0.25 (60%), whereas the others show at least one increase within their overall decline. Generally, however, the overall decline in the "true-positive" rate is less severe for higher GER. As the AER increases, the entire plot in Figure 2 shifts downward (data not shown).
Figure 3 examines the effect of marker heterozygosity (het = 0.607 vs. 0.899) on the probability that a purported error (p > 0.25) was a true error generated under low (0.025) or high (0.100) GER. The GER appears to have a profound effect on this probability, whereas marker heterozygosity does not.
As a measure of success in identifying generated errors relative to all errors present in the data set, Figure 4 indicates that increasing the AER increases the overall error detection rate and that the GER has less impact than the AER on the overall error detection probability. It is unclear whether the lower curve for ge100 is real or due to sampling error. Over this AER range, the maximum error detection frequency was about 50% (at ae100). At this highest assumed error rate, 52% (88/170), 47% (155/328), 51% (260/509), and 43% (287/668) of the generated errors were correctly identified. However, because the relationship between AER and detection was nearly linear for AER > 0, this frequency may increase with AERs > 0.100.
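The detection frequencies at ae100 follow from the counts given in the text. The sketch below recomputes them per data set and also derives a pooled value across the four data sets; the pooled figure is an illustration computed here from the quoted counts, not a number reported in the source, and the variable names are hypothetical.

```python
# Counts at ae100 as quoted in the text, in the order listed there.
detected = [88, 155, 260, 287]     # generated errors correctly identified
generated = [170, 328, 509, 668]   # generated errors denominators as quoted

# Detection frequency per data set: identified / generated.
per_set = [d / g for d, g in zip(detected, generated)]

# Pooled detection frequency over all four data sets
# (derived here for illustration; not reported in the source).
pooled = sum(detected) / sum(generated)

for d, g, r in zip(detected, generated, per_set):
    print(f"{d}/{g} = {100 * r:.1f}%")
print(f"pooled: {sum(detected)}/{sum(generated)} = {100 * pooled:.1f}%")
```

Note that this quantity has the generated errors in the denominator, whereas the "true-positive" rate in Figure 1 has the purported errors in the denominator, so the two measures answer different questions.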
Because the proportion of individuals typed in a family is also potentially a factor in identifying errors, Figure 5 presents the frequency of errors detected (p > 0.25) in each of six family groups defined by typing frequency. In these six groups, 35, 226, 638, 509, 230, and 63 genotyped individuals, respectively, were "at-risk" for a genotype error. The probability of reporting an error is generally higher in families that have a lower proportion of members genotyped. However, this trend does not appear to hold for the set of families (n = 9) with the highest frequency of genotyping. The small number of families and/or genotyped individuals (n = 63) in this group may partially explain this apparently anomalous result.