Plots of the gap statistic. The number of populations, K, was estimated with the gap statistic. In the left panels, the blue and red curves show the estimated expectation of log(W_K) and the observed log(W_K), respectively. The right panels show the corresponding gap statistic. K ranges from 1 to 6. (a) and (b) correspond to the HapMap data, using 1,000 random genome-wide SNP loci. (c) and (d) correspond to the CHB and JPT data, using 30,000 random genome-wide SNP loci. (e) and (f) correspond to the Perlegen data, using 1,000 random genome-wide SNP loci. The inferred optimal K is the elbow point in the left panel, which coincides with the maximum of the gap in the right panel. The gap statistic gives an optimal number of populations of 3, 2, and 3 for the three data sets, respectively.
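
The selection rule described in this caption can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the function names `gap_curve` and `select_k` are hypothetical, the numbers are made up and are not read from the figure, and the reference values stand in for log(W_K) computed on simulated null data sets. The gap at each K is taken as the mean reference log(W_K) minus the observed log(W_K), and K is chosen where that gap is largest, as the caption states.

```python
from statistics import mean


def gap_curve(observed_log_w, reference_log_w):
    """Gap(K) = mean reference log(W_K) minus observed log(W_K), for each K."""
    return {
        k: mean(reference_log_w[k]) - observed_log_w[k]
        for k in observed_log_w
    }


def select_k(gaps):
    """Return the K with the largest gap (the rule stated in the caption)."""
    return max(gaps, key=gaps.get)


# Hypothetical values for K = 1..4; these are NOT taken from the figure.
observed = {1: 5.0, 2: 4.2, 3: 3.1, 4: 3.0}
reference = {
    1: [5.1, 5.0, 5.2],  # log(W_1) on three simulated reference data sets
    2: [4.7, 4.8, 4.6],
    3: [4.4, 4.3, 4.5],
    4: [4.1, 4.2, 4.0],
}

gaps = gap_curve(observed, reference)
best_k = select_k(gaps)
print(best_k)
```

With these illustrative inputs the largest gap occurs at K = 3. Note that some formulations of the gap statistic instead pick the smallest K whose gap is within one standard error of the next gap; the sketch follows the simpler maximum-gap rule used in the caption.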