The number of random SNP loci needed to correctly classify individuals in the HapMap data. Boxplots show the classification error rate (predicted origin vs. known origin) for CEU, YRI and CHB+JPT (CVJ) individuals, estimated with different numbers of SNP loci. Each dendrogram was cut at depth 2 to generate three clusters, and each cluster's predicted origin was assigned as the majority population group among its members. For each number of SNPs, loci were randomly sampled 100 times from the 22 autosomal chromosomes. Horizontal lines drawn at the 1st quartile, median and 3rd quartile are connected to form the box. A vertical dashed line extends down from the 1st quartile to the most extreme data point within 1.5 times the interquartile range (IQR); a similar line extends up from the 3rd quartile. The ends of the vertical lines are marked by short horizontal lines, and outliers are marked by dots. Red diamonds are the mean classification error rates for the pooled whole sample at each number of SNP loci tested, and red arrows indicate the mean ± standard deviation.
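A minimal Python sketch of the two computations summarized in this figure: majority-vote assignment of each cluster's predicted origin with the resulting error rate, and the box-and-whisker statistics with the 1.5 × IQR rule. It assumes the cluster labels have already been obtained by cutting each dendrogram into three clusters (the SNP sampling and clustering steps are not shown). The function names are hypothetical, and quartiles are computed with `statistics.quantiles(method="inclusive")`; other plotting tools may use a different quartile definition (e.g. Tukey's hinges), so box edges can differ slightly on small samples.

```python
from collections import Counter
from statistics import mean, quantiles, stdev


def classification_error(cluster_labels, known_origins):
    """Hypothetical helper: assign each cluster the majority known origin of its
    members, then return the fraction of individuals whose predicted origin
    differs from their known origin."""
    if len(cluster_labels) != len(known_origins):
        raise ValueError("inputs must have the same length")
    members = {}
    for cluster, origin in zip(cluster_labels, known_origins):
        members.setdefault(cluster, []).append(origin)
    # Majority population per cluster; ties go to the first-encountered origin.
    predicted = {c: Counter(o).most_common(1)[0][0] for c, o in members.items()}
    errors = sum(1 for c, o in zip(cluster_labels, known_origins) if predicted[c] != o)
    return errors / len(known_origins)


def box_stats(values):
    """Hypothetical helper: quartiles, whisker ends (most extreme points within
    1.5 * IQR of the box), outliers, and mean +/- sample standard deviation."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(v for v in values if v >= lower_fence),
        "upper_whisker": max(v for v in values if v <= upper_fence),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
        "mean": mean(values),
        "sd": stdev(values),
    }


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    origins = ["CEU", "CEU", "YRI", "YRI", "YRI", "YRI", "CVJ", "CVJ", "CVJ"]
    print(classification_error(labels, origins))  # one YRI individual falls in the CEU-majority cluster
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

In the figure, `classification_error` would be evaluated once per random SNP sample (100 per SNP count), and `box_stats` applied to the resulting 100 error rates to draw each box.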