Skip to main content
Figure 4 | BMC Genetics

Figure 4

From: Clustering by genetic ancestry using genome-wide SNP data

Figure 4

Accuracy, Stability, Between Cluster Distance and Scoring Index for NECS. Accuracy, stability and between cluster distance on the k-means cluster assignments are displayed in red and the measures on the permuted cluster assignments are displayed in blue. The scoring index maximizes at 11 clusters (SI = 0.886, 95% CI: (0.876, 0.896)). The mean score for 9 clusters (0.879) falls within the 95% confidence interval of 11 clusters and thus is as good as 11 clusters but is more parsimonious. and is also the optimal solution. In general, we expect the permuted stability to increase with an increasing number of clusters because a larger number of pairs will be consistently assigned to different clusters since there are more available clusters. However, note the peculiar trend of cluster stability in this example: the permuted stability decreases when moving from k = 2 to 4 clusters and then gradually increases. This occurs because one cluster contains the majority of the subjects for these clusters sizes and thus the number of pairs of subjects assigned to the same cluster, by chance, is much higher than in the case where there are equal sample sizes in each cluster. Thus it is important to account for the permuted measurements when selecting the optimal number of clusters.

Back to article page