Paternal genetic portraits of south Kazakh clans
We genotyped 35 Y-SNP and 17 Y-STR markers in 490 individuals representing 11 clans of South Kazakhstan: eight clans of the Uissun tribe and three clans that belong to the Senior Zhuz but are not considered members of the Uissun tribe. Twenty-seven Y-chromosomal haplogroups were identified in this sample (Fig. 2, Additional file 2).
More than half of the Y-chromosomal gene pool (51%) of South Kazakhs (Senior Zhuz) is made up of three clades of haplogroup C2 (Fig. 2). One-third (34%) of the gene pool is formed by haplogroups J-M172 (13%), N1a1a-M178 (7%), Q-M242 (7%), and R1a1a-M198 (7%). Although these haplogroups are not frequent in South Kazakhstan in general, they become predominant within specific clans (Fig. 2).
For most clans of South Kazakhstan, haplogroup C2-M217(xM48, M407) is the major one. It is the most frequent haplogroup in six of the eight clans of the Uissun tribe (Dulat, Alban, Suan, Shaprashty, Oshakty, and Sary-Uissun) and in the non-Uissun clan of Jalair. In the two remaining Uissun clans, the major haplogroups are N1a1a-M178 (80% of the Syrgeli clan) and J1*-M267(xP58) (74% of the Yssty clan). In the two remaining non-Uissun clans, the major haplogroups are Q-M242 (53% of the Kanly clan) and G2-P15 (35% of the Shanyshkly clan).
Although sample sizes for four clans are small (N < 20), the samples for the remaining clans are representative (N = 63 on average); therefore, there is strong evidence for the prevalence of haplogroup C2-M217(xM48, M407) among Kazakhs of the Senior Zhuz. It was previously reported that C2* is the major haplogroup (88%) in the Sary-Uissun clan [9], and the same haplogroup is the major one for the Alban clan of the Xinjiang Uyghur District of China (44%) [10]. Interestingly, the evidence for the Shanyshkly clan is contradictory: citizen science projects detected a high frequency of haplogroup C2* (Additional file 3), whereas the present study found only 18% of C2*, with G2-P15 being the most common haplogroup (36%).
Genetic structuring of the paternal gene pool of Uissun
The genetic portraits of three Uissun clans (Yssty, Syrgeli, and Oshakty) are very specific (Fig. 2), in contrast to the genealogy, which traces all nine Uissun clans (Additional file 1) to a common ancestor named Maiky-biy. The genetically distinct origin is also confirmed by AMOVA. In this analysis, we either treated seven Uissun clans as independent branches or kept only the three genetically specific clans separate and grouped the remaining four into a “core Uissun” population. The latter structure, consisting of four groups, turned out to be more efficient (FST = 0.35) than the structure without grouping (FST = 0.24) (Additional file 4). A similar result was obtained using multidimensional methods of statistical analysis (MDS and PCA). A single cluster of seven clans (Sary-Uissun, Dulat, Alban, Suan, Shaprashty, Oshakty, and Jalair) is highlighted on the MDS chart (Additional file 5). On the PCA plot, five clans (Sary-Uissun, Dulat, Alban, Suan, and Shaprashty) form a cluster along the first component axis, and the Oshakty clan lies close to it. The Jalair clan moves away along the second component axis (Additional file 6).
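How strongly a set of groups is differentiated can be illustrated with a much simpler statistic than AMOVA. The sketch below computes Nei's G_ST, (H_T − H_S) / H_T, from haplogroup frequencies. It is not the AMOVA procedure used in the study and it ignores sample-size corrections; the function names and the toy frequencies are hypothetical and are not taken from the study data.

```python
def gene_diversity(freqs):
    """Gene diversity 1 - sum(p_i^2) for one set of haplogroup frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def g_st(populations):
    """Nei's G_ST = (H_T - H_S) / H_T with unweighted population means.

    populations: dict mapping a group name to a dict of
    haplogroup -> frequency (frequencies in each group sum to 1).
    """
    names = list(populations)
    haplogroups = {h for freqs in populations.values() for h in freqs}

    # H_S: mean within-group diversity
    h_s = sum(gene_diversity(populations[n]) for n in names) / len(names)

    # H_T: diversity of the averaged (total) frequencies
    mean_freqs = {
        h: sum(populations[n].get(h, 0.0) for n in names) / len(names)
        for h in haplogroups
    }
    h_t = gene_diversity(mean_freqs)

    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical toy frequencies (NOT study data): two groups fixed for
# different haplogroups versus two groups with identical frequencies.
fully_distinct = {"group_1": {"C2": 1.0}, "group_2": {"N1a1a": 1.0}}
identical = {"group_1": {"C2": 0.5, "J1": 0.5}, "group_2": {"C2": 0.5, "J1": 0.5}}

g_distinct = g_st(fully_distinct)   # maximal differentiation
g_identical = g_st(identical)       # no differentiation
```

Values near 1 indicate that most of the variation lies between groups, and values near 0 indicate that the groups are nearly indistinguishable.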
Phylogenetic analysis of haplogroup C2*-F3796
We performed a detailed phylogenetic analysis of the most frequent haplogroup among the Uissuns, C2*-ST (40%). This haplogroup, also known as the Star Cluster (ST), is clearly distinguished within M217(xM48, M407) by its STR haplotypes. It corresponds to the subclade marked by the SNP F3796 [10]. This lineage spread rapidly over the Eurasian steppe during the conquests of the Mongol Empire and has been presumably associated with the haplotype of Genghis Khan or his relatives [11]. The highest frequencies of C2*-ST were found in Kazakhs from the Kerey clan of the Middle Zhuz (77%) [12], Buryats from the Bargut clan (46%) [10], Hazaras (38%) [13], Uzbeks from Afghanistan (35%) [14], and Mongols (35%) [15]. The highest haplotype diversity within C2*-ST was found in the Mongols (HD = 0.91) and Uzbeks (HD = 0.95), and the lowest in the Kazakhs, both in the Uissuns (HD = 0.86) and in other tribal groups (HD = 0.84).
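Haplotype diversity values such as those quoted above are commonly computed with Nei's unbiased estimator, HD = n / (n − 1) × (1 − Σ p_i²), where p_i is the frequency of the i-th distinct haplotype among n chromosomes. The text does not state which estimator was used, so the following is a minimal sketch under that assumption; the function name and the toy haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2)).

    haplotypes: list of hashable haplotypes, e.g. tuples of STR repeat counts.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy sample (NOT study data): four chromosomes typed at
# three STR loci, with one haplotype observed twice.
toy_sample = [(16, 25, 10), (16, 25, 10), (16, 26, 10), (15, 25, 11)]
hd = haplotype_diversity(toy_sample)
```

A sample in which every chromosome carries a different haplotype gives HD = 1, and a sample with a single shared haplotype gives HD = 0.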
The phylogenetic network of haplotypes within haplogroup C2*-ST (C-F3796) was constructed from 15 Y-chromosomal STR loci using data on 743 individuals from 25 populations of Eurasia (novel samples, N = 194, Additional file 2; previously published, N = 549, Additional file 7). Additional file 8 presents the haplotypes of the Eurasian ethnic groups, while Fig. 2 highlights the haplotypes of the Kazakh clans on the same network (other ethnic groups are shown in yellow). Five distinct clusters are clearly visible on the network (Fig. 3). The majority of the samples from South Kazakhstan (the Senior Zhuz) fell into two newly identified C2*-ST subclusters (Additional file 9): the α-cluster (N = 44) and the γ-cluster (N = 122). Only a few samples (N = 9) fell into the β-cluster (dated to 701 years, 95% CI 493–909), previously identified for the Kerey clan of the Middle Zhuz. The α-cluster (dated to 746 years, 95% CI 388–1104) was mainly composed of members of the (non-Uissun) Jalair clan. The γ-cluster (dated to 742 years, 95% CI 568–916), which we called the “Uissun” cluster, mainly included representatives of four Uissun clans, namely Dulat, Alban, Suan, and Sary-Uissun (Fig. 3). Of the two Hazara subclusters, the δ-cluster originates from the Uissun γ-cluster, while the ε-cluster is derived from the common C2*-ST founder and includes, in addition to the Hazaras, a few Uzbek haplotypes.
Comparison of TMRCA estimates based on Y-chromosomal SNPs and STRs
The TMRCA (time to the most recent common ancestor) of the C2*-ST cluster (Fig. 3) based on Y-STR profiles is estimated at 1544 years (95% CI 1120–1968). In contrast, the TMRCA based on SNPs from sequencing data of 17 Y-chromosome samples [10] is ~2600 years ago. This dating agrees with citizen science data from 15 Y-chromosome sequences, for which the SNP-based TMRCA of the C2*-ST cluster is ~2500 years (95% CI 1850–3200 ybp) (www.yfull.com). What explains the significant discrepancy between the ~1500 years estimated from STRs and the ~2500 years obtained from SNP data? It may be explained by the incomplete correspondence between the STR clusters and the topology of the phylogenetic tree, since different SNP subclades may share the same STR haplotypes, as shown in Additional file 9. For the α and β clusters, which include samples from South Kazakhstan, we also compared the dates obtained from SNP and STR data. The SNP-based TMRCA of the α-cluster (four Y chromosomes sequenced) turned out to be ~750 years (95% CI 400–1050 ybp) (subclade Y12782, a sample from the Dulat clan [16], www.yfull.com), which is very close to the STR dating (746 years, 95% CI 388–1104). The SNP-based TMRCA of the β-cluster, from three sequenced Y chromosomes (~650 years, 95% CI 346–982 ybp; subclade C2-F8949, previously identified as the only Kazakh subclade on the C2*-ST network [10]), was also very close to the STR dating (701 years, 95% CI 493–909). Unfortunately, no extended Y-chromosome sequences are available for the γ-cluster, and the SNP marker defining this subclade has not yet been determined. The STR-based TMRCA of the γ-cluster is 742 years (95% CI 568–916).
The near coincidence of the ages of all three clusters suggests rapid population growth of Kazakh clans in the 13th–14th centuries, which coincides with the expansion period of the Mongol Empire. It is important to note that the TMRCA of the “Uissun” γ-cluster coincides with the lifetime of Maiky-biy (thirteenth century), the proposed ancestor of the Uissuns and an ally of Genghis Khan.
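The comparison of STR and SNP dates in this section can be restated as simple arithmetic. The sketch below takes the point estimates quoted in the text, computes the relative gap between the STR and SNP dates, and converts the STR ages of the three clusters into calendar centuries. The reference year of 2020 is an assumption made only for illustration (the text does not state the reference year of its "years ago" values), and the function names are hypothetical.

```python
REFERENCE_YEAR = 2020  # assumption for illustration; not stated in the text


def relative_gap(str_years, snp_years):
    """Absolute STR-SNP difference as a fraction of the SNP estimate."""
    return abs(str_years - snp_years) / snp_years


def century(years_ago, reference_year=REFERENCE_YEAR):
    """Calendar century (CE) of a date given in years before the reference year."""
    year = reference_year - years_ago
    return (year - 1) // 100 + 1


# Point estimates quoted in the text: (STR-based, SNP-based) TMRCA in years.
point_estimates = {
    "C2*-ST": (1544, 2500),
    "alpha": (746, 750),
    "beta": (701, 650),
}
gaps = {name: relative_gap(s, n) for name, (s, n) in point_estimates.items()}

# STR-based ages of the three clusters containing South Kazakh samples.
str_ages = {"alpha": 746, "beta": 701, "gamma": 742}
centuries = {name: century(age) for name, age in str_ages.items()}
```

Under these assumptions, the gap is about 38% for the whole C2*-ST cluster but below 10% for the α and β clusters, and the three STR ages fall in the 13th and 14th centuries.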
Whose descendants are the clans of South Kazakhstan?
We found that haplogroup C2*-ST (subclade C2-F3796) is the most common in the population of South Kazakhstan. In this sense, C2*-ST is the key to identifying the direct paternal ancestor of the Senior Zhuz clans. Moreover, according to historical studies [2], the lifetime of the legendary ancestor of the Uissuns (the main population group of South Kazakhstan) coincides with the TMRCA of the Uissun cluster.
The oldest known specimen of this lineage (subclade C2-Y4580*) comes from the Mongolian-Buddhist burial of Ulus Dzhuchi (700 years ago) in Central Kazakhstan (Ulytau, Karasauyr burial ground [17]). It is closely related to the Uissun haplogroup C2*-ST.
The only sample of the Wusun culture studied to date (burial Turgen-2, Semirechye, Kazakhstan) belongs to haplogroup R1a1a-Z93(xZ94) (subclade R1a1a-Y41571) [17]. Other ancient specimens from the Tarim Basin, where the Wusun lived, also belonged to haplogroup R1a1 [18]. In contrast, all previously studied Kazakh samples belonged to another branch of R1a, namely R1a1a-Z94 (subclade Z2125) [16, 19]. In general, R1a is not frequent among the Uissuns (only 6%); therefore, the paternal lineages of the Uissuns likely originated from early Mongol populations rather than from the Wusun.
According to The Secret History of the Mongols, the early Mongols were divided into the Niru’un and Darligin Mongols [6]. Which of them are the ancestors of the Uissuns? The only successor clan of the Darligin Mongols that has been genetically studied is Konyrat (Kungirat) [6, 20]. Haplogroup C2-M407 is present at high frequency (86%) in Konyrat (Additional file 10), but not in the Uissuns. According to genealogy (Additional file 11), not only the Uissuns but also the Shanyshkly clan of the Senior Zhuz are descendants of the Niru’un Mongols, with the dominant C2*-ST haplogroup. In addition, C2*-ST has been identified by citizen scientists in several genealogical lineages of the Niru’uns (Keneges, Manghit, and Katagan) [21] and among the Hazaras, who are considered to be direct descendants of the Niru’un Mongols [10]. We therefore suggest that the Y-chromosomal lineages of the main populations of South Kazakhstan originate from the Niru’un Mongols.
Analysis of the downstream SNPs within C2*-ST in south Kazakh clans
A subset of samples (N = 71) was genotyped for high-resolution SNPs within haplogroup C2*-ST. First, we genotyped F3796 and F8951, which define two parallel clades, in all 71 samples (Additional file 2). We identified 70 F3796-positive samples and one F8951-positive sample, in perfect agreement with the predictions from the STR profiles. Next, we genotyped all F3796-positive samples for the 8 downstream markers that reflect the topology of F3796. This clade (Additional file 9) includes three subclades: F3960 and SK1072 are typical of Mongolic-speaking populations, while the third subclade includes both a Mongolian branch (F9747) and a West Central Asian branch (F5481) [10, 22]. The West Central Asian branch includes at least five subbranches: SK1076, F8949, F9033, F11165, and Y12782. Our results indicate that the subbranch Y12782 is the most frequent among South Kazakh F3796 samples (76%), particularly in the Alban, Dulat, and Suan clans. The absolute frequency of the Y12782 subbranch in the Uissun tribe is 31%. As for the Jalair clan, most of its members (82%) belong to the F5481 branch, but not to any of its reported subbranches.
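The hierarchical genotyping logic described above can be sketched as a lookup on the marker tree: each sample is assigned to the deepest marker for which it is derived, and an asterisk marks a sample that is derived for a branch but for none of its known subbranches (as for most Jalair samples, F5481*). The tree below follows the topology as described in the text; the node label `third_subclade` is a placeholder, because the text does not name the SNP defining that subclade, and the function names are hypothetical.

```python
# Parent of each marker within F3796, following the description in the text.
# "third_subclade" is a placeholder label, not a real SNP name.
PARENT = {
    "F3960": "F3796",
    "SK1072": "F3796",
    "third_subclade": "F3796",
    "F9747": "third_subclade",
    "F5481": "third_subclade",
    "SK1076": "F5481",
    "F8949": "F5481",
    "F9033": "F5481",
    "F11165": "F5481",
    "Y12782": "F5481",
}
ROOT = "F3796"


def depth(marker):
    """Number of steps from a marker up to the root F3796."""
    steps = 0
    while marker in PARENT:
        marker = PARENT[marker]
        steps += 1
    return steps


def terminal_subclade(derived_markers):
    """Deepest derived marker of a sample, or None if it is outside F3796.

    Assumes the derived calls are phylogenetically consistent (they lie on
    a single path from the root). A trailing '*' means the sample is derived
    for this branch but for none of its listed subbranches.
    """
    known = [m for m in derived_markers if m == ROOT or m in PARENT]
    if not known:
        return None
    deepest = max(known, key=depth)
    has_subbranches = any(parent == deepest for parent in PARENT.values())
    return deepest + "*" if has_subbranches else deepest


# Hypothetical genotype calls (NOT study data).
uissun_like = terminal_subclade({"F3796", "F5481", "Y12782"})  # "Y12782"
jalair_like = terminal_subclade({"F3796", "F5481"})            # "F5481*"
parallel_clade = terminal_subclade({"F8951"})                  # None
```

A sample positive only for F8951 returns None here because F8951 defines a clade parallel to F3796 and is not part of this tree.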