  • Research article
  • Open access

Genetic variation in South Indian castes: evidence from Y-chromosome, mitochondrial, and autosomal polymorphisms



Major population movements, social structure, and caste endogamy have influenced the genetic structure of Indian populations. An understanding of these influences is increasingly important as gene mapping and case-control studies are initiated in South Indian populations.


We report new data on 155 individuals from four Tamil caste populations of South India and perform comparative analyses with caste populations from the neighboring state of Andhra Pradesh. Genetic differentiation among Tamil castes is low (RST = 0.96% for 45 autosomal short tandem repeat (STR) markers), reflecting a largely common origin. Nonetheless, caste- and continent-specific patterns are evident. For 32 lineage-defining Y-chromosome SNPs, Tamil castes show higher affinity to Europeans than to eastern Asians, and genetic distance estimates to the Europeans are ordered by caste rank. For 32 lineage-defining mitochondrial SNPs and hypervariable sequence (HVS) 1, Tamil castes have higher affinity to eastern Asians than to Europeans. For 45 autosomal STRs, upper and middle rank castes show higher affinity to Europeans than do lower rank castes from either Tamil Nadu or Andhra Pradesh. Local between-caste variation (Tamil Nadu RST = 0.96%, Andhra Pradesh RST = 0.77%) exceeds the estimate of variation between these geographically separated groups (RST = 0.12%). Low, but statistically significant, correlations between caste rank distance and genetic distance are demonstrated for Tamil castes using Y-chromosome, mtDNA, and autosomal data.


Genetic data from Y-chromosome, mtDNA, and autosomal STRs are in accord with historical accounts of northwest to southeast population movements in India. The influence of ancient and historical population movements and caste social structure can be detected and replicated in South Indian caste populations from two different geographic regions.


The origins and genetic affinities of India's populations have been debated extensively [1–6]. Archaeological studies document human occupation of the subcontinent from the lower Paleolithic through the Neolithic, including a flourishing ancient civilization in the Indus Valley [7]. The historical record documents an influx of Vedic Indo-European-speaking immigrants into northwest India starting at least 3500 years ago [8–11]. These immigrants spread southward and eastward into an existing agrarian society dominated by Dravidian speakers [12]. With time, a more highly-structured patriarchal caste system developed [7, 9, 10]. India is now broadly characterized by Indo-European (e.g. Hindi, Urdu, and Punjabi) speaking populations found in the central and northern regions and by Dravidian (e.g. Tamil, Telugu, and Kannada) speaking populations in the southern and southeastern regions. The extent to which ancient and contemporary migrations, and the more recent inception of a hierarchical caste system, have influenced the genetic composition of modern Indian populations remains controversial.

A number of studies have addressed the genetic contribution of other Eurasian populations to Indian caste and tribal populations [1–3, 6, 13–18]. They have arrived at somewhat different conclusions regarding the origins of castes, their relationships to each other, and their relationship to populations outside India. These discordances can be attributed, in part, to differences in sampling strategies and the varied effects of gene flow between the typically endogamous castes and tribes [14, 19–21].

Several trends regarding the origin and affinities of Indian populations have emerged. The predominantly south and east Asian mtDNA haplogroup M is found in more than half of individuals from a wide sampling of castes [5, 6, 13, 22] and is nearly fixed in some Austro-Asiatic tribal populations [6]. This haplogroup is uncommon in western European populations [23, 24]. In contrast, some paternally-inherited Y-chromosome lineages are more closely related to lineages originating in central Asians and Europeans [1, 13, 25, 26]. Genetic distances estimated from autosomal polymorphisms have typically demonstrated that caste populations tend to occupy a position intermediate between European and East Asian populations [8, 27–29].

The genetic affinities among the more than 2000 extant caste populations of India, however, are complex. Genetic distances between caste populations from the state of Andhra Pradesh, India, are correlated with differences in caste rank, suggesting that endogamy and differential inter-caste gene flow influence genetic structure [30]. Several studies have found a similar pattern, [31–33] but others have not [6, 34]. Higher rank castes may show closer affinity to European populations than do other caste populations [13]. Recent Y-chromosome data suggest a higher affinity between tribal populations and castes of lower rank [35].

These results support historical accounts of nomadic pastoralists from central and northwestern Eurasia integrating with existing local populations, and either introducing a system of social stratification or becoming members of the existing upper castes [8, 9, 35]. Yet, the occurrence of Y-chromosome haplogroups L, H, R2, and R1a in both caste and isolated tribal populations suggests much of the existing Indian population structure is very old [5]. Additionally, the high diversity of Y haplogroups R1a1 and R2 in both South Indian and Indus valley populations has led to the suggestion that there is little, if any, genetic influence from other Eurasians on the castes of South India [3].

A broad study of 24 castes from various locations throughout India concluded that genetic data were not congruent with "sociocultural" affinities due to high rates of gene flow [6]. Yet, this study and others [1, 36] have suggested a clinal (north to south) contribution of central Asian Y-chromosomal lineages to caste populations. Due to well-established clines in gene frequencies across India, especially in the north-south direction, [2, 34, 36] comparisons of castes from different geographic locations can conflate clinal variation with variation that may exist between local caste groups. Therefore, it is important to obtain large, carefully chosen samples from the same geographic locale to determine whether previous results [13] indicating caste-related genetic structure can be replicated in other regions of India [37]. Additionally, because single linkage groups such as the non-recombining region of the Y-chromosome or the mtDNA genome may be strongly influenced by genetic drift or selection, the use of a large number of independent autosomal polymorphisms can greatly improve the reliability of estimates of population relationships.

In this study we analyze four castes of different rank sampled from Tamil Nadu, the southern-most state of the Indian subcontinent. The genetic relationship among the Tamil castes, their relationship to castes from the neighboring state of Andhra Pradesh, and their affinity to other Eurasian populations are examined using Y-chromosome, mtDNA, and autosomal polymorphisms. We show that the genetic affinities between Indian castes from Tamil Nadu and other Eurasians are broadly congruent with patterns observed previously for castes from Andhra Pradesh. These results strengthen the conclusions drawn from our previous analyses regarding caste relationships in South India and suggest reproducible patterns regarding the genetic influence of ancient and historical events on the Indian caste system.


Y-chromosome haplogroups

We evaluated the genetic relationship between Tamil castes, eastern Asians, and Europeans using 32 lineage-specific Y-chromosome SNPs. The sampling locations for the Tamil castes and a comparative set of castes from the neighboring state of Andhra Pradesh are shown in Figure 1. Y-haplogroups F*, H1, J2/J2a, L1, R1a1, and R2 reach appreciable frequencies (> 5%) in most castes. Common Y-haplogroups are typically shared among castes (Tables 1 and 2). Haplogroup R (predominantly R1a1 (27%) and R2 (11%)) is the most common major lineage in the Tamil castes, followed by H (21%, predominantly H1), L (13%, predominantly L1), J (11%, predominantly J2), and F* (10%).

Table 1 Y-chromosome haplogroup frequencies for South Indian castes and major population groups
Table 2 Y-chromosome haplogroup counts for South Indian castes and major population groups
Figure 1

Map of South India. A map of the four major states of South India shows the sample locations for the caste populations (figure adapted from Google maps).

Some between-caste trends are suggested by the data. The F* lineage is found at higher frequencies in lower castes than in upper or middle castes. The R1a1 lineage occurs at a higher frequency in upper than in lower castes, and the difference is significant for Andhra upper vs. Andhra lower castes (p < 0.05). These trends appear in castes from both Tamil Nadu and Andhra Pradesh. Lineage H also reaches substantial frequency in the Tamil lower caste but is less common in upper and middle castes. Lineage J2, previously shown to be distributed in a northwest to southeast gradient, [3] is present in all castes but is not correlated with caste rank.

mtDNA haplogroups

Tamil castes are characterized by high frequencies of mitochondrial M and N super-family lineages, and all South Indian lineages could be assigned to either M or N clades (Tables 3 and 4). Both major haplogroup super-families are deep-rooting in South Indian populations, with diversity estimates for N (0.01589, n = 63) exceeding that for M (0.01044, n = 92), based on HVS1 data. In contrast to the South Indian mtDNA haplogroup pool, the eastern Asian and European groups have predominantly either M or N lineages, respectively. High diversity and deep-coalescence dates (> 40 K ybp) for both major mtDNA superfamilies are consistent with an ancient and continuous presence of populations in South India that greatly predates the documented history of the caste system.
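The diversity estimates quoted above are per-site values computed from HVS1 sequences. The text does not state which estimator was used; the sketch below assumes the common definition of nucleotide diversity as the mean number of pairwise differences per aligned site. The function names and the toy sequences are hypothetical and are not study data.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count the sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def nucleotide_diversity(sequences):
    """Mean number of pairwise differences per site (assumed estimator)."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * length)


# Hypothetical aligned fragments, for illustration only.
toy_hvs1 = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(nucleotide_diversity(toy_hvs1))
```

Applied separately to the M and N sequences, a function of this kind would give one diversity value per super-family, which is the form of comparison reported in the paragraph above.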

Table 3 mtDNA haplogroup frequencies for South Indian castes and major population groups
Table 4 mtDNA haplogroup counts for South Indian castes and major population groups

To further examine potential western and central Eurasian contributions to South Indian castes, mitochondrial U lineages, defined by coding variant 12308G, were analyzed in greater detail (Table 5). U haplogroup subtypes were assigned using key HVS1 variants as previously described [4, 38]. South Asian lineages U2a and U2c are common in Tamil and Andhra castes. U7 is the most prevalent U lineage in Tamil and Andhra castes. U7 is also common in Iran, Pakistan, and northern India, [39] suggesting an affinity between Dravidian populations from South India and populations to the north and west. A comparison of HVS1 for U7 haplogroups (10) with Indian/Pakistani HVS1 sequences available in the mtDB database (4) revealed similar but non-identical motifs, suggesting ancient rather than very recent gene flow between northwestern and southern India. A notable between-caste difference is observed for the mtDNA haplogroup U data: the Tamil upper castes have a higher frequency of U haplogroups (all subclades) than the Tamil lower caste sample (0.317 vs. 0.059, p < 0.05) or the middle castes. This trend is also present in the Andhra sample, but it is not significant.
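The frequency comparison above can be illustrated with a two-sided Fisher's exact test on a 2 x 2 table. This is a sketch under stated assumptions: the text does not name the test that was used, and the counts (13 of 41 and 2 of 34) are back-calculated here from the reported frequencies 0.317 and 0.059 and the Tamil Brahmin and Dalit sample sizes given under Study Subjects, assuming those are the groups being compared.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * tolerance
    )


# Assumed counts: U vs. non-U lineages in upper (13, 28) and lower (2, 32).
p_value = fisher_exact_two_sided(13, 28, 2, 32)
print(p_value < 0.05)
```

Under these assumed counts the test falls below the 0.05 threshold, in line with the significance level reported in the text.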

Table 5 mtDNA haplogroup U counts for South Indian castes and Europeans

Genetic distances

We calculated genetic distances between Tamil castes, Europeans, and East Asians and compared these results to those from upper, middle, and lower caste groups from the neighboring state of Andhra Pradesh. The genetic distance estimates reveal several distinct patterns (Table 6).

Table 6 Genetic distance estimates between South Indian castes, Europeans, and eastern Asians

For Y-chromosome polymorphisms, all castes have smaller distances to Europeans than to eastern Asians. For mtDNA polymorphisms, all castes have smaller distance estimates to eastern Asians than to Europeans. For Y-chromosome data, the genetic distance estimates to the Europeans are ordered by caste rank. These trends appear in castes from both geographic regions.

A neighbor-joining network depicts the between-population relationships based on Y-chromosome data (Figure 2). The NTS Upper caste is more closely related to the Andhra Upper caste than to the other Tamil castes, a finding consistent with a common language (Telugu) shared by the NTS Upper and Andhra upper castes. All castes are closer to Europeans than to eastern Asians, and basal haplogroup R is common, especially in the upper castes and Europeans. The inset, however, shows that haplogroups derived from R are not commonly shared between this sample of Europeans and southern Indians. Affinity between the groups is driven largely by basal characters (R, F* and H) that have contrasting frequency patterns.

Figure 2

Genetic distances for Y-chromosome data. A neighbor-joining network depicts the genetic distance estimates between South Indian castes, Europeans, and East Asians for 32 Y-chromosome SNPs. The pie diagrams indicate the proportion of each major Y-chromosome lineage found in each population. The inset shows the proportions of Y-chromosome R sub-lineages. Inset circle size is proportional to the total number of R lineages.

A neighbor-joining network based on distance estimates from 45 STRs shows a greater affinity of all castes to Europeans than to eastern Asians (Figure 3). With the exception of the NTS Upper (Telugu and Kannada speaking) Brahmins, castes of similar rank from different geographic locations tend to branch at similar locations within the network. Within each geographic region, the distances to other Eurasians (both Europeans and East Asians) increase with decreasing caste rank.

Figure 3

Genetic distances for autosomal data. A neighbor-joining network depicts the genetic distance estimates (DSW) between South Indian castes, Europeans, and East Asians for 45 autosomal STRs.

The network based on mitochondrial distance estimates shows little between-caste rank organization, yet reveals the greater affinity of all castes to eastern Asians for maternal lineages (Figure 4). Basal U haplogroups are less frequent in lower rank castes from both southern India locations. The inset shows that only a few high-resolution U haplogroups (U5, K) are shared between Europeans and South Indians.

Figure 4

Genetic distances for mitochondrial data. A neighbor-joining network depicts the genetic distance estimates between South Indian castes, Europeans, and East Asians for 32 mtDNA SNPs and 411 bp of HVS1 sequence. Pie diagrams indicate the proportion of major mtDNA lineages found in each population. The inset shows the proportion of mtDNA U sub-lineages. Inset circle size is proportional to the total number of U lineages.

Genetic structure

The proportion of genetic variation distributed within and between South Indian castes was assessed by an analysis of molecular variance (AMOVA) (Table 7). The Tamil South Indian castes are only modestly differentiated from one another: 0.96% of STR variance occurs between Tamil castes. A similar value of 0.77% for between-population (caste) difference is observed in the Andhra castes. A smaller fraction, 0.12%, is attributable to geographic differences between Tamil and Andhra locations and was not significantly different from zero. Removal of the NTS Upper caste from the comparison yielded a non-significant but higher value of 0.28%. These findings, based on multiple unlinked loci, suggest that social structure has had a larger impact on caste population structure in these South Indian samples than geographic separation.

Table 7 AMOVA for Y-chromosome, 45 autosomal STRs, and mtDNA

Y-chromosome and mtDNA estimates of molecular variance between caste samples from either Tamil Nadu or Andhra Pradesh also exceed the estimate for between-group variation for the two geographic regions. Between-caste variation for mtDNA in Tamil populations is greater than that for Andhra populations. This may be partly due to regionally high female mobility in Andhra castes, as previously reported [20, 30]. As expected, for all genetic systems, the vast majority of all variation occurs within populations.

The degree of population subdivision among Indian castes was estimated using a model-based clustering method implemented in STRUCTURE (ver. 2.1). The best estimate of the number of clusters (K) was consistently one for the Tamil Indians. The best estimate of K was also one for Tamil and Andhra castes together. This result indicates that individuals from castes spanning the Indian social hierarchy from two independent geographic regions are not sufficiently differentiated to allow clustering into groups based on genetic data from 45 STR polymorphisms alone. This finding is consistent with the low RST values for these populations but may also reflect the limited power of 45 STRs to distinguish such closely related populations. Estimates for heterozygosity and repeat variance in these populations also indicate no substantial between-caste differences or excess homozygosity in these caste groups (Table 8).
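The heterozygosity and repeat variance summaries mentioned above can be sketched for a single STR locus as follows. The estimators are assumptions, since the text does not specify them: heterozygosity is computed here as unbiased gene diversity, n/(n-1) times one minus the sum of squared allele frequencies, and repeat variance as the sample variance of allele repeat counts. The allele values are hypothetical.

```python
from collections import Counter
from statistics import variance


def expected_heterozygosity(alleles):
    """Unbiased gene diversity for one locus from a list of allele calls."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    counts = Counter(alleles)
    homozygosity = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - homozygosity)


def repeat_variance(alleles):
    """Sample variance of repeat counts at one STR locus."""
    return variance(alleles)


# Hypothetical repeat counts for one locus (two copies per individual).
toy_locus = [10, 10, 11, 11, 12, 10, 11, 13]
print(expected_heterozygosity(toy_locus), repeat_variance(toy_locus))
```

Averaging such per-locus values over all 45 STRs within each caste would give population-level summaries of the kind listed in Table 8.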

Table 8 STR heterozygosity and variance for South Indian caste populations

We evaluated the correlation between caste rank and genetic distance using a Mantel test (Table 9). For each test, we calculated the correlation between the pairwise genetic distance matrix and the pairwise caste rank distance matrix for the Tamil caste individuals. For Tamil-speaking populations, all genetic systems produced low but significant positive correlations. Y-chromosome haplogroup data yielded the highest positive correlation with caste rank (ρ = 0.26, p < 0.01). Inclusion of the non-Tamil speaking Brahmins decreases the correlation for all systems.
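The Mantel procedure described above can be sketched as a permutation test on two distance matrices. This is an illustration under assumptions rather than the analysis pipeline used in the study: it uses Spearman's rank correlation (as named in the Table 9 title), a one-sided test for positive correlation, and 999 random relabelings. The function names, the caste rank coding (1, 2, 3), and the toy genetic distances are hypothetical.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test; returns (rho, permutation p-value)."""
    n = len(dist_a)
    pairs = list(combinations(range(n), 2))
    flat_b = [dist_b[i][j] for i, j in pairs]

    def statistic(order):
        # Relabel rows and columns of dist_a together.
        flat_a = [dist_a[order[i]][order[j]] for i, j in pairs]
        return spearman(flat_a, flat_b)

    observed = statistic(list(range(n)))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if statistic(order) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: six individuals, two per caste rank.
caste_rank = [1, 1, 2, 2, 3, 3]
rank_dist = [[abs(a - b) for b in caste_rank] for a in caste_rank]
toy_genetic = [
    [0.00, 0.10, 0.15, 0.20, 0.30, 0.25],
    [0.10, 0.00, 0.18, 0.12, 0.28, 0.35],
    [0.15, 0.18, 0.00, 0.11, 0.14, 0.22],
    [0.20, 0.12, 0.11, 0.00, 0.21, 0.16],
    [0.30, 0.28, 0.14, 0.21, 0.00, 0.09],
    [0.25, 0.35, 0.22, 0.16, 0.09, 0.00],
]
rho, p_value = mantel(toy_genetic, rank_dist)
print(rho, p_value)
```

Permuting individual labels, rather than shuffling matrix entries independently, preserves the dependence among distances that share an individual, which is why a Mantel test is used instead of an ordinary correlation test.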

Table 9 Spearman's correlation between genetic distance and caste rank.


Using a geographically well-defined sample of caste populations from Tamil Nadu, India, this study arrives at many conclusions similar to those from our previous studies of caste populations from Andhra Pradesh, India [13, 20, 30]. In both cases, there is extensive sharing of Y and mtDNA haplogroups among castes, and the overall level of inter-caste differentiation is low. This finding is consistent with many other studies of genetic structure and gene flow patterns among caste populations [6, 32, 33, 40].

Paternally-inherited Y-chromosome SNPs show that caste populations have greater affinity to a sample of Europeans than to a sample of eastern Asians. Unlike the Y-chromosome data, maternally-inherited mtDNA polymorphisms demonstrate a contrasting pattern – castes, regardless of rank, have higher affinity to eastern Asians than to Europeans. These patterns were present in samples from both geographical locations, suggesting that South Indian paternal lineages have been more substantially influenced by western or central Eurasians than have South Indian maternal lineages. Unlike our previous study of Andhra castes, [13] direct haplogroup sharing between Tamil castes and our sample of Europeans is more limited, suggesting a potentially greater time depth for the development of these patterns. More extensive sampling will be required to resolve this difference.

Using Y-chromosome data, Tamil castes of different rank have differential affinities to our sample of Europeans, with upper castes demonstrating greater affinity than lower castes. Genetic distances are weakly correlated with caste rank distances and correlations from Y-chromosome data are stronger than correlations based on mtDNA or autosomal data. This pattern argues for a differential contribution of male lineages to castes of different rank and limited male mobility between castes in South India.

An interesting difference between the data sets from Andhra Pradesh and Tamil Nadu is also observed. For the former sample, inter-caste distance based on mtDNA polymorphisms (HVS1 sequence) demonstrated a strong relationship to caste rank, while distances based on Y-chromosome data did not. This was interpreted as evidence of historical upward female mobility in the caste system [30]. (We note, however, that the primary reason for a lack of correlation between Y-chromosome distances and caste rank was close affinity between the upper-caste Brahmin and lower-caste Relli samples [20].) In contrast, the Tamil Nadu samples show a higher correlation between Y-chromosome distances and caste rank than between mtDNA distances and caste rank. This difference likely reflects differential apportioning of individuals as the caste system originated or subsequent differences in male-female mobility patterns.

Recently, several studies have underscored the complexity of Y-chromosome variation in Indian populations. Sahoo et al. (2006) presented evidence that the R1a haplogroup has attained high frequencies and high diversity in northern India, central Asia, and eastern Europe. They also reported high frequencies of Y-chromosome haplogroup H in caste and tribal populations and provided compelling evidence for an origin of haplogroup H in South India. Upon further analysis, their data show that, as in our study, the frequency of haplogroup R lineages is higher in Brahmins (upper rank) than in lower-rank castes (0.53 vs. 0.41), while the frequency of H lineages is lower in Brahmins than in lower castes (0.15 vs. 0.34).

In a study of broadly distributed Indo-European and Dravidian castes, Sengupta et al. (2006) suggested that the majority of Indian Y-chromosome haplogroups are at least 10,000 to 15,000 years old as gauged by Y-chromosome microsatellite diversity, thus predating the origin of the caste system. The antiquity and complex geographic distribution of the R1a1 and R2 haplogroups led these authors to conclude that the majority of the subcontinent Y-chromosomes arrived in or before the early Holocene (10,000 years ago) rather than in a later Indo-European expansion. Likewise, and concordant with other studies of tribal Indian populations, [5] we observe Y-chromosome R1a1 lineages in South Indian tribal Irula (unpublished data), a population substantially differentiated from South Indian castes [18].

An examination of the R and H haplogroup frequencies of Indo-European-speaking castes reported by Sengupta et al. (2006) shows that, as in our study, R haplogroup frequencies in upper castes exceeded those of middle and lower castes (0.62, 0.35, and 0.38, respectively), while H haplogroup frequencies were lowest in upper castes (0.14), intermediate in middle castes (0.38), and most frequent in lower castes (0.44). For Dravidian castes, R (0.62) was more frequent than H (0.14) in upper castes while R and H had similar (within 6%) frequencies in middle and lower castes.

A recent analysis of caste and tribal populations from eastern India (Orissa) demonstrated Indo-European influences on paternal caste lineages [41]. Brahmins showed high Y-chromosome affinity to eastern Europeans (M17, haplogroup R1a1). In contrast, maternal mtDNA polymorphisms revealed primarily Indian-specific lineages. Taken together, our studies and at least three other studies of Y-chromosome lineages in Indian castes demonstrate that upper castes show genetic affinity to populations residing north and northwest of the Indian subcontinent. This affinity appears, in part, to result from varying frequencies of Y-chromosome R lineages and older South Asian lineages such as F* and H.

Indian mtDNA lineages demonstrate high diversity, suggesting that a majority of Indian maternal lineages are also relatively old and likely predate historically documented expansion events [38, 42]. Older, deep-rooting mitochondrial lineages belonging to the N macrolineage are prevalent in western Eurasia and are distributed in a West – East cline, with high frequencies in Anatolia and Iran and moderate frequencies in Pakistan and northwestern India [43]. In this study we observe higher frequencies of basal U lineages in upper castes than in lower castes. Higher resolution haplogroup results, however, show little evidence of between caste differences. This may indicate differences in founding populations. More likely, though, it may suggest ancient migration and integration of various U haplogroups into different pre-caste populations with subsequent, non-uniform lineage sorting and differentiation over time. In contrast, and consistent with early human expansion across South Asia, the predominantly Asian M clade mitochondrial haplogroups account for more than half of all Indian mitochondrial lineages and reach their highest frequencies in lower caste and tribal groups [6, 13].

While Y-chromosome and mtDNA polymorphisms yield valuable information, it must be borne in mind that they each represent a single linkage group. Estimates based on these systems are thus subject to a high level of stochastic variability [44, 45]. In addition, the Y-chromosome and mtDNA may both have been affected by natural selection, [46, 47] which can further complicate the interpretation of population history. Coalescence dates based on these systems must also be viewed with appropriate caution, in part because of their large confidence intervals. More importantly, a coalescence date is not necessarily a reliable indicator of the founding date of a population [45] because these dates are affected by the size of the founder population and by subsequent gene flow patterns. To gain a more complete and reliable portrait of population history, multiple, independent autosomal polymorphisms should also be examined.

Our analysis of 45 unlinked autosomal STRs reveals that in Tamil Nadu, genetic distances between castes are positively correlated with caste rank. A similar pattern was detected in upper, middle, and lower rank castes of Andhra Pradesh using these STRs [20] and Alu and L1 insertion polymorphisms [13]. An analysis of the Kallar, Vanniyar, and Pallar castes, which also reside in Tamil Nadu, showed that upper – lower caste distance estimates (0.0553) exceeded those for upper – middle castes (0.0329) and middle – lower castes (0.0515) [40]. Majumder et al. [37, 48] presented Y-chromosome, mtDNA, and autosomal data from several caste populations in Uttar Pradesh. Subsequent analysis indicated that caste rank was correlated with genetic distance for all three types of systems [20]. Similar correlations have been observed in a number of other studies of Indian populations [31, 33, 49]. A relatively greater affinity between upper-caste populations and Europeans has been observed for autosomal polymorphisms in our Andhra Pradesh and Tamil Nadu samples and in a number of other analyses of autosomal data [6, 50, 51].

Although significant correlations between caste-rank and genetic distances are apparent, model-based clustering algorithms did not detect structure within the Tamil or Andhra populations. We suggest that this finding results from the low amount of differentiation between all caste groups but also from a lack of sufficient power in 45 unlinked STRs to detect high-resolution population structure. With ~250 K SNPs typed in a subset of the Andhra upper and Andhra lower castes, individuals can be clustered into these population groups using genotype information alone [52]. Likewise, using > 950 K SNPs, the Tamil upper and Tamil lower castes demonstrate group-specific clustering by principal component analysis (unpublished data).

Considering the complex history of Indian populations, it is not surprising that some studies demonstrate an association between caste rank and genetic distance, whereas others do not. A recent study of 15 geographically dispersed Indian populations residing in the United States using 1200 markers found little evidence for caste or geographic structure [53]. However, sampling strategy (relocated vs. in situ) or other factors, such as a very wide geographic dispersion of the study populations, may confound correlations if they exist. Admixture and gene flow can also vary substantially between caste populations in the various regions of India. Linguistic differences may influence the genetic structure of local caste populations [34]. The linguistically different NTS Upper caste Brahmins showed several differences in comparison to the other Tamil castes in this analysis. Yet, because Indian populations show only a small amount of genetic differentiation, [17, 53] a large number of autosomal loci will be necessary for adequate power to detect consistent patterns of variation if they are present [54, 55]. Ancestry-informative autosomal polymorphisms, high-density genotyping, and extensive population sampling will provide better resolution of the relationships between Indian and other Eurasian populations.

The results presented here underscore the complexity of the Indian caste system. Although other interpretations may be possible, our data are consistent with a model in which nomadic populations from northwest and central Eurasia intercalated over millennia into an already complex, genetically diverse set of subcontinental populations. As these populations grew, mixed, and expanded, a system of social stratification likely developed in situ, spreading to the Indo-Gangetic plain, and then southward over the Deccan plateau. A strong patrilineal social structure, accompanied by a developing practice of caste endogamy, may have contributed to an asymmetric apportioning of Y-chromosome, autosomal, and to a lesser extent, mtDNA lineages. Remnants of these patterns can still be detected in some of the inhabitants of peninsular South India.


Genetic variation between South Indian castes from Tamil Nadu is low (RST = 0.0096). Tamil caste Y-chromosomes and STR alleles are more similar to Europeans than to eastern Asians, and genetic distance estimates to Europeans are ordered by caste rank. In contrast, Tamil caste mtDNA shows greater similarity to eastern Asians than to Europeans. Low, but statistically significant, correlations between genetic distance and caste rank can be demonstrated for the Tamil-speaking populations. These patterns likely reflect asymmetric influences of ancient and historical processes on the caste system as it developed. These findings provide a general replication of our analysis of ranked castes from the neighboring state of Andhra Pradesh, India [13]. For the caste populations analyzed here, between-caste genetic differentiation exceeds that due to geographic (between-state) differentiation, a finding that may be of considerable interest when initiating linkage mapping [56] and case-control association studies in South Indian populations.


Study Subjects

Study subjects were recruited from four caste groups in Tamil Nadu, India. Tamil-speaking Brahmins (41), Mudaliars (43), and Dalits (Harijans) (34) were sampled in Chennai or from rural locations near Chennai. Caste rank was assigned using the traditional varna of Brahmin (Brahmin, upper ranking), Mudaliar (Sudra, middle ranking), and Dalit (scheduled caste – outside the traditional caste system, lower ranking). A second sampling of Brahmins (37) was obtained in Kanchipuram, located ~70 km southwest of Chennai. The Kanchipuram Brahmin group is linguistically diverse, containing Kannada- and Telugu-speaking Brahmins that relocated from the neighboring states of Andhra Pradesh and Karnataka. This group of upper caste individuals is referred to subsequently as the non-Tamil speaking (NTS) Upper caste. This study was approved by the Schizophrenia Research Foundation, Chennai, India and by the Wolston Park Hospital Ethics Committee, Brisbane. Approvals were also obtained from the Indian Council of Medical Research and the Indian Ministry of Commerce. Written, informed consent was obtained from all participants.

A comparative European sample of northern European and French ancestry (57), and eastern Asians of Chinese, Japanese, and S.E. Asian ancestry (28) have been previously described [28, 57, 58]. Because every sample was required to have data for all genetic systems, including the Y-chromosome, females were excluded and sample sizes are smaller than previously reported. The comparative sample of populations from Andhra Pradesh, India includes upper-caste Brahmins (33), middle-caste Kapus and Yadavas (80), and lower-caste Malas, Madigas, and Rellis (54) [13].

Data collection

DNA was extracted from venous blood using standard procedures. Hypervariable sequence 1 (HVS1), corresponding to base pairs 16000 – 16410, was amplified by PCR and sequenced using BigDye 3.1 dye-terminator fluorescent sequencing chemistry and an Applied Biosystems (ABI) 3100 automated sequencer.

Lineage and sub-lineage identifying single nucleotide polymorphisms (SNPs) for the mitochondria (32 markers) and Y-chromosome (32 markers) were selected from the literature [47, 59–64]. Lineage-defining mitochondrial coding region markers used in the study are L2-C10810T, M-C10400T, C-A13263G, D-C4883T, preE-G4491A, E-G7598A, G-A4833G, Z-T9090C, N-C10873T, N1d-C6713T, Y-A7933G, W-G8994A, R-T12705C, R5-T8594C, J-A12612G, T-T10463C, H/V-T14766C, U-A12308G, U6-A3348G, U6a-G7805A, U5-T3197C, U5a1-A14793G, U5a/b-A7768G, U2-K-A1811G, U2-A3720G, U2-A9545G, U3-G9266A, U4-T4646C, U7-C5360T, U7-C8137T, K/U8-G9055A, and U9-G3531A. Y-chromosome lineages and markers used are C-M216, F*-M89, G-M201, H-M52, H1-M82, H1a-M36, H1b-M97, H1c-M138, I-M170, J2-M172, J2a-M410, K*-M9, K1-SRY9138, K2-M70, L-M20, L1-M76, M-M5, N-LLY22g, O-M175, O3-M122, P*-M74, Q-P36, Q3-M3, R*-M207, R1-M173, R1a-SRY10831.2, R1a1-M17, R1a1a-M56, R1a1b-M157, R1a1c-M87, R1b3-M269, and R2-M124.

Mitochondrial and Y-chromosome SNPs were genotyped by fluorescent primer extension using SNaPshot chemistry (ABI). Primers were annealed to amplification products adjacent to the polymorphic site and extended by one nucleotide using the manufacturer's recommendations. Extension products were pooled and resolved on a 36-cm capillary array. Four to eight SNPs were assayed per multiplex. Forty-five STRs, predominantly tetranucleotide repeats, were amplified with 5'-NED, -PET, -VIC, or -6-FAM labeled primers under standard PCR conditions and resolved in 5 fluorescent multiplex runs on an ABI 3100. STR loci are UT1091, UT1201, UT1205, UT1220, UT1227, UT1228, UT1232, UT1239, UT1243, UT1257, UT1313, UT1352, UT1357, UT1376, UT1674, UT1708, UT1740, UT1747, UT1880, UT1885, UT1917, UT1950, UT1985, UT2021, UT2081, UT2092, UT2127, UT2203, UT5022, UT5027, UT5029, UT5030, UT5033, UT5048, UT5492, UT6507, UT6516, UT6540, UT7131, UT8067, UT868, UT871, UT901, UT919, and vWFII. These STRs and mtDNA polymorphisms were typed in comparative populations as described previously [13, 18, 20, 57, 58]. Y-chromosome, STR, and mtDNA genotype data are provided in Additional file 1.

To allow a direct comparison of Y-chromosome haplogroups from Tamil Nadu castes to those from Andhra Pradesh castes, we typed individuals from Andhra Pradesh for 26 of the 32 lineage-defining SNPs. A Y-haplogroup was assigned to each sample by the presence of one or more derived-state alleles, and the remaining alleles were inferred. This SNP panel allowed further refinement of the haplogroups previously reported for the Andhra Pradesh samples [13, 30].
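The assignment rule described above (take the most derived allele observed and infer the state of the remaining markers from the phylogeny) can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the tree is a small, deliberately incomplete subset built from markers named in the text, and the names `PARENT`, `LINEAGE`, `path_to_root`, and `assign_haplogroup` are hypothetical.

```python
# Toy subset of the Y-chromosome marker tree (marker -> parent marker).
# Only markers named in the text are used; the tree is deliberately incomplete.
PARENT = {
    "M89": None,
    "M52": "M89",
    "M82": "M52",
    "M9": "M89",
    "M20": "M9",
    "M76": "M20",
    "M74": "M9",
    "M207": "M74",
    "M124": "M207",
    "M173": "M207",
    "SRY10831.2": "M173",
    "M17": "SRY10831.2",
}

# Lineage label carried by each marker, as listed in the text.
LINEAGE = {
    "M89": "F*", "M52": "H", "M82": "H1", "M9": "K*", "M20": "L", "M76": "L1",
    "M74": "P*", "M207": "R*", "M124": "R2", "M173": "R1",
    "SRY10831.2": "R1a", "M17": "R1a1",
}


def path_to_root(marker):
    """List the marker and all of its ancestors, most derived first."""
    path = []
    while marker is not None:
        path.append(marker)
        marker = PARENT[marker]
    return path


def assign_haplogroup(derived_markers):
    """Return (lineage, inferred_states) from the markers typed as derived."""
    if not derived_markers:
        raise ValueError("no derived allele observed; haplogroup cannot be assigned")
    # The terminal marker is the observed derived marker deepest in the tree.
    terminal = max(derived_markers, key=lambda m: len(path_to_root(m)))
    path = set(path_to_root(terminal))
    # Derived alleles off the terminal marker's path indicate a typing conflict.
    conflicts = set(derived_markers) - path
    if conflicts:
        raise ValueError(f"derived alleles on different branches: {sorted(conflicts)}")
    # Markers on the path are inferred derived; all others are inferred ancestral.
    states = {m: ("derived" if m in path else "ancestral") for m in PARENT}
    return LINEAGE[terminal], states


lineage, states = assign_haplogroup({"M17"})
print(lineage, states["M9"], states["M20"])
```

A sample typed as derived only at M17 is placed in R1a1, and the untyped upstream markers (for example M9) are inferred to be derived while markers on other branches (for example M20) are inferred to be ancestral.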

Data analysis

Haplogroups for the Y-chromosome (32 SNPs) and mtDNA (32 SNPs and 411 bp of HVS1 sequence) were assigned using SNP data. Each mtDNA sample was assigned to a haplogroup based on the most probable consensus of polymorphic changes, or the assignment was resolved using previously published mtDNA HVS1 motifs as a guide [62]. Thirty-one exceptions to the canonical mtDNA phylogeny occurred on 27 mtDNA haplogroups, and these haplogroups with recurrent mutations were assigned to the most likely haplogroup based on HVS1 sequence data [4, 6]. The variant 7598A, defining mtDNA lineage M-E, was found in 2 Tamil and 1 Andhra individuals who share identical HVS1 motifs but lack the preE 4491A variant. Between-caste haplogroup differences were evaluated for significance using Fisher's exact test.
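For readers unfamiliar with Fisher's exact test, the following minimal sketch computes a two-sided P value for a single 2 × 2 table (one haplogroup present or absent in two castes). It assumes the common convention of summing the probabilities of all tables that are no more likely than the observed one; the text does not state which table layout or software was used, and the counts in the usage line are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    low = max(0, col1 - row2)
    high = min(row1, col1)
    p_observed = prob(a)
    # Sum every table at least as extreme (no more probable) as the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(
        prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Invented counts: haplogroup carriers / non-carriers in two hypothetical castes.
print(fisher_exact_two_sided(12, 29, 3, 31))
```

Comparisons across more than two castes or many haplogroups would need the r × c generalization of the test, which this sketch does not cover.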

Diversity estimates (FST, RST, and AMOVA) for Y-chromosome, mtDNA, and autosomal STRs were calculated using the ARLEQUIN 3.0 software package [65]. AMOVA statistics were evaluated for significance by comparison to an empirical distribution generated by random permutation of genotypes or haplogroups. A general age estimate for mtDNA coalescent dates was calculated by the method of Nei [66] using a substitution rate of 2 × 10⁻⁷ substitutions/site/year [67].
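The text does not spell out which estimator from Nei [66] was applied. As a rough illustration of how a substitution rate converts sequence divergence into time, the sketch below assumes the standard relation d = 2μt, where d is the mean number of pairwise differences per site, μ is the substitution rate per site per year, and t is the time to the common ancestor in years. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def mean_pairwise_differences_per_site(sequences):
    """Average proportion of differing sites over all pairs of aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return total / (len(pairs) * length)


def divergence_time_years(d_per_site, rate_per_site_per_year=2e-7):
    """Assumed relation d = 2 * mu * t, solved for t (two lineages accumulate changes)."""
    return d_per_site / (2.0 * rate_per_site_per_year)


# Toy aligned sequences, far shorter than the 411 bp HVS1 segment.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC"]
d = mean_pairwise_differences_per_site(toy)
print(d, divergence_time_years(d))
```

The factor of 2 reflects that differences accumulate along both lineages since their common ancestor; other corrections (for example, for multiple hits at the same site) are omitted here.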

Model-based analyses of population structure were performed using the STRUCTURE program [68]. An estimate of the optimal number of clusters (K) for the four Tamil castes was obtained from the posterior probabilities of K, P(X|K), averaged over 10 runs for each value of K. A uniform prior probability distribution was assumed on K = {1...n}, and burn-in and iterations were set to 10,000 each for estimating the best K. Estimates of proportionate membership to three clusters were averaged values from 10 independent STRUCTURE runs. Population admixture and correlated allele frequencies were used in all analyses.
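With a uniform prior on K, the posterior probability of each K is proportional to the likelihood P(X|K), so run-averaged ln P(X|K) values can be normalized directly. The sketch below shows that arithmetic; it assumes the averaging is done on the log scale, which the text does not specify, and the ln P(X|K) values are invented rather than taken from the study. `posterior_of_k` is a hypothetical helper, not part of STRUCTURE.

```python
import math


def posterior_of_k(ln_likelihood_runs):
    """Map {K: [ln P(X|K) for each run]} to {K: P(K|X)} under a uniform prior on K."""
    mean_ln = {k: sum(runs) / len(runs) for k, runs in ln_likelihood_runs.items()}
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    peak = max(mean_ln.values())
    weights = {k: math.exp(v - peak) for k, v in mean_ln.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Invented ln P(X|K) values from three runs per K, for illustration only.
runs = {
    1: [-15230.4, -15231.0, -15229.8],
    2: [-15228.1, -15227.5, -15229.0],
    3: [-15240.2, -15238.9, -15241.5],
}
for k, p in sorted(posterior_of_k(runs).items()):
    print(k, round(p, 4))
```

Because the posterior depends on exponentiated log-likelihood differences, even modest gaps in ln P(X|K) between values of K translate into strongly peaked posteriors.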

The correlation between genetic distance and caste rank was assessed by Mantel matrix tests using Spearman's rank correlation. For all possible pairs of caste individuals, inter-individual genetic distance estimates were calculated using DNADIST (Y and mtDNA) [69] or the Dsw program (STRs) [70]. Next, each individual was assigned a ranking (1, 2, or 3) for upper, middle, and lower caste status. The difference in caste rank was calculated for all possible pairs of caste individuals, yielding a full pair-wise matrix (155 × 155, or 118 × 118 for Tamil-speakers only) of ordinal values (0, 1, 2). Spearman's rank correlation between the genetic distance (Y-chromosome, mtDNA, or autosomal STRs) matrix and the caste rank difference matrix was calculated. A significance level for the correlation was determined by comparing the actual correlation to a distribution of correlations generated by 10,000 random columnar permutations.
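The matrix test described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' code: it permutes individual labels (rows and columns of the genetic matrix together), which is the usual Mantel procedure and one reading of "columnar permutations"; it reports a one-sided P value; and the six individuals and their distances are invented.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists (inputs must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def upper_triangle(matrix, order):
    """Flatten the upper triangle after relabeling individuals by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_spearman(genetic, rank_diff, n_perm=10000, seed=0):
    """Return (observed correlation, one-sided permutation P value)."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    fixed = upper_triangle(rank_diff, identity)
    observed = spearman(upper_triangle(genetic, identity), fixed)
    hits = 0
    order = identity[:]
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel individuals in the genetic matrix
        if spearman(upper_triangle(genetic, order), fixed) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Invented data: six individuals, two per caste rank (1 = upper, 3 = lower).
caste_rank = [1, 1, 2, 2, 3, 3]
rank_diff = [[abs(a - b) for b in caste_rank] for a in caste_rank]
genetic = [
    [0.0 if i == j else 0.1 * abs(caste_rank[i] - caste_rank[j]) + 0.05 * ((i + j) % 3)
     for j in range(6)]
    for i in range(6)
]
print(mantel_spearman(genetic, rank_diff, n_perm=1000, seed=1))
```

Permuting labels rather than individual matrix cells preserves the dependence among the pairwise distances that share an individual, which is why a Mantel test is used instead of an ordinary correlation test on the flattened matrices.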


  1. Cordaux R, Aunger R, Bentley G, Nasidze I, Sirajuddin SM, Stoneking M: Independent origins of Indian caste and tribal paternal lineages. Curr Biol. 2004, 14 (3): 231-235.

  2. Sahoo S, Singh A, Himabindu G, Banerjee J, Sitalaximi T, Gaikwad S, Trivedi R, Endicott P, Kivisild T, Metspalu M, et al: A prehistory of Indian Y chromosomes: Evaluating demic diffusion scenarios. Proc Natl Acad Sci USA. 2006, 103 (4): 843-848. 10.1073/pnas.0507714103.

  3. Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE, Lin AA, Mitra M, Sil SK, Ramesh A, et al: Polarity and temporality of high-resolution y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of central asian pastoralists. American journal of human genetics. 2006, 78 (2): 202-221. 10.1086/499411.

  4. Kivisild T, Bamshad MJ, Kaldma K, Metspalu M, Metspalu E, Reidla M, Laos S, Parik J, Watkins WS, Dixon ME, et al: Deep common ancestry of indian and western-Eurasian mitochondrial DNA lineages. Curr Biol. 1999, 9 (22): 1331-1334. 10.1016/S0960-9822(00)80057-3.

  5. Kivisild T, Rootsi S, Metspalu M, Mastana S, Kaldma K, Parik J, Metspalu E, Adojaan M, Tolk HV, Stepanov V, et al: The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. American journal of human genetics. 2003, 72 (2): 313-332. 10.1086/346068.

  6. Basu A, Mukherjee N, Roy S, Sengupta S, Banerjee S, Chakraborty M, Dey B, Roy M, Roy B, Bhattacharyya NP, et al: Ethnic India: a genomic view, with special reference to peopling and structure. Genome Res. 2003, 13 (10): 2277-2290. 10.1101/gr.1413403.

  7. Bhasin MK, Walter H: Genetics of castes and tribes of India. 2001, Delhi: Kamla-Raj Enterprises

  8. Cavalli-Sforza LL, Menozzi P, Piazza A: The History and Geography of Human Genes. 1994, Princeton: Princeton University Press

  9. Majumdar RC, Raychaudhuri HC, Datta K: An Advanced History of India. 1950, London: Macmillan

  10. Rawlinson HG: India, a short cultural history. 1954, London: The Cresset Press, 5

  11. Thapar R: Early India. 2002, Berkeley: University of California Press

  12. Karve I: Hindu society: an interpretation. 1961, Poona: Deshmukh Prakashan

  13. Bamshad M, Kivisild T, Watkins WS, Dixon ME, Ricker CE, Rao BB, Naidu JM, Prasad BV, Reddy PG, Rasanayagam A, et al: Genetic evidence on the origins of Indian caste populations. Genome Res. 2001, 11 (6): 994-1004. 10.1101/gr.GR-1733RR.

  14. Bhattacharyya NP, Basu P, Das M, Pramanik S, Banerjee R, Roy B, Roychoudhury S, Majumder PP: Negligible male gene flow across ethnic boundaries in India, revealed by analysis of Y-chromosomal DNA polymorphisms. Genome Res. 1999, 9 (8): 711-719.

  15. Kumar V, Basu D, Reddy BM: Genetic heterogeneity in northeastern India: reflection of Tribe-Caste continuum in the genetic structure. Am J Hum Biol. 2004, 16 (3): 334-345. 10.1002/ajhb.20027.

  16. Roy S, Thakur Mahadik C, Majumder PP: Mitochondrial DNA variation in ranked caste groups of Maharashtra (India) and its implication on genetic relationships and origins. Ann Hum Biol. 2003, 30 (4): 443-454. 10.1080/0301446031000111410.

  17. Reddy BM, Naidu VM, Madhavi VK, Thangaraj K, Langstieh BT, Venkataramana P, Kumar V, Singh L: STR data for the Amp FlSTR Profiler Plus loci among 27 populations of different social hierarchy from southern part of Andhra Pradesh, India. Forensic Sci Int. 2005, 149 (1): 81-97. 10.1016/j.forsciint.2004.06.005.

  18. Watkins WS, Prasad BV, Naidu JM, Rao BB, Bhanu BA, Ramachandran B, Das PK, Gai PB, Reddy PC, Reddy PG, et al: Diversity and divergence among the tribal populations of India. Annals of human genetics. 2005, 69: 680-692. 10.1046/j.1529-8817.2005.00200.x.

  19. Ramana GV, Su B, Jin L, Singh L, Wang N, Underhill P, Chakraborty R: Y-chromosome SNP haplotypes suggest evidence of gene flow among caste, tribe, and the migrant Siddi populations of Andhra Pradesh, South India. Eur J Hum Genet. 2001, 9 (9): 695-700. 10.1038/sj.ejhg.5200708.

  20. Wooding S, Ostler C, Prasad BV, Watkins WS, Sung S, Bamshad M, Jorde LB: Directional migration in the Hindu castes: inferences from mitochondrial, autosomal and Y-chromosomal data. Human genetics. 2004, 115 (3): 221-229. 10.1007/s00439-004-1130-x.

  21. Chaubey G, Metspalu M, Kivisild T, Villems R: Peopling of South Asia: investigating the caste-tribe continuum in India. Bioessays. 2007, 29 (1): 91-100. 10.1002/bies.20525.

  22. Cordaux R, Saha N, Bentley GR, Aunger R, Sirajuddin SM, Stoneking M: Mitochondrial DNA analysis reveals diverse histories of tribal populations from India. Eur J Hum Genet. 2003, 11 (3): 253-264. 10.1038/sj.ejhg.5200949.

  23. Herrnstadt C, Elson JL, Fahy E, Preston G, Turnbull DM, Anderson C, Ghosh SS, Olefsky JM, Beal MF, Davis RE, et al: Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups. American journal of human genetics. 2002, 70 (5): 1152-1171. 10.1086/339933.

  24. Torroni A, Huoponen K, Francalacci P, Petrozzi M, Morelli L, Scozzari R, Obinu D, Savontaus ML, Wallace DC: Classification of European mtDNAs from an analysis of three European populations. Genetics. 1996, 144 (4): 1835-1850.

  25. Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, Blue-Smith J, Jin L, Su B, Pitchappan R, Shanmugalakshmi S, et al: The Eurasian heartland: a continental perspective on Y-chromosome diversity. Proc Natl Acad Sci USA. 2001, 98 (18): 10244-10249. 10.1073/pnas.171305098.

  26. Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C: A genetic landscape reshaped by recent events: Y-chromosomal insights into central Asia. American journal of human genetics. 2002, 71 (3): 466-482. 10.1086/342096.

  27. Vishwanathan H, Deepa E, Cordaux R, Stoneking M, Usha Rani MV, Majumder PP: Genetic structure and affinities among tribal populations of southern India: a study of 24 autosomal DNA markers. Annals of human genetics. 2004, 68: 128-138. 10.1046/j.1529-8817.2003.00083.x.

  28. Watkins WS, Rogers AR, Ostler CT, Wooding S, Bamshad MJ, Brassington AM, Carroll ML, Nguyen SV, Walker JA, Prasad BV, et al: Genetic variation among world populations: inferences from 100 Alu insertion polymorphisms. Genome Res. 2003, 13 (7): 1607-1618. 10.1101/gr.894603.

  29. Bamshad MJ, Wooding S, Watkins WS, Ostler CT, Batzer MA, Jorde LB: Human population genetic structure and group membership. American journal of human genetics. 2003, 72: 578-589. 10.1086/368061.

  30. Bamshad MJ, Watkins WS, Dixon ME, Jorde LB, Rao BB, Naidu JM, Prasad BV, Rasanayagam A, Hammer MF: Female gene flow stratifies Hindu castes. Nature. 1998, 395 (6703): 651-652. 10.1038/27103.

  31. Char KSN, Lakshmi P, Gopalam KB, Sastry JG, Rao PR: Genetic differentiation among some endogamous populations of Andhra Pradesh, India. Am J Phys Anthrop. 1989, 78: 421-429. 10.1002/ajpa.1330780310.

  32. Lakshmi N, Demarchi DA, Veerraju P, Rao TV: Population structure and genetic differentiation among the substructured Vysya caste population in comparison to the other populations of Andhra Pradesh, India. Ann Hum Biol. 2002, 29 (5): 538-549. 10.1080/03014460110114707.

  33. Papiha SS: Genetic variation in India. Hum Biol. 1996, 68 (5): 607-628.

  34. Majumder PP: People of India: biological diversity and affinities. Evol Anthrop. 1998, 6: 100-110. 10.1002/(SICI)1520-6505(1998)6:3<100::AID-EVAN4>3.0.CO;2-I.

  35. Thanseem I, Thangaraj K, Chaubey G, Singh VK, Bhaskar LV, Reddy BM, Reddy AG, Singh L: Genetic affinities among the lower castes and tribal groups of India: inference from Y chromosome and mitochondrial DNA. BMC genetics. 2006, 7: 42. 10.1186/1471-2156-7-42.

  36. Passarino G, Semino O, Bernini LF, Santachiara-Benerecetti AS: Pre-Caucasoid and Caucasoid genetic features of the Indian population, revealed by mtDNA polymorphisms. American journal of human genetics. 1996, 59 (4): 927-934.

  37. Majumder PP: Indian caste origins: genomic insights and future outlook. Genome Res. 2001, 11 (6): 931-932. 10.1101/gr.192401.

  38. Palanichamy MG, Sun C, Agrawal S, Bandelt HJ, Kong QP, Khan F, Wang CY, Chaudhuri TK, Palla V, Zhang YP: Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: implications for the peopling of South Asia. American journal of human genetics. 2004, 75 (6): 966-978. 10.1086/425871.

  39. Metspalu M, Kivisild T, Metspalu E, Parik J, Hudjashov G, Kaldma K, Serk P, Karmin M, Behar DM, Gilbert MT, et al: Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC genetics. 2004, 5: 26. 10.1186/1471-2156-5-26.

  40. Sitalaximi T, Trivedi R, Kashyap VK: Microsatellite diversity among three endogamous Tamil populations suggests their origin from a separate Dravidian genetic pool. Hum Biol. 2003, 75 (5): 673-685. 10.1353/hub.2003.0079.

  41. Sahoo S, Kashyap VK: Phylogeography of mitochondrial DNA and Y-Chromosome haplogroups reveal asymmetric gene flow in populations of Eastern India. Am J Phys Anthropol. 2006, 131 (1): 84-97. 10.1002/ajpa.20399.

  42. Baig MM, Khan AA, Kulkarni KM: Mitochondrial DNA diversity in tribal and caste groups of Maharashtra (India) and its implication on their genetic origins. Annals of human genetics. 2004, 68 (Pt 5): 453-460. 10.1046/j.1529-8817.2004.00108.x.

  43. Quintana-Murci L, Chaix R, Wells RS, Behar DM, Sayar H, Scozzari R, Rengo C, Al-Zahery N, Semino O, Santachiara-Benerecetti AS, et al: Where west meets east: the complex mtDNA landscape of the southwest and Central Asian corridor. American journal of human genetics. 2004, 74 (5): 827-845. 10.1086/383236.

  44. Nei M, Roychoudhury AK: Evolutionary relationships of human populations on a global scale. Mol Biol Evol. 1993, 10: 927-943.

  45. Goldstein DB, Chikhi L: Human Migrations and Population Structure: What We Know and Why it Matters. Annu Rev Genomics Hum Genet. 2002, 3: 129-152. 10.1146/annurev.genom.3.022502.103200.

  46. Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG, Hosseini S, Brandon M, Easley K, Chen E, Brown MD, et al: Natural selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci USA. 2003, 100 (1): 171-176. 10.1073/pnas.0136972100.

  47. Jobling MA, Tyler-Smith C: The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet. 2003, 4 (8): 598-612. 10.1038/nrg1124.

  48. Majumder PP, Roy B, Banerjee S, Chakraborty M, Dey B, Mukherjee N, Roy M, Thakurta PG, Sil SK: Human-specific insertion/deletion polymorphisms in Indian populations and their possible evolutionary implications. Eur J Hum Genet. 1999, 7 (4): 435-446. 10.1038/sj.ejhg.5200317.

  49. Dutta R, Reddy BM, Chattopadhyay P, Kashyap VK, Sun G, Deka R: Patterns of genetic diversity at the nine forensically approved STR loci in the Indian populations. Hum Biol. 2002, 74 (1): 33-49. 10.1353/hub.2002.0002.

  50. Deka R, Shriver MD, Yu LM, Heidreich EM, Jin L, Zhong Y, McGarvey ST, Agarwal SS, Bunker CH, Miki T, et al: Genetic variation at twentythree microsatellite loci in sixteen populations. J Genet. 1999, 78: 99-121. 10.1007/BF02924561.

  51. Majumder PP: Ethnic populations of India as seen from an evolutionary perspective. J Biosci. 2001, 26 (4 Suppl): 533-545. 10.1007/BF02704750.

  52. Xing J, Watkins WS, Witherspoon DJ, Zhang Y, Guthery SL, Mowry BJ, Bulayeva K, Weiss RB, Jorde LB: Fine-scaled human genetic structure revealed by SNP microarrays.

  53. Rosenberg NA, Mahajan S, Gonzalez-Quevedo C, Blum MG, Nino-Rosales L, Ninis V, Das P, Hegde M, Molinari L, Zapata G, et al: Low levels of genetic divergence across geographically and linguistically diverse populations from India. PLoS genetics. 2006, 2 (12): 2052-2061. 10.1371/journal.pgen.0020215.

  54. Witherspoon DJ, Marchani EE, Watkins WS, Ostler CT, Wooding SP, Anders BA, Fowlkes JD, Boissinot S, Furano AV, Ray DA, et al: Human population genetic structure and diversity inferred from polymorphic L1(LINE-1) and Alu insertions. Human heredity. 2006, 62 (1): 30-46. 10.1159/000095851.

  55. Witherspoon DJ, Wooding S, Rogers AR, Marchani EE, Watkins WS, Batzer MA, Jorde LB: Genetic similarities within and between human populations. Genetics. 2007, 176 (1): 351-359. 10.1534/genetics.106.067355.

  56. Holliday EG, Nyholt DR, Tirupati S, John S, Ramachandran P, Ramamurti M, Ramadoss AJ, Jeyagurunathan A, Kottiswaran S, Smith HJ, et al: Strong Evidence for a Novel Schizophrenia Risk Locus on Chromosome 1p31.1 in Homogeneous Pedigrees From Tamil Nadu, India. The American journal of psychiatry. 2008

  57. Jorde LB, Bamshad MJ, Watkins WS, Zenger R, Fraley AE, Krakowiak PA, Carpenter KD, Soodyall H, Jenkins T, Rogers AR: Origins and affinities of modern humans: a comparison of mitochondrial and nuclear genetic data. American journal of human genetics. 1995, 57 (3): 523-538.

  58. Jorde LB, Rogers AR, Bamshad M, Watkins WS, Krakowiak P, Sung S, Kere J, Harpending HC: Microsatellite diversity and the demographic history of modern humans. Proc Natl Acad Sci USA. 1997, 94 (7): 3100-3103. 10.1073/pnas.94.7.3100.

  59. Ingman M, Gyllensten U: Analysis of the complete human mtDNA genome: methodology and inferences for human evolution. The Journal of heredity. 2001, 92 (6): 454-461. 10.1093/jhered/92.6.454.

  60. Ingman M, Gyllensten U: mtDB: Human Mitochondrial Genome Database, a resource for population genetics and medical sciences. Nucleic acids research. 2006, 34: D749-751. 10.1093/nar/gkj010.

  61. Ingman M, Kaessmann H, Paabo S, Gyllensten U: Mitochondrial genome variation and the origin of modern humans. Nature. 2000, 408 (6813): 708-713. 10.1038/35047064.

  62. Maca-Meyer N, Gonzalez AM, Larruga JM, Flores C, Cabrera VM: Major genomic mitochondrial lineages delineate early human expansions. BMC genetics. 2001, 2: 13. 10.1186/1471-2156-2-13.

  63. Underhill PA, Passarino G, Lin AA, Shen P, Mirazon Lahr M, Foley RA, Oefner PJ, Cavalli-Sforza LL: The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Annals of human genetics. 2001, 65 (Pt 1): 43-62. 10.1046/j.1469-1809.2001.6510043.x.

  64. Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman E, Bonne-Tamir B, Bertranpetit J, Francalacci P, et al: Y chromosome sequence variation and the history of human populations. Nature genetics. 2000, 26 (3): 358-361. 10.1038/81685.

  65. Excoffier L, Laval G, Schneider S: Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary bioinformatics online. 2005, 1: 47-50.

  66. Nei M: Molecular Evolutionary Genetics. 1987, New York: Columbia University Press

  67. Endicott P, Ho SY: A Bayesian evaluation of human mitochondrial substitution rates. American journal of human genetics. 2008, 82 (4): 895-902. 10.1016/j.ajhg.2008.01.019.

  68. Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155 (2): 945-959.

  69. Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. 2004, Distributed by the author. Department of Genome Sciences, University of Washington, Seattle

  70. Shriver MD, Jin L, Boerwinkle E, Deka R, Ferrell RE, Chakraborty R: A novel measure of genetic distance for highly polymorphic tandem repeat loci. Mol Biol Evol. 1995, 12 (5): 914-920.


The authors thank the study individuals for their participation, J.R. Ayankaran for help in recruiting samples, and J. Xing for helpful discussions. Chris Tyler-Smith generously supplied primer sequences for LLY22g. This work was supported by NSF grants SBR-9514733 and SBR-9512178.

Author information

Corresponding author

Correspondence to LB Jorde.

Additional information

Authors' contributions

WSW carried out the molecular studies, performed data analysis, and drafted the manuscript. RT performed analysis and sample collection in India. BJM and DN designed and partially funded the study. YZ performed genotyping and laboratory experiments. DJW provided statistical consultation. WT performed genotyping and other laboratory experiments. MJB helped acquire and analyze samples from Andhra Pradesh. ST and RP performed sample collection in Tamil Nadu. HS and CF performed sample extraction and laboratory analysis of samples from Tamil Nadu. LBJ designed, coordinated, and funded the study. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: Y-chromosome, STR, and mtDNA genotype data. Genotype data for South Indians, Europeans, and eastern Asians. (XLS 743 KB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Watkins, W., Thara, R., Mowry, B. et al. Genetic variation in South Indian castes: evidence from Y-chromosome, mitochondrial, and autosomal polymorphisms. BMC Genet 9, 86 (2008).
