Analysis of DNA variations in GSTA and GSTM gene clusters based on the results of genome-wide data from three Russian populations taken as an example



Extensive genome-wide analyses of many human populations, using microarrays containing hundreds of thousands of single-nucleotide polymorphisms, have provided us with abundant information about global genomic diversity. However, these data can also be used to analyze local variability in individual genomic regions. In this study, we analyzed the variability in two genomic regions carrying the genes of the GSTA and GSTM subfamilies, located on different chromosomes.


Analysis of the polymorphisms in the GSTA and GSTM gene clusters showed similarities in their allelic and haplotype diversities. These patterns were similar in three Russian populations and the CEU population of European origin. There were statistically significant differences in all the haploblocks of both the GSTM and GSTA regions when the Russian populations were compared with populations from China and Japan. Most haploblocks also differed between the Russians and the Yoruba from Nigeria, but some of them had similar allelic frequencies. Special attention was paid to SNP rs4986947 from the intron of the GSTA4 gene, which is represented in apes by an A nucleotide. In the Asian and African samples, it was represented only by a G allele, whereas both allelic variants (G/A) occurred in the Russian and European populations.


The results obtained suggest the presence of common features in the evolutionary histories of the GSTA and GSTM gene regions, and that African subpopulations were involved differently in the formation of the European and Asian human lineages.


The results of genome-wide analyses of different populations can be used to study the patterns of DNA diversity in particular genomic regions containing specific genes or gene clusters. One interesting and functionally significant genetic system includes the glutathione-S-transferase (GST) genes that encode the different GSTs.

The GSTs are one of the key groups of detoxification enzymes. The chemistry of the reactions catalyzed by these enzymes is based predominantly on the conjugation of glutathione to the electrophilic centers of various substances, which leads to a loss of toxicity and the formation of more hydrophilic products. The important noncatalytic functions of the GSTs include their capacity to sequester carcinogens, their involvement in the intracellular transport of a wide spectrum of hydrophobic ligands, and their modulation of signaling pathways [1, 2]. Like most other human genes, the genes encoding the GSTs are polymorphic. It has been suggested that these polymorphisms are functionally significant and that the frequencies of their allelic variants differ among human populations [3]. Until recently, only a limited number of GST polymorphisms had been studied (e.g., GSTM1 and GSTT1 gene deletions, a 3-bp deletion in GSTM3 intron 6, and SNPs in GSTP1 exons 5 and 6), and these were not sufficient to infer the genetic relationships of populations. It may be especially relevant that particular genes of some GST families are located close to each other, forming clusters in the genome [4]. However, with advances in the methods of genome analysis, including high-throughput genotyping technologies, it has become possible to obtain and use more detailed information about the polymorphisms in regions of interest. Recently, Polimanti and co-workers (2011) [5] compared polymorphisms of the soluble GST genes in some reference populations using the HapMap database. In the current study, we examined the polymorphisms in two genomic regions, comprising clusters of GSTA and GSTM genes, located on different chromosomes, in three groups of Russians from the western (Tver), eastern (Murom), and southern (Kursk) regions of the European part of Russia. The analyses were based on both the comparison of allelic variation in individual SNPs and the haplotype diversity across the GST clusters. 
The genotypes were obtained from a genome-wide analysis of SNPs [6, 7], performed with Illumina microarrays. To compare the Russian populations with other populations throughout the world, we also included four populations from the HapMap Project in this study: Utah (USA) residents with ancestry from northern and western Europe (CEU), Han Chinese from Beijing (CHB), Japanese from Tokyo (JPT), and the Yoruba people of Ibadan, Nigeria (YRI). Their genotypes were downloaded from the HapMap Project site [8]. The data obtained showed high levels of similarity across the three Russian populations studied and between the Russian and CEU populations. However, the differences between them and the Asian and African populations were significant.


DNA samples were isolated from blood samples obtained with the informed consent of Russian donors from western (Andreapol district of the Tver region), eastern (Murom district of the Vladimir region), and southern locations (Kursk and Oktyabrsky districts of the Kursk region) in the European part of Russia. Their ethnicity was determined by interview. All individuals were unrelated and represented the native ethnic groups in the regions studied (i.e., they belonged to at least the third generation living in a particular geographic region). The DNA was isolated from peripheral leukocytes with standard techniques, using proteinase K treatment and phenol–chloroform extraction [9].

All the DNA samples were genotyped at the Estonian Biocentre (Tartu, Estonia), using the Illumina Human CNV370-Duo (Tver and Murom samples) and Human 660W-Quad chips (Kursk samples), according to the manufacturer’s instructions. In total, 288 Russian samples were genotyped (96 samples per population). Because the microarrays differed in the numbers of SNPs tested, the number of SNPs examined was standardized to obtain a set of loci that was consistent across all the populations analyzed. The set of loci was chosen by considering the chromosomal regions in which the GSTM and GSTA gene clusters are located. The sample sizes of the populations taken from the HapMap Project were: 165 individuals from CEU, 86 from CHB, 84 from JPT, and 166 from YRI.
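
The standardization step described above can be sketched as a set intersection followed by a positional filter. This is only an illustration: the function name, marker identifiers, and coordinates are hypothetical and are not taken from the study.

```python
def shared_snps_in_region(panels, positions, start, end):
    """Return SNP ids present on every panel and located within [start, end].

    panels    -- list of iterables of SNP ids, one per genotyping array
    positions -- dict mapping SNP id -> chromosomal coordinate
    """
    # Keep only markers typed on all arrays.
    common = set.intersection(*(set(p) for p in panels))
    # Restrict to the chromosomal region of the gene cluster.
    in_region = [s for s in common if start <= positions[s] <= end]
    # Report markers in map order.
    return sorted(in_region, key=lambda s: positions[s])


# Hypothetical marker ids and coordinates, for illustration only.
panel_370 = ["snp1", "snp2", "snp3", "snp5"]
panel_660 = ["snp2", "snp3", "snp4", "snp5"]
positions = {"snp1": 100, "snp2": 250, "snp3": 180, "snp4": 300, "snp5": 900}

print(shared_snps_in_region([panel_370, panel_660], positions, 150, 400))
# prints ['snp3', 'snp2']
```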

The allele frequencies, their Hardy–Weinberg equilibrium status, and the SNP-based Wright’s fixation index (FST) [10] were calculated using the PowerMarker software package (v.3.0) [11]. The pairwise linkage disequilibrium statistic (D') [13] was estimated and the haplotypes were inferred for adjacent markers using the accelerated expectation-maximization algorithm implemented in the Haploview software [12]. The haplotype block patterns were defined from the linkage disequilibrium measure D' and its confidence interval. Linearized pairwise FST [14] values were used to evaluate the genetic affinities between populations. The significance level was set at P < 0.05.
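
For readers unfamiliar with these statistics, the following minimal sketch shows how an allele frequency, a one-degree-of-freedom chi-square test of Hardy–Weinberg proportions, and a simple two-population FST can be computed for a single biallelic SNP. It is not the PowerMarker implementation: the function names are hypothetical, and the FST shown is the basic (HT − HS)/HT form with equal population weights rather than the Weir and Cockerham estimator [10].

```python
import math


def allele_freq(n_aa, n_ab, n_bb):
    """Frequency of allele A from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n)


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions.

    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = allele_freq(n_aa, n_ab, n_bb)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def simple_fst(p1, p2):
    """Two-population FST = (HT - HS) / HT for one biallelic SNP.

    Assumes equal weights for the two populations (a simplification).
    """
    p_bar = (p1 + p2) / 2.0
    ht = 2.0 * p_bar * (1.0 - p_bar)                           # total heterozygosity
    hs = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population
    return 0.0 if ht == 0 else (ht - hs) / ht
```

With hypothetical genotype counts of 30, 40, and 30, `hwe_chi2(30, 40, 30)` gives a chi-square of 4, which falls just below the P < 0.05 threshold used in this study.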


Figure 1 shows 15 polymorphisms of the GSTA cluster, which spans a 250-kbp region at p12.1 of chromosome 6. The polymorphisms are presented according to their locations in relation to the genes. Based on the threshold value for the pairwise linkage disequilibrium between the SNPs (D' > 0.7) [15], six blocks were inferred in the GSTA cluster. All the haploblocks were identical in all the populations studied. Figure 1 shows the haploblocks for the Russian population from Tver; the corresponding data for the other two Russian populations were identical to the Tver data.
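
As an illustration of the D' statistic used for the block threshold, the sketch below computes Lewontin's D' for two biallelic loci. It assumes that phased haplotype frequencies are already available (in the study they were inferred by the expectation-maximization algorithm in Haploview); the function name and the example frequencies are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Lewontin's D' for two biallelic loci.

    p_ab -- frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a  -- frequency of allele A at locus 1
    p_b  -- frequency of allele B at locus 2
    Returns the signed D'; take abs() to compare with a threshold.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else d / d_max


# Hypothetical frequencies: D = 0.3 - 0.4 * 0.5 = 0.1, D_max = 0.2, so D' is about 0.5.
in_same_block = abs(d_prime(0.3, 0.4, 0.5)) > 0.7  # False for this pair
```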

Figure 1

SNPs studied in the GSTA gene cluster (e.g., the Tver population). The numbers inside the diamonds show the pairwise linkage disequilibrium (D') values.

Table 1 shows the allelic frequencies for all the polymorphic loci of the GSTA cluster in the Russian populations and in the HapMap populations. A comparative analysis showed no differences in the distributions of the SNP variants in the Russian populations, and similar allelic frequencies were found in the CEU population. However, the allelic distributions in the three remaining HapMap populations differed considerably from those in the populations of European descent.

Table 1 Minor allelic frequencies of the polymorphisms in the GSTA cluster

We also calculated the fixation indices (FST) to quantitatively assess the levels of interpopulation frequency variation. Figure 2 presents the multidimensional scaling of the matrix of linearized pairwise FST values. The diagram shows that the Russian populations form a single cluster, with the CEU population close to them. However, the African YRI population and Asian CHB and JPT populations are situated at a considerable distance from them.
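
A plot of this kind can be sketched as a transformation of FST followed by classical (metric) multidimensional scaling. The code below is a pure-Python illustration and not the software used in the study. It assumes the Reynolds-style transform −ln(1 − FST), in line with reference [14] (another commonly used linearization is FST/(1 − FST)). It also assumes that the double-centred matrix has non-negative leading eigenvalues, which it extracts by power iteration. All function names are hypothetical.

```python
import math


def linearize(fst):
    """Reynolds-style distance D = -ln(1 - FST) (assumed form of 'linearized' FST)."""
    return -math.log(1.0 - fst)


def _top_eigenpair(m, iters=500):
    """Dominant eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # fixed start vector for reproducibility
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def classical_mds(dist, dims=2):
    """Embed a symmetric distance matrix into `dims` coordinates per item."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # Double centring: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = _top_eigenpair(b)
        if lam <= 0:
            break
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so that the next pass finds the following eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

To use it, one would apply `linearize` to each off-diagonal entry of the pairwise FST matrix and pass the resulting matrix to `classical_mds`; populations with small pairwise FST then appear close together in the plane.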

Figure 2

Two-dimensional scaling plot of the matrix of genetic distances between the Russian populations (Tver, Murom, and Kursk) and populations from the HapMap Project (CEU, CHB, JPT, and YRI).

Table 2 shows the frequencies of the haplotypes in each haploblock in all the populations analyzed. It is evident that different haploblocks contain different numbers of haplotypes. For instance, haploblock #1 has only two haplotypes, whereas haploblock #4 has four haplotypes. Some haploblocks, namely blocks #5 and #6 in the CHB and JPT populations and block #6 in the YRI population, were not inferred because the SNPs tested were monomorphic in these populations.

Table 2 Haplotype frequencies in six selected blocks inferred for the GSTA cluster*

The comparison of the populations was based on the haplotype frequencies calculated for each block. The calculated probabilities (P values) presented in Table 3 show the results of this comparison. The significance thresholds for P were set for each block using the Bonferroni correction for multiple testing. The data generated showed no marked differences in the haplotype frequencies across the Russian populations and the CEU population. However, a comparison of the haplotype frequencies in the Russian populations with those in the Chinese, Japanese, and Nigerian populations indicated significant differences between them. Most P values were considerably lower than the specified thresholds. The exceptions were block #5 and, to a certain degree, blocks #1 and #4, where the P values for the pairs of Russian and Nigerian populations were higher than the thresholds specified for these blocks.
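
One way such a comparison could be carried out is a Pearson chi-square test of homogeneity on the haplotype counts of two populations, with a Bonferroni-adjusted threshold. The text does not state which test was applied, so the sketch below is an assumption; the function names, counts, and number of tests are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution for integer df."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def haplotype_homogeneity(counts_a, counts_b):
    """Pearson chi-square on a 2 x k table of haplotype counts.

    Returns (chi2, df, p_value). Haplotypes absent from both samples are dropped.
    """
    cols = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    n_a = sum(a for a, _ in cols)
    n_b = sum(b for _, b in cols)
    n = n_a + n_b
    chi2 = 0.0
    for a, b in cols:
        exp_a = (a + b) * n_a / n
        exp_b = (a + b) * n_b / n
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    df = len(cols) - 1
    return chi2, df, chi2_sf(chi2, df)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts of two haplotypes in two samples of 100 chromosomes each.
chi2, df, p = haplotype_homogeneity([60, 40], [40, 60])
significant = p < bonferroni_threshold(0.05, 6)  # 6 tests is an assumed number
```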

Table 3 Comparison of the haplotype frequencies in the GSTA gene cluster of the populations from Tver, Murom, and Kursk with those from the HapMap Project (CEU, CHB, JPT, and YRI) *

The GSTM gene cluster is located in the p13.3 region of chromosome 1 and spans 85 kbp. The 14 marker loci found within the cluster (Figure 3) are listed in Table 4. As in the GSTA cluster, similarities in the frequencies of the GSTM alleles were observed between the Russian populations and the CEU population. Different frequencies were observed for the samples from Asia (CHB and JPT) and Africa (YRI). The two-dimensional plot of FST-based distances was similar to the plot obtained for the GSTA cluster (data for the GSTM cluster are not shown).

Figure 3

SNPs studied in the GSTM gene cluster (e.g., the Tver population).

Table 4 Minor allelic frequencies of the polymorphisms in the GSTM cluster

Table 5 shows the haplotype frequencies in the haploblocks of the GSTM cluster. As in the GSTA cluster, the numbers of haplotypes observed in the blocks differed. When we considered the P values for the pairwise comparisons (Table 6), there were no marked differences in the haplotype frequencies between the Russian populations from Tver, Murom, and Kursk, and the CEU population. However, statistically significant differences were observed in most comparisons of the Russian populations with the CHB, JPT, and YRI populations. The only exceptions were in block #1, where the P values for the pairwise comparisons of the haplotype frequencies of the Russian and Nigerian populations were much higher than the specified significance level.

Table 5 Haplotype frequencies in the three haploblocks of the GSTM cluster

Table 6 Comparison of the haplotype frequencies in the GSTM gene cluster of the populations from Tver, Murom, and Kursk with those from the HapMap Project (CEU, CHB, JPT, and YRI) *


Extensive genome-wide analyses of many human populations, using microarrays containing hundreds of thousands of SNPs, have provided us with considerable information about global genomic diversity [17]. These data can also be used to analyze the variability in local genomic regions, marking the evolutionary trajectories for both the main human groups and local populations.

In this study, we analyzed the variability of two genomic regions containing the genes of the GSTA and GSTM subfamilies. Our work was based on genotype data obtained from a whole-genome analysis of SNP genotypes performed with Illumina microarrays in three Russian populations. We compared these data with corresponding data from several HapMap populations.

Although the genes of the GSTA and GSTM subfamilies are located on different chromosomes, our analysis of the polymorphisms in these two gene clusters showed similarities between them in terms of their patterns of allelic and haplotype frequencies across the populations examined. The haplotype spectra of the three Russian populations studied (from Tver, Murom, and Kursk), which share a common ethnic origin, were similar. Nor were any marked differences found between the three Russian populations and the CEU population, which clearly reflects their common European ancestry. In this context, it was interesting to find some similarity between the Russian samples and the Yoruba population from Nigeria in the haplotype frequencies of some blocks (mainly block #5 of the GSTA cluster and block #1 of the GSTM cluster). Because the European populations differed significantly from the populations of China and Japan in the haplotype spectra of all blocks in both clusters, we propose that these similarities may be attributable to particular features of these haploblocks in the microevolutionary history of the populations. At the same time, the Russian and Nigerian populations differed significantly in the remaining haploblocks of both gene clusters.

Another interesting finding that warrants particular attention is SNP rs4986947 from block #6 of the GSTA cluster, located in the intron of the GSTA4 gene. In apes, this SNP site carries an A nucleotide [18]. In the populations analyzed from Asia and Africa, another nucleotide (G) occurred at this SNP site with a frequency of 100% (the same is also true for two other African HapMap samples—Luhya and Maasai) [19]. By contrast, in the European populations tested, including all populations from Russia, both alleles (G/A) are represented at this locus; i.e., the ancestral allele, containing A, is also present in these populations. Two possible explanations for this fact can be proposed. The first assumes substantial ancient gene flow (migrations) from Africa to the proto-West Eurasian (European) population after its divergence from the proto-East Eurasians [20]. These migrations could have included individuals with the ancestral A allele at rs4986947, which is virtually absent from the reference African populations. The second explanation is that the mutation could have been reversed in part of the European population, thus returning to its ancestral state. The persistence of the A allele in Europeans may be attributable to natural selection, which can shape the interethnic variation in the GST genes, as has been demonstrated by Polimanti et al. (2011) [5]. In addition to the Russian and CEU samples tested, the A allele at rs4986947 is also found at frequencies of around 6% in geographically distant European samples from Great Britain, Finland, and Italy [19]. These quite low frequencies may be the result of balancing selection.


In summary, we have reported the results of a study of SNPs in two genomic regions carrying the genes of the GSTA and GSTM subfamilies. By using a haplotype-based approach, we have demonstrated a similarity in the patterns of allelic diversity between the GSTA and GSTM gene clusters in all populations studied. This leads us to propose that the evolutionary histories of these clusters share many features and mark the same events in the evolutionary trajectories of the main human groups.


  1. Hayes JD, Flanagan JU, Jowsey IR: Glutathione transferases. Annu Rev Pharmacol Toxicol. 2005, 45: 51-88. 10.1146/annurev.pharmtox.45.120403.095857.

  2. Oakley A: Glutathione transferases: a structural perspective. Drug Metab Rev. 2011, 43 (2): 138-151. 10.3109/03602532.2011.558093.

  3. Garte S, Gaspari L, Alexandrie AK, et al: Metabolic gene polymorphism frequencies in control populations. Cancer Epidemiol Biomarkers Prev. 2001, 10: 1239-1248.

  4. Frova C: Glutathione transferases in the genomics era: new insights and perspectives. Biomol Eng. 2006, 23: 149-169. 10.1016/j.bioeng.2006.05.020.

  5. Polimanti R, Piacentini S, Fuciarelli M: HapMap-based study of human soluble glutathione S-transferase enzymes: the role of natural selection in shaping the single nucleotide polymorphism diversity of xenobiotic-metabolizing genes. Pharmacogenet Genomics. 2011, 10: 665-672.

  6. Nelis M, Esko T, Magi R, et al: Genetic structure of Europeans: a view from the north–east. PLoS One. 2009, 4: e5472. 10.1371/journal.pone.0005472.

  7. Limborska S, Khrunin A, Filippova I, Khokhrin D, Bebyakova N, Bolotova N, Esko E, Metspalu A: Abstracts of papers presented at the 2012 meeting on the “biology of genomes”. Genomic variations in populations from the far north east corner of Europe. 2012, Cold Spring Harbor Laboratory, New York, 167-

  8. International HapMap Project.

  9. Milligan BG: Total DNA isolation. Molecular genetic analysis of populations. Edited by: Hoelzel AR. 1998, Oxford University Press, London, 29-60.

  10. Weir BS, Cockerham CC: Estimating F-statistics for the analysis of population structure. Evolution. 1984, 38: 1358-1370. 10.2307/2408641.

  11. Liu K, Muse SV: PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics. 2005, 21: 2128-2129. 10.1093/bioinformatics/bti282.

  12. Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005, 21: 263-265. 10.1093/bioinformatics/bth457.

  13. Lewontin RC: On measures of gametic disequilibrium. Genetics. 1988, 20: 849-852.

  14. Reynolds J, Weir BS, Cockerham CC: Estimation of the coancestry coefficient: basis for a short-term genetic distance. Genetics. 1983, 105: 767-779.

  15. Khrunin A, Mihailov E, Nikopensius T, Krjutškov K, Limborska S, Metspalu A: Analysis of allele and haplotype diversity across 25 genomic regions in three eastern European populations. Hum Hered. 2009, 68: 35-44. 10.1159/000210447.

  16. Glantz S: Primer of Biostatistics. 1999, Praktica, Moscow

  17. Bauchet M, McEvoy B, Pearson LN, Quillen EE, Sarkisian T, et al: Measuring European population stratification with microarray genotype data. Am J Hum Genet. 2007, 80: 948-956. 10.1086/513477.

  18. Ensembl.

  19. 1000 Genomes. A deep catalog of human genetic variation.

  20. McEvoy BP, Powell JE, Goddard ME, Visscher PM: Human population dispersal “Out of Africa” estimated from linkage disequilibrium and allele frequencies of SNPs. Genome Res. 2011, 6: 821-829.


This study was supported by the Federal Program “Scientific and Pedagogical Cadre for Innovative Russia” for 2009–2013, the programs of the Russian Academy of Sciences “Molecular and Cell Biology” and “Fundamental Science for Medicine”, the Program of Support for Leading Scientific Schools of the Ministry of Education and Science of Russia, and the Russian Foundation for Basic Research.

Author information

Corresponding author

Correspondence to Irina N Filippova.

Additional information

Competing interests

The authors declare no competing interests.

Authors’ contributions

INF carried out the polymorphism typing, performed the statistical analysis and drafted the manuscript. AVK participated in the study design, helped with the statistical analysis and manuscript drafting. SAL conceived of the study, participated in its coordination and helped to draft the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Filippova, I.N., Khrunin, A.V. & Limborska, S.A. Analysis of DNA variations in GSTA and GSTM gene clusters based on the results of genome-wide data from three Russian populations taken as an example. BMC Genet 13, 89 (2012).