Here, we present the results of one of the largest association studies of skin pigmentation in recently admixed populations. Our meta-analysis includes over 2,000 unrelated admixed individuals from Cuba, Cape Verde, Puerto Rico and African Americans from the San Francisco Bay area. Although there are significant associations of African ancestry with skin pigmentation in all samples (Additional file 1: Figures S4-S7), there are considerable differences in the strength of the association, with R2 values ranging from 0.645 in the Cuban sample to 0.182 in the Puerto Rican sample. This is probably driven by differences in the population structure present in the samples. For the meta-analysis, we used summary data obtained after controlling for population structure (i.e. individual ancestry) and other relevant covariates in each sample.
The strongest signals observed in the meta-analysis correspond to SNPs in the well-known SLC24A5 and SLC45A2 genes. In both genes, the best candidates to explain the association are two non-synonymous variants: rs1426654 (SLC24A5) and rs16891982 (SLC45A2). The polymorphism rs16891982 is in very strong LD (r2 > 0.8) with another SNP in the SLC45A2 gene, rs35397, which is the lead signal in our meta-analysis. The SNP rs16891982 is predicted to be damaging by both SIFT and Polyphen, so its effect may be mediated by structural changes in the protein coded by the SLC45A2 gene, the membrane associated transporter protein (MATP), which works as a membrane transporter in melanosomes. Additionally, our eQTL analyses indicate that rs16891982 is associated with expression of the SLC45A2 gene (p = 4.78 × 10− 7), although we cannot exclude the possibility that this association is driven by potential LD with other variants not genotyped in this region. The derived G allele, which is associated with lower melanin levels, is associated with an increased expression of the SLC45A2 gene. SLC24A5 rs1426654 and SLC45A2 rs16891982 are the variants with the strongest effect on skin pigmentation described in human populations and have been reported in numerous association studies [4,5,6,7,8,9,10,11, 15,16,17]. Further, SLC45A2 rs16891982 has also been associated with tanning response and skin cancer in several studies [15, 18,19,20].
Genome-wide association signals were also identified in a region encompassing the genes GRM5 and TYR on chromosome 11. Another clear candidate to explain this association is the non-synonymous rs1042602 located in the TYR gene (p = 9.2 × 10− 10), which is in strong LD (r2 > 0.70) with our lead SNP (rs10160510, p = 3.36 × 10− 10) and it is predicted to be damaging by both SIFT and Polyphen. Enhancer histone marks, H3K4me1 and H3K27ac, have been identified in foreskin melanocyte primary cells overlapping two variants that are in high LD with rs1042602: rs12285584 and rs12295166. Two GWAS have previously reported associations of rs1042602 with pigmentary traits, such as skin pigmentation [21] and freckles [22]. Importantly, there seem to be more than one independent signal in the GRM5/TYR region. When carrying out association tests in the Cape Verde sample conditioning for rs1042602, or rs10160510, there are several genome-wide significant markers in the GRM5 gene that remain strongly associated with skin pigmentation (e.g. rs492312, p-value conditioning for rs1042602 = 3.82 × 10− 6, p-value conditioning for rs10160510 = 1.65 × 10− 4). The SNP rs492312 is in strong LD (r2 > 0.90) with markers that overlap with enhancer histone marks and/or DNase hypersensitive sites in foreskin fibroblast primary skin cells [23]. We also found support for the role of variants within the GRM5 gene in the follow-up of our genome-wide significant signals in the East African sample (Additional file 2: Table S6). Many of these variants, although found at very low frequencies in the East African sample (less than 2%), are nominally associated with skin pigmentation, and the estimated effects on skin pigmentation are very large (e.g. rs35790407, MAF = 1.1%, p = 3.87 × 10− 3, beta = − 5.501; rs11021131, MAF = 1.6%, p = 0.012, beta = − 3.957). We hypothesize that the association of variants located on the GRM5 gene with pigmentation may be due to their regulation of expression of the TYR gene, which is a critical gene in melanogenesis.
We also observed strong associations of variants located on chromosome 19, near the MFSD12 gene, with skin pigmentation. The role of this gene on skin pigmentation has only been recently described. Crawford et al. [10] reported variants associated with skin pigmentation in two different regions on chromosome 19, one within and another upstream of the MFSD12 locus, in an East African sample. Additionally, they showed, using experiments in mouse melanocyte cultures and zebrafish and mice animal models, that MFSD12 has a functional role on pigmentation. Of note, one of the SNPs identified in our meta-analysis, rs112332856 (2.31 × 10− 8), is one of the genome-wide variants originally reported by [10]. This SNP is associated with expression of the MFSD12 gene. In particular, the rs112332856 C allele, which is associated with darker skin pigmentation, is associated with decreased expression of MFSD12, and this is consistent with experiments in mouse melanocyte culture that indicate that depletion of MFSD12 increases eumelanin biogenesis in melanocytes.
Additionally, we observed multiple genome-wide signals in a broad region of chromosome 15 overlapping the genes OCA2/HERC2/APBA2. A cluster of variants is located upstream of the OCA2 gene (lead SNP rs3098576, p = 8.9 × 10− 9) and the markers in this region are not in strong LD with other genome-wide signals identified in the OCA2/HERC2/APBA2 region. The posterior probabilities of having an effect (M-values) are high in all the samples of the meta-analysis, and there is no evidence of heterogeneity in effect sizes. However, we could not replicate these results in the East African sample, in spite of the fact that the relevant variants are present at high frequencies in the sample. It will be necessary to carry out additional studies to elucidate the potential role of this region on skin pigmentation variation.
Aside from the aforementioned region, we identified genome-wide clusters in a segment of approximately 1 Megabase that includes markers located within the OCA2 gene, the HERC2 gene and within or nearby the APBA2 and FAM189A1 genes. These clusters are located more than 250 Kb apart, but the markers in different clusters are in LD (D’ > 0.8; Additional file 1: Figures S10-S17). Identifying the putative causal variants in this region is challenging for two reasons: the aforementioned LD between markers, and problems with imputation of variants between the HERC2 and APBA2 region, which is probably due to the presence of segmental duplications in this genomic region [24]. For example, some of the genome-wide signals identified in our meta-analysis were not present in the imputed data used to evaluate gene expression in melanocytes, or in the imputed East African dataset.
In spite of these challenges, there are several functional candidates in the OCA2/HERC2/APBA2 region. Within OCA2, our lead SNP is rs1448484. This variant has been identified in studies of signatures of selection as one of the markers with the highest continental differentiation in human populations [25, 26]. The derived G allele, which is strongly associated with light skin pigmentation, is almost fixed in non-African populations, whereas the ancestral A allele is the most common allele in African populations. This polymorphism has been associated with skin pigmentation in a previous study [27]. Additionally, this marker and linked OCA2 polymorphisms overlap with enhancer histone marks and DNase hypersensitive sites in foreskin fibroblast primary skin cells [23]. In the recent GWA study in East Africa, the main signal in the OCA2 region was the synonymous SNP rs1800404. However, this SNP is not in very strong LD with rs1448484 (r2 < 0.60), and is not genome-wide significant in our meta-analysis (rs1800404 p = 8.92 × 10− 6 vs. rs1448484 p = 3.73 × 10− 10). The high M-values observed in the meta-analysis indicate that the probabilities of having an effect are high for rs1448484 in the four samples analyzed. It would be important to carry out additional studies in substantially larger admixed and African samples in order to explore the reasons for the different association patterns identified in the OCA2 region.
Within HERC2, our lead signal rs1667392 (p = 4.64 × 10− 9) is in strong LD with rs12913832 (p = 3.18 × 10− 8), an intronic SNP that has a known regulatory effect on OCA2 expression by disrupting the interaction between an enhancer located on HERC2 and the OCA2 promoter [28, 29]. This SNP is strongly associated with blue eye color [28,29,30,31,32,33]. Our melanocyte eQTL data fully supports the functional effects described for rs12913832; the derived G allele, which is associated with light pigmentation in our meta-analysis, shows a strong association with reduced OCA2 expression (p = 3.14 × 10− 23). Of note, in addition to the extensively reported association with blue eye color, rs12913832 has been associated with skin pigmentation and hair color in previous studies [15, 22, 34,35,36,37].
We also observed genome-wide significant signals downstream of HERC2, near the APBA2 gene. This region was previously reported in the original GWAS in the Cape Verde sample [6], although the top signals reported in that study are not the same as those identified in our meta-analysis. Interestingly, the lead SNP in the meta-analysis (rs36194177, p = 4.95 × 10− 10), and variants in LD with this SNP (rs142415892 and rs148942115), overlap with enhancer histone marks and DNase hypersensitive sites in several tissues, and there is an enrichment for enhancers in foreskin keratinocyte primary cells in skin [23]. Unfortunately, none of our genome-wide signals in the APBA2 region was present in the imputed data used to evaluate gene expression in melanocyte cultures, so we could not evaluate potential associations of these variants with gene expression of relevant genes, and more particularly, OCA2. However, three of the variants were present in the East African imputed dataset (rs142415892, rs148942115, and rs150641715), and two of them are nominally associated with skin pigmentation (e.g. rs150641715, p = 0.017).
The final genome-wide signals identified in this region are located within the FAM189A1 gene (lead SNP rs2636060, p = 5.7 × 10− 10). To our knowledge, this region has not been described in previous pigmentation association studies. The posterior probabilities of having an effect (M-values) are high in all the samples of the meta-analysis, and there is no evidence of heterogeneity in effect sizes, as observed for the region located upstream of OCA2. However, the two genome-wide signals are not significant in the East African sample, and there is no overlap of this region with putative regulatory regions active in skin cells [23].
Given the pattern of LD observed in the OCA2/HERC2/APBA2 region, we carried out additional conditional and haplotype tests to evaluate if the signals observed in the meta-analysis may be due to a single causal SNP, or to independent effects. These tests point to the presence of independent variants. When conditioning for each of the lead SNPs from the four relevant clusters, rs1448484 (OCA2), rs1667392 (HERC2), rs36194177 (near APBA2) and rs2636060 (FAM189A1), the p-values of the other markers remain nominally significant. The haplotype tests also support at least partially independent effects. As expected, the haplotype associated with the strongest reduction in melanin levels is the haplotype harboring the four variants common in European populations (ACGA). However, there are other haplotypes that do not include some of the variants common in Europe that are also associated with lower melanin levels, and the haplotype patterns are quite similar in Cape Verde and Cuba. Controlling for the haplotype ACGA in the Cape Verdean sample, there is still a marginal association with skin pigmentation, which is driven by the haplotypes GGGA and AGGA.
Given that some variants that have an effect on constitutive pigmentation have been found to be associated with facultative pigmentation (i.e. tanning ability) [14, 18], we followed-up in our meta-analysis the genome-wide significant markers identified in a recent tanning response GWAS [14]. These results are depicted in Additional file 2: Table S7. One of the novel markers identified in the aforementioned GWAS, located nearby EMX2 on chromosome 10 is nominally significant in our meta-analysis of constitutive pigmentation (rs35563099, p-value = 0.002). EMX2 is known to be involved in the synthesis of melanin, and it has been shown that overexpression of the Emx2 homologous gene causes loss of pigmentation by downregulating the Mitf gene in mice [38]. Our study indicates that this gene is involved both in tanning response and constitutive skin pigmentation.
When our manuscript was under review, an article was published describing the results of a GWAS of skin and iris pigmentation in a sample of more than 6,000 Latin Americans [9]. Some of the findings of this study are very similar to those described in the present study. More particularly, the authors also reported multiple independent signals within the GRM5/TYR and OCA2/HERC2 regions. In the GRM5/TYR region, Adhikari et al. [9] reported three signals: rs7118677 (GRM5), rs1042602 (TYR) and rs1126809 (TYR). In the OCA2/HERC2 region, they reported five signals: rs4778219 (OCA2), rs1800407 (OCA2), rs1800404 (OCA2), rs12913832 (HERC2) and rs4778249 (HERC2). We followed up all the genome-wide signals reported by Adhikari et al. [9] in our meta-analysis (Additional file 2: Table S8). Several of these signals are also the lead signals identified in our study (e.g. rs1042602 in TYR, rs12913832 at HERC2), clearly pointing to the same causal variants. However, other lead variants are different to our lead SNPs in the relevant regions (Table 2; Additional file 2: Table S8), and Adhikari et al. [9] do not report genome-wide significant variants within the APBA2 region. Some of these discrepancies may be driven by the demographic differences between both datasets. The GWAS on Latin Americans [9] includes samples from Chile, Mexico, and Peru, which have primarily Native American and European ancestry, with very small African ancestral contributions (< 5%), and samples from Brazil and Colombia, which have primarily European ancestry, with substantially lower Native American (12.1 and 29.2%, respectively) and African contributions (~ 10% in both samples). In contrast, the samples included in our study are characterized by a wide distribution of West African and European ancestry, without (e.g. Cape Verde) or with very small Native American ancestry (Cuba, Puerto Rico, African Americans).
Interestingly, Adhikari et al. [9] also reported a non-synonymous genome-wide signal within the gene MFSD12: rs2240751. However, this missense variant is common only in East Asian and Native American populations, and is different from the variants reported in the recent GWAS in East African populations [10] and those we found in the present study, which reflects the different demographic composition of the samples. Clearly, there have been independent mutational events within this gene in African and East Asian/Native American groups.
In summary, in this meta-analysis we have identified multiple regions significantly associated with skin pigmentation variation in admixed populations. All of these regions have been reported before. However, our analyses indicate that in some of these regions there are multiple independent signals. We have not identified any novel pigmentation regions in our study. This may be related to limited power to identify variants with moderate effects on skin pigmentation. Further meta-analytic efforts including larger sample sizes will be needed to uncover this type of variants, and to elucidate the reasons driving the heterogeneity in effect sizes observed for some of the signals identified in this study.