Performance of HLA allele prediction methods in African Americans for class II genes HLA-DRB1, -DQB1, and -DPB1

Background: The expense of human leukocyte antigen (HLA) allele genotyping has motivated the development of imputation methods that use dense single nucleotide polymorphism (SNP) genotype data and the region's haplotype structure, but the performance of these methods in admixed populations (such as African Americans) has not been adequately evaluated. We compared genotype-based imputation results (derived from both genome-wide genotyping and targeted sequencing) to existing allele data for HLA-DRB1, -DQB1, and -DPB1.
Results: In European Americans, the newly-developed HLA Genotype Imputation with Attribute Bagging (HIBAG) method outperformed HLA*IMP:02. In African Americans, HLA*IMP:02 performed marginally better than HIBAG pre-built models, but HIBAG models constructed using a portion of our African American sample with both SNP genotyping and four-digit HLA class II allele typing had consistently higher accuracy than HLA*IMP:02. However, HIBAG was significantly less accurate in individuals heterozygous for local ancestry (p ≤ 0.04). Accuracy improved in models with equal numbers of African and European chromosomes. Variants added by targeted sequencing and SNP imputation further improved both imputation accuracy and the proportion of high-quality calls.
Conclusion: Combining the HIBAG approach with local ancestry and dense variant data can produce highly accurate HLA class II allele imputation in African Americans.


Background
The human leukocyte antigen (HLA) region resides within the major histocompatibility complex (MHC) on chromosome 6p21.31 and contains multiple genes encoding highly variable antigen-presenting proteins that play a key role in immunity [1]. Among these, the class I genes HLA-A, -B, and -C, and the class II genes HLA-DRB1, -DQB1, -DQA1, and -DPB1 are the most frequently studied. Decades of HLA research have revealed that genetic variation in these genes plays important roles in disease susceptibility and in pharmacogenetic interactions that influence the efficacy of certain drugs.
The nomenclature developed to catalogue the allelic variation in HLA genes has evolved over time to incorporate a growing number of alleles identified in each gene. In 1987, a four-digit code (e.g. HLA-DRB1*0401) was first employed to catalogue alleles that differed in protein sequence [2]. The first two digits correspond to the protein serotypes distinguishable by the serological reagents used before polymerase chain reaction-based methods were available [3]. Coupled with the first two digits, the second two specify a unique amino acid sequence or, equivalently, haplotypes of nonsynonymous (protein-altering) polymorphisms within each gene. Subsequently, a colon was introduced into the notation (e.g. HLA-DRB1*04:01) to separate the digits into fields (the first field corresponding to serotype and the second to differences in haplotypes of nonsynonymous polymorphisms) to accommodate an ever-increasing number of alleles [4]. While additional fields have been added to delineate alleles that differ in genetic variation that does not alter the protein amino acid sequence, the functional two-field alleles remain the primary focus in basic research and clinical applications.
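The field structure described above can be made concrete with a small parser. The following is a minimal sketch, not part of the study's analysis: the function names `parse_allele` and `truncate` are hypothetical, and the pattern handles only purely numeric, colon-delimited fields (it does not handle the older four-digit form or optional suffix letters).

```python
import re

# Pattern for colon-delimited allele names such as "HLA-DRB1*04:01".
# Assumption: the "HLA-" prefix is optional and every field is numeric.
_ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<gene>[A-Z]+[0-9]*)\*(?P<fields>\d+(?::\d+)*)$"
)


def parse_allele(name):
    """Split an allele name into (gene, tuple_of_fields)."""
    match = _ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized allele name: {name!r}")
    return match.group("gene"), tuple(match.group("fields").split(":"))


def truncate(name, n_fields):
    """Reduce an allele name to its first n_fields fields.

    n_fields=1 gives the one-field (serotype-level) name and
    n_fields=2 gives the two-field (protein-level) name.
    """
    gene, fields = parse_allele(name)
    return f"HLA-{gene}*" + ":".join(fields[:n_fields])
```

Truncating both the typed and the imputed allele to the same number of fields is one way to compare calls at one-field or two-field resolution.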
In particular, prior to the advent of genome-wide genotyping and sequencing technologies, typing of the two-field alleles led to breakthroughs in our understanding of the role of HLA genes in the genetic architecture of multiple immune-mediated diseases [5], which have been replicated by genome-wide association studies (GWAS). However, few GWAS have been followed up by direct HLA allele typing to dissect the potentially multiple causal variants driving the observed associations at single nucleotide polymorphisms (SNPs), partially due to the high cost of genotyping the HLA alleles via sequence-specific oligonucleotide primers in GWAS of thousands of individuals.
As a result, imputation of HLA alleles using large reference panels, such as those assembled by the International HapMap consortium or the 1000 Genomes (1KG) project, has grown more common. While initial studies have shown that one or more SNPs may be used to "tag" common HLA alleles (allele frequency ≥0.05) within ancestral population groups [6], many of the HLA alleles are rare (allele frequency <0.05) in a given population and may not be reliably tagged by sets of two or three SNPs. Methods that address this challenge using dense SNP genotyping and known linkage disequilibrium and haplotype structure [7][8][9][10][11][12][13] have led to breakthroughs in identifying causal variants within HLA, including recent successes in rheumatoid arthritis [14] and multiple sclerosis [15]. While these studies demonstrated the power and efficiency of HLA imputation, use of these methods has generally been confined to samples of primarily European ancestry.
The extension of imputation methods to admixed populations is critical for mapping HLA-dependent diseases that differ in incidence between ancestral populations. The recently-developed HLA Genotype Imputation with Attribute Bagging (HIBAG) [16] and HLA*IMP:02 [13] methods are the first imputation methods to address HLA allele imputation in admixed populations, but these methods are new and include a limited number of admixed individuals in the published models. These methods also differ from one another, with HIBAG employing multiple expectation-maximization-based classifiers to estimate the likelihood of HLA alleles and HLA*IMP:02 using a haplotype graph-based approach. They are similar in that they allow researchers with GWAS genotyping but without HLA allele data from appropriate reference samples an option to impute the alleles in their own subjects. In the current study, we used genome-wide genotyping [17] and HLA allele data [18] from our previous studies of sarcoidosis to compare the imputation accuracy of these methods in both African American and European American individuals. We also investigated whether HIBAG models using these data improved upon existing model predictions for African Americans and evaluated the impact of local ancestry information on imputation accuracy in admixed subjects [19,20]. Finally, we determined the influence of increasing the SNP density through adding variation from SNP imputation and targeted sequencing on the imputation accuracy of HLA alleles.

Results
Our sample comprising 2,727 African Americans (1,271 cases, 1,456 controls) and 2,726 European Americans (442 cases, 2,284 controls) has been described previously [17,21]. The African American samples were assembled from the following studies: 1) a case-control etiologic study of sarcoidosis (ACCESS) [22]; 2) a multi-site affected-sibling sarcoidosis linkage study [23]; 3) a nuclear family-based sample ascertained through a single affected individual within the Henry Ford Health System in Detroit, Michigan [24]; and 4) healthy controls from the Oklahoma Medical Research Foundation's Lupus Family Registry and Repository [25]. European American sarcoidosis cases were derived from both the ACCESS and Henry Ford samples. Low- to intermediate-resolution HLA genotype data were available for a subset of subjects from the ACCESS study [22]: 325 African Americans (156 cases, 169 controls) and 480 European Americans (239 cases, 241 controls).
The published HIBAG models were applied to the sample of ACCESS European Americans (n = 480) with available HLA typing and genome-wide genotyping to validate the one- and two-field allele classification accuracy. The overall imputation accuracy results for HLA-DRB1, -DQB1, and -DPB1 in European Americans are presented in Table 1; the allele-specific model performance measures (imputation accuracy, sensitivity, specificity, positive predictive value, and negative predictive value) are presented in Additional file 1. Imputation accuracy was high (>90%) at both the one- and two-field resolution across all three genes. Removal of subjects with posterior predicted genotype probabilities ≤0.5, a threshold calibrated by the developers of HIBAG, reduced the sample size (8.3% reduction for -DRB1, 0.8% for -DQB1, and 2.3% for -DPB1) but resulted in slightly improved classification rates. Compared to HLA*IMP:02, HIBAG had higher imputation accuracy at both one- and two-field resolution (Table 1) for all three genes, with the exception of HLA-DQB1 at two-field resolution, where the two methods produced comparable results. The largest difference was observed for HLA-DPB1, with 10.2% and 10.6% higher accuracy rates at the one- and two-field resolutions, respectively.
In comparison to European Americans, classification accuracies were lower for African Americans (Table 2), and the published HIBAG models were overall less accurate than HLA*IMP:02. Using the published HIBAG African ancestry models, imputation accuracy rates ranged from 69% to 96% in the absence of a posterior predictive probability threshold (Table 2); the allele-specific performance measures are presented in Additional file 2. When the 0.5 threshold was applied, two-field resolution accuracy increased by 5.3-19.8%. Compared to European Americans, the call rates for African American subjects that exceeded this threshold were substantially lower (47.7-65.8%). In contrast, HLA*IMP:02 African American call rates at the same threshold were higher (minimum value of 82.5%).
Next, we constructed HIBAG models using the ACCESS African American sample as reference to analyze the imputation accuracy and quality of gene-specific prediction models (Table 2); the corresponding allele-specific performance measures are presented in Additional file 3. Samples not used for training (i.e. the test set) were used to estimate the imputation accuracy of the models. These models performed well (accuracy >86%), with 10.8%-21.8% higher imputation accuracy than published HIBAG African ancestry and HLA*IMP:02 models at two-field resolution; the ACCESS HIBAG models outperformed BEAGLE at HLA-DRB1 and -DPB1, with comparable accuracy achieved for HLA-DQB1. Figure 1 displays the plots of ACCESS HIBAG allele sensitivity (i.e. the proportion of occurrences of a particular allele that were accurately predicted) by allele frequency for each of the three genes. For alleles with a frequency ≥1%, the median (interquartile range) sensitivities were 98.1% (86.5%-100.0%), 98.3% (91.7%-100.0%), and 96.8% (73.3%-99.1%) for HLA-DRB1, -DQB1, and -DPB1, respectively. The quality of the predictions also differed by gene, with 68.9% (-DRB1), 88.3% (-DQB1), and 58.8% (-DPB1) of predictions exceeding the posterior probability threshold of 0.5. Above this threshold, two-field resolution imputation accuracies were ≥96% across all genes.
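Allele-specific sensitivity, defined as true positives divided by the sum of true positives and false negatives, can be computed as in the following minimal sketch. It assumes that each subject's typed and predicted genotypes are unordered pairs of allele names and that a typed allele counts as a true positive when it can be matched to one of the two predicted alleles; the function name `allele_sensitivity` is hypothetical and this is not the code used in the study.

```python
from collections import Counter


def allele_sensitivity(true_pairs, predicted_pairs):
    """Per-allele sensitivity = TP / (TP + FN), ignoring phase.

    true_pairs and predicted_pairs are parallel sequences of
    two-element tuples of allele names, one tuple per subject.
    """
    true_positive = Counter()
    occurrences = Counter()  # TP + FN for each typed allele
    for truth, prediction in zip(true_pairs, predicted_pairs):
        unmatched = list(prediction)
        for allele in truth:
            occurrences[allele] += 1
            if allele in unmatched:
                # Each predicted allele can be matched only once.
                unmatched.remove(allele)
                true_positive[allele] += 1
    return {a: true_positive[a] / occurrences[a] for a in occurrences}
```

Plotting these values against each allele's frequency in the typed sample would give a display of the kind described for Figure 1.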
To determine the relationship between posterior probability and imputation accuracy in the African American sample, we analyzed two-field accuracy estimates by HIBAG posterior probability thresholds (Table 3). For all three genes, imputation accuracy estimates exceeded 90% at a posterior probability threshold >30%, suggesting that this threshold is associated with high imputation accuracy levels in African Americans.
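The threshold analysis above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the study's code: accuracy is taken as the proportion of correctly predicted alleles (two per subject, ignoring phase), subjects are kept when their posterior probability is strictly greater than the threshold, and the names `allele_matches` and `accuracy_by_threshold` are hypothetical.

```python
def allele_matches(true_pair, predicted_pair):
    """Count correctly predicted alleles (0, 1, or 2), ignoring phase."""
    remaining = list(true_pair)
    hits = 0
    for allele in predicted_pair:
        if allele in remaining:
            remaining.remove(allele)
            hits += 1
    return hits


def accuracy_by_threshold(records, thresholds):
    """Summarize call rate and accuracy at each posterior threshold.

    records: iterable of (true_pair, predicted_pair, posterior_probability).
    Returns {threshold: {"n", "call_rate", "accuracy"}}.
    """
    records = list(records)
    summary = {}
    for t in thresholds:
        # Keep only subjects whose best-guess genotype exceeds the threshold.
        kept = [r for r in records if r[2] > t]
        call_rate = len(kept) / len(records) if records else float("nan")
        if kept:
            correct = sum(allele_matches(r[0], r[1]) for r in kept)
            accuracy = correct / (2 * len(kept))
        else:
            accuracy = float("nan")
        summary[t] = {"n": len(kept), "call_rate": call_rate, "accuracy": accuracy}
    return summary
```

Reporting call rate alongside accuracy makes the trade-off visible: a stricter threshold tends to raise accuracy while reducing the number of subjects retained.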
The full sample of 325 African American subjects from ACCESS with both HLA typing and genome-wide genotyping was also used for model training. Applying these models to the remaining 2,402 African Americans with genome-wide genotyping but without HLA typing, the proportion of samples with posterior prediction probabilities >0.5 increased to 85.7% (-DRB1), 93.8% (-DQB1), and 90.3% (-DPB1), approaching the results seen in European Americans (Table 1).
Variable local ancestry may also impact HLA imputation accuracy in African Americans. Table 4 displays the ACCESS HIBAG imputation accuracy estimates by local ancestry status at each gene. Fisher's exact tests indicate that differences in accuracy by local ancestry were evident at each of the genes at two-field resolution (p-values ≤ 0.04); accuracy was consistently 5.2-14.2% lower for heterozygous individuals (those with one African-derived DNA segment and one European-derived segment) compared to those homozygous for the West African ancestral haplotype. To determine whether these differences could be reduced, HIBAG models were trained on a mixed sample of 150 ACCESS European Americans (i.e. 300 European haplotypes) and 150 ACCESS African Americans with two West African alleles (i.e. 300 West African haplotypes) and tested on the remaining 175 ACCESS African Americans not included in the training sample (Table 5); the corresponding allele-specific performance measures for this test set are presented in Additional file 4. Using these mixed-ethnicity models, there were no longer statistically significant differences in two-field classification accuracy by local ancestry status (p > 0.1) at any of the three genes.
‡One- and two-field estimates will be very similar for HLA-DPB1 as the first field uniquely identifies the two-field alleles, with the exceptions of *02:01 and *02:02, and *04:01 and *04:02.
Finally, to determine the benefit of adding more genetic variants via imputation prior to model construction and during HIBAG imputation, we compared ACCESS HIBAG results (Table 2) against two different imputation strategies: ACCESS observed plus 1KG-imputed data; and ACCESS observed plus 1KG and targeted sequencing-imputed data (Table 6). These models performed well (imputation accuracy >90%) at two-field resolution across all genes. When the suggested posterior probability threshold of 0.5 was applied, we observed better call rates while maintaining high imputation accuracies when imputing more variants for HLA-DRB1 (6.1%-21.6% improved), -DQB1 (1.1%-10.6% improved), and -DPB1 (1.8%-5.4% improved). These results highlight the contribution of increased SNP density to overall imputation quality and accuracy. Because the sequencing region captured only HLA-DRB1 and -DQB1, we were not able to test the accuracy of a model using both the 1KG and targeted sequencing reference panels for HLA-DPB1.
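The comparison of accuracy by local ancestry status reported above relies on Fisher's exact test. As an illustration only, the sketch below implements the two-sided test for a 2×2 table, for example correct versus incorrect calls in two ancestry groups. This is a simplification: the tests in the study compare three groups (0, 1, or 2 West African alleles), which requires the more general r×c form. The function name `fisher_exact_2x2` is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows could be two local-ancestry groups and columns the counts of
    correct and incorrect calls. The p-value sums the hypergeometric
    probabilities of all tables (with the same margins) that are no
    more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)
```

Because the test conditions on the table margins, it remains valid for the small cell counts that arise when few subjects fall into one ancestry group.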

Discussion
Data on HLA alleles are essential for understanding causal variation that underlies SNP associations found in GWAS of diseases with a strong HLA component. Despite reductions in the cost of genome-wide genotyping in GWAS, genotyping HLA alleles remains expensive, although less expensive next-generation sequencing methods now exist [26,27]. The cost of HLA allele typing in large samples has spurred the development of methods using data on common variants from GWAS genotyping, as many existing studies already possess these data. Such methods can be a low- or no-cost option for studies with existing data.
Our results for European Americans confirm the high one- and two-field accuracy rates reported for the validated HLA*IMP:02 method [7,8]. In our sample, the HIBAG results are similar or better. HIBAG also performed well in an African American sample; reductions in overall imputation accuracies compared to those from European American samples are partially due to sample sizes in the training set. When two-field allele sensitivity estimates are compared by allele frequency, our findings suggest that rare alleles are more susceptible to poor imputation, even given large reference panels. These findings agree with those of Leslie et al. [7], which showed that the sensitivity of two-field allele prediction was related to occurrences of the allele in the model training data. Further, in smaller reference panels, such as those currently available for admixed populations, even relatively common alleles may be poorly imputed. While these findings suggest that HLA allele imputation accuracy in admixed populations could benefit from increasing the number of reference haplotypes, our results also suggest additional causes of low imputation accuracy in African Americans.
When we used equivalent sample sizes and compared models constructed on the sample of African Americans from ACCESS to those from published HIBAG African ancestry models, we found consistent increases in imputation quality and accuracy from the ACCESS models. One possible reason is SNP density in the models. The HIBAG models were constructed using a subset of SNPs overlapping across three different Illumina GWAS platforms [16], whereas our models included all the SNPs from only one platform. The improvement in accuracy related to the density of SNPs and completeness of genotyping (i.e. decreasing levels of missing calls for genotyped SNPs following imputation) is also supported by our results from different imputation strategies. Another source of improvement may be related to complexity in the ancestral origins of individuals included in the training samples. For example, the training samples included in published HIBAG African ancestry models may encompass sub-populations that differ from ACCESS African Americans in their HLA allele frequency spectrum. In addition to African Americans and Yoruba individuals who were part of the International HapMap project and originally genotyped for the HLA alleles by de Bakker et al. [6], the HIBAG African ancestry sample included individuals from South Africa. This population is not thought to have contributed substantively to the genomes of present-day African Americans and may not be informative for HLA imputation in this population.
Figure 1. African American allele prediction sensitivity for two-field HLA class II genes by allele frequency using the ACCESS HIBAG models. Sensitivity is equal to the probability that the predicted allele matches the actual genotyped allele (i.e. true positive/(true positive + false negative)).
Further, our findings suggest that consideration of local ancestry can aid in the improvement of HLA allele imputation accuracy in admixed populations, as training-set results for individuals heterozygous for local West African ancestry were inferior to those for homozygous individuals. An informed sampling of ancestral haplotypes may be necessary to produce high-quality predictions in the admixed population of interest.
Because our results suggest that denser SNP genotyping is related to improved imputation accuracy, accuracy in admixed populations could be improved through direct HLA allele genotyping in a large, geographically diverse reference sample of individuals with complete sequencing of the broader MHC region, such as the 1KG project [28]. Use of the near-complete catalogue of SNPs in the model-building process would eliminate the need to account for genotyping platform.
Genetic association studies are one of the principal applications of GWAS SNP-based HLA imputation; in this setting, the accuracy and quality of imputation are directly related to power. When applying a posterior probability threshold of 0.5, we found that imputing more variants in the training set improved call rates (and thus the sample size) while maintaining high imputation accuracy. The improvements in call rate were most dramatic for the strategy that included targeted sequencing for HLA-DRB1 and -DQB1, which is likely the result of direct genotyping and subsequent imputation of the nonsynonymous polymorphisms that define the HLA alleles at two-field resolution. While the gains in call rate may seem negligible, when prediction is used to test the association of these alleles with a trait of interest, even modest increases in sample size may dramatically impact statistical power.
‡Test for difference in imputation accuracy proportions across individuals with 0, 1, or 2 West African alleles at the locus in question.
Abbreviations: N denotes count; Accuracy, imputation accuracy; P, p-value from a Fisher's exact test. *150 African American individuals (300 chromosomes) with two ancestral West African alleles at each gene as estimated by LAMP and 150 individuals from the ACCESS European American sample (i.e. 300 European chromosomes). The test set for these models included the remaining 175 African Americans who were not a part of the 150 used to build the model. †Number of West African ancestral alleles at each gene estimated by the Local Ancestry in Admixed Populations (LAMP) method. ‡Test for difference in imputation accuracy proportions across individuals with 0, 1, or 2 West African alleles at each gene.
For applications other than genetic association mapping, additional metrics may be more appropriate. Clinical pharmacogenetic applications, such as the identification of patients likely to experience HLA-associated adverse drug reactions (e.g. abacavir hypersensitivity in carriers of HLA-B*57:01 [29]), may benefit from a metric that can account for uncertainty in predictions as well as differentiate between correct and incorrect classification. In such cases, the generalized Bayesian information reward applied to machine learning classification methods [30] may be a solution. This method compares the natural logarithm likelihood of the model (based on posterior probability of the observed genotypes estimated from HIBAG) to a null model (expected genotype frequencies in the population, assuming Hardy-Weinberg Equilibrium). For our purposes, however, imputation accuracy is a valid method for evaluating the relative strengths and weaknesses of different imputation modeling strategies.
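The comparison described above, between the log-likelihood of the observed genotypes under the model and under a Hardy-Weinberg null, can be illustrated with a simplified sketch. It is not the generalized Bayesian information reward of [30] itself; it only averages the per-subject difference in natural log probabilities. The function names, the `floor` parameter that guards against taking the logarithm of zero, and the input layout are all assumptions made for illustration.

```python
import math


def hwe_genotype_frequency(genotype, allele_freq):
    """Expected genotype frequency under Hardy-Weinberg equilibrium."""
    first, second = genotype
    if first == second:
        return allele_freq[first] ** 2
    return 2 * allele_freq[first] * allele_freq[second]


def mean_log_likelihood_gain(observed, posterior, allele_freq, floor=1e-12):
    """Average of ln P_model(observed) - ln P_null(observed) across subjects.

    observed:    list of typed genotypes, each a pair of allele names
    posterior:   model posterior probability assigned to each typed genotype
    allele_freq: dict mapping allele name to population frequency
    """
    gains = []
    for genotype, p_model in zip(observed, posterior):
        p_null = hwe_genotype_frequency(genotype, allele_freq)
        gains.append(
            math.log(max(p_model, floor)) - math.log(max(p_null, floor))
        )
    return sum(gains) / len(gains)
```

A positive value indicates that the model assigns more probability to the typed genotypes than population frequencies alone would, and confident but wrong predictions are penalized through their low posterior probability for the typed genotype.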
HLA allele prediction in non-European populations has not been widely reported in the literature, likely due to a lack of methods and reference panels. Recently published allele prediction results for HLA-DRB1 and -DQB1 in the Wolita population of southern Ethiopia [31] used a multi-allelic prediction model [11] that was accurate at the one-field level (>85%) but less so at the two-field level (<32%). These models were constructed using just 10 and 19 SNPs for -DRB1 and -DQB1, respectively, which may explain the low two-field accuracy [32]. Such results demonstrate the need for larger reference panels of HLA alleles and dense SNP genotyping in the HLA region for non-European populations.
The HIBAG approach has several practical advantages over other established HLA imputation methods. First, researchers can build models for prediction using their own samples, particularly in non-European populations for whom a reference panel has not been established. In African Americans, ancestral contributions are primarily of West African and European origin; admixed populations with greater ancestral heterogeneity (e.g. Latinos [33][34][35][36][37][38]) likely require additional population-specific reference panels to improve imputation accuracy. Further, HIBAG uses the open-source R statistical programming language. Models produced by researchers can be shared without proprietary software or the transfer of protected health information such as individual genotype data. The African American HLA class II imputation models produced for this study are available on request.
A limitation of this study is our use of HLA typing data that are over a decade old as our gold standard to estimate imputation accuracy. We recognize these data are incomplete in terms of the current compendium of HLA class II alleles, but it should be noted that even today's high-resolution HLA typing results in ambiguous allele and genotype calls (i.e. multiple distinct alleles consistent with the same raw sequence). While the undetected or misclassified alleles in our HLA typing are not strictly quantifiable, our estimates of imputation accuracy should be conservative, since HLA typing misclassification will likely decrease our estimate of imputation accuracy. Despite the limitations of our HLA typing data, our findings are similar to the ethnicity-specific accuracy estimates reported in the HIBAG [16] and HLA*IMP:02 [13] manuscripts, both of which used allele typing based on more recent versions of the IMGT/HLA Database. Due to recent gains in knowledge regarding the differences in HLA allele frequencies worldwide [39][40][41][42], larger representative reference panels coupled with current HLA allele typing should lead to improvements in imputation of lower frequency alleles in admixed populations such as African Americans.
Abbreviations: PP denotes posterior prediction probability; N, count; Accuracy, imputation accuracy. *The models were trained on the subset of ACCESS African American genotype data from a) GWAS + 1000 Genomes project imputed data, b) GWAS + 1000 Genomes project + targeted sequencing imputed data (DRB1 and DQB1 only), with training samples selected at random (from a total of 325) to match the number of subjects used to construct the published HIBAG African ancestry models: HLA-DRB1 (n=161), HLA-DQB1 (n=137), and HLA-DPB1 (n=75). †One- and two-field estimates will be very similar for HLA-DPB1 as the first field uniquely identifies the two-field alleles, with the exceptions of *02:01 and *02:02, and *04:01 and *04:02.

Conclusions
In conclusion, our study suggests that the newly developed HIBAG approach is appropriate for HLA class II imputation in both European and admixed non-European populations. Imputation quality is closely associated with HLA allele frequency, training sample size, SNP density, and how well the training sample represents the test sample in ancestral origin. The latter point is especially true for admixed populations, where our findings suggest that accounting for local ancestry in the selection of the training samples will be beneficial. Additionally, applying next-generation targeted sequencing data, when available, may boost both HIBAG imputation accuracy and certainty in modest sample sizes of admixed individuals. We expect that these results are generalizable to other African admixed populations and should be useful in any study seeking to better characterize the role of HLA class II alleles.

Consent and ethics statement
As stated above, data for this study were derived from four prior studies [22][23][24][25].

Genotyping
Details of the molecular allele typing are reported in Rossman et al. [18]. Briefly, HLA typing was conducted over exon 2 for the class II genes -DRB1, -DQB1, and -DPB1. Low (one-field) to intermediate (two-field) resolution typing was performed with sequence-specific oligonucleotide probes available through Orchid Diagnostics [18], in the context of the version 1.13 release of the IMGT/HLA 2002 database. Genotyping was performed using the Illumina HumanOmni1-Quad [17]. SNPs meeting the following quality control criteria were included: well-defined cluster plots by visual inspection; call rate greater than 90%; minor allele frequencies greater than 0.001; and p-value greater than 0.001 for Hardy-Weinberg proportion tests in controls.
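The numeric SNP quality control criteria listed above can be expressed as a simple filter. The following sketch is illustrative only: the `SnpStats` record, its field names, and the function names are hypothetical, the per-SNP statistics are assumed to have been computed beforehand, and the visual inspection of cluster plots is not represented.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Precomputed summary statistics for one SNP (hypothetical layout)."""
    snp_id: str
    call_rate: float       # fraction of samples with a genotype call
    maf: float             # minor allele frequency
    hwe_p_controls: float  # Hardy-Weinberg test p-value in controls


def passes_qc(snp, min_call_rate=0.90, min_maf=0.001, min_hwe_p=0.001):
    """Apply the three numeric criteria; all comparisons are strict."""
    return (
        snp.call_rate > min_call_rate
        and snp.maf > min_maf
        and snp.hwe_p_controls > min_hwe_p
    )


def filter_snps(stats):
    """Return the IDs of SNPs that meet every numeric criterion."""
    return [snp.snp_id for snp in stats if passes_qc(snp)]
```

Keeping the thresholds as keyword arguments makes it straightforward to check how sensitive the retained SNP set is to each criterion.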

Targeted resequencing, variant detection, and quality control
Purified genomic DNA from 480 African American individuals (187 sarcoidosis cases, 293 controls) was prepared for sequencing using the Illumina Paired-End DNA Sample Preparation Kit (San Diego, CA). The Illumina TruSeq technology with a custom-designed bait pool was used to enrich captured regions (including HLA-DRB1 and -DQB1). Resequencing and generation of fastq sequencing reads were performed on the Illumina HiSeq2000 platform with Illumina Pipeline software (version 1.7). After removing duplicates, alignment to the Human Reference Genome (build hg19) was performed with BWA alignment software (version 0.5.9) [43]. Realignment around insertion/deletion sites, base quality score recalibration, and variation detection were carried out using GATK software (version 1.0) [44,45]. Variants displaying any of the following were removed: quality score <30; by-depth score <5; strand bias score > −0.1; homopolymer runs ≥5 bases; or clustering within 10 base pairs. Average sequence coverage was 75x. Three samples were removed because of low genotype call rates (<95%). Variant phasing was performed using BEAGLE (version 3.3) [12]; PLINK (version 1.07) [46]; and IMPUTE2 [47]. File formatting was performed with VCFtools (version 0.1.3) [48]. IMPUTE2 [47] was used to impute variants spanning chromosome 6p21, with targeted sequencing data and the 1KG Phase I integrated variant set as reference panels. Variants with low imputation accuracy (information measure <0.5; average maximum posterior genotype call probability <0.9) or failing to meet quality-control standards (described above) were excluded. Imputation was performed using the 1KG data over a region on chromosome 6 starting at 31,842,535 bp and extending to 33,720,220 bp, which included HLA-DRB1, -DQB1, and -DPB1; targeted sequencing data were available for the region starting at 31,842,535 bp and extending to 32,720,220 bp, which included HLA-DRB1 and -DQB1 only.
developers of the HIBAG software, who provided support and guidance in implementing HIBAG for these analyses; Dr. Matthew Nelson, who participated in helpful discussions of HLA imputation strategies; and the NHLBI-funded ACCESS and SAGA research groups, as well as the participants in these studies, who contributed to the original data collection efforts. The authors would also like to acknowledge their funding sources that made this work possible: R56-AI072727, R01-HL092576 (BAR); R01-HL54306, U01-HL060263 (MCI); 1RC2HL101499, R01HL113326 (CGM); P20GM103456 (IA).