A hierarchical and modular approach to the discovery of robust associations in genome-wide association studies from pooled DNA samples

Background One of the challenges of the analysis of pooling-based genome wide association studies is to identify authentic associations among potentially thousands of false positive associations. Results We present a hierarchical and modular approach to the analysis of genome wide genotype data that incorporates quality control, linkage disequilibrium, physical distance and gene ontology to identify authentic associations among those found by statistical association tests. The method is developed for the allelic association analysis of pooled DNA samples, but it can be easily generalized to the analysis of individually genotyped samples. We evaluate the approach using data sets from diverse genome wide association studies including fetal hemoglobin levels in sickle cell anemia and a sample of centenarians and show that the approach is highly reproducible and allows for discovery at different levels of synthesis. Conclusion Results from the integration of Bayesian tests and other machine learning techniques with linkage disequilibrium data suggest that we do not need to use too stringent thresholds to reduce the number of false positive associations. This method yields increased power even with relatively small samples. In fact, our evaluation shows that the method can reach almost 70% sensitivity with samples of only 100 subjects.


Background
The availability of genotyping assays for hundreds of thousands of single nucleotide polymorphisms (SNP)s is making genome wide association (GWA) studies more accessible to a broad range of genotype-phenotype investigations. The promise of this technology is that it will accelerate gene discovery for polygenic diseases and complex phenotypes of Mendelian disorders because data for all genes can be obtained simultaneously [1,2]. At the same time, the large number of significance tests performed is expected to result in a large number of false positive association signals. In fact, the number of signals observed by chance may well be greater than those that are authentic [3]. Thus, the development of analytic methods and strategies to distinguish authentic signals from those due to chance will contribute significantly to disease-gene association studies.
Here we describe a modular procedure to analyze data from pooling-based GWA studies that use the Illumina SNP microarray technology [4]. Rather than genotyping individual samples, the pooling-based technology types a carefully constructed pool of DNA samples that can be used to infer allele frequencies and is an affordable alternative to GWA studies that are still a financial burden for many investigators. Several studies have shown the usefulness of pooling-based GWA studies to discover SNPs associated with disease [5][6][7][8][9] using well calibrated methods [7,[10][11][12], and a variety of methods to estimate allele frequencies from pooled-based studies that use the Affymetrix microarray technology have been proposed [13,14]. Our objective is twofolds: (i) we wish to assess reproducibility and accuracy of the algorithm proposed by Illumina to detect chromosomal aberrations when used to estimate allele frequencies from pooled DNA samples [15]; and (ii) we propose a modular approach to the analysis of pooling-based GWA studies that limits the loss of power due to both the use of pools of DNA samples and the issue of multiple comparisons.
Several studies apply stringent thresholds on the significance level that is required to determine significant SNP-phenotype associations [16][17][18]. Contrary to this approach, our method integrates Bayesian tests for general associations [19] with decision rules based on the structure of linkage disequilibrium (LD) discovered through the International HapMap project [20], and other machine learning techniques to reduce the number of false positive associations. We also describe a hierarchical procedure to summarize the findings in terms of genes that can be further synthesized into gene sets using Gene Ontology annotations [21], pathways [22,23], or chromosomal bands. We evaluate this method using data from the sixty unrelated CEPH parents used for the International HapMap project [20] and two independent datasets. The first is a study of fetal hemoglobin (HbF) levels in African American subjects with sickle cell anemia and the objective is to discover genetic modulators of HbF. The second dataset is a study of exceptional longevity in a cohort of centenarians. In both datasets, using our novel analytic approach, we identified association signals in genes previously known to affect these phenotypes. The method is implemented in the R package and can be integrated with other R packages for genetic analysis, or GWA studies [24,25]. We develop the method for the analysis of pooled DNA samples [26,27], but the approach can be easily extended to the analysis of samples that are individually genotyped.

Results
We ran three sets of experiments to assess the reproducibility and accuracy of the estimates of the allele frequencies derived from pooled DNA samples, as well as the sensitivity and specificity of our modular procedure.

Experiment 1: accuracy and reproducibility
We obtained DNA samples from the 60 unrelated parents used in the HapMap CEU panel and created 2 duplicated pools of 30, 45 and 60 samples each (Table 1 provides a summary). The pooled DNA samples were analyzed in duplicates with the Illumina Sentrix HumanHap300 Genotyping BeadChip (v.1) and b-allele frequencies were estimated using the Illumina LOH and Copy Number analysis tool. The reproducibility was assessed by the agreement between allele frequency estimates in the two replicate samples for each pool (Table 1). Shown in Figure  1 is the scatter plot of two independent replicates of allele frequency estimates for the 22842 SNPs tagging chromosome 1 (top), and the 5452 SNPs tagging chromosome 22 (bottom) obtained with pools of 60 samples. The plots show a high degree of agreement that is confirmed for different sample sizes as shown by the results summarized in Table 1. Plots for other chromosomes are in the supplementary material [28].
We assessed the accuracy of the allele frequency estimates from pooled DNA samples by comparing the average estimates over the replicated pools with the allele frequencies computed using individually genotyped DNA samples that are available from the web site of the HapMap project [29]. A scatter plot of part of the results is displayed in Figure 1 for pools of 60 samples. The error analysis summarized in Table 1 suggests that, on average, the allele frequency based on the analysis of replicated pooled DNA samples differ from those based on individually genotyped data by approximately ± 0.04 but the error can be as The reproducibility of the allele frequency estimates is shown by the scatter plot of repeated estimates of allele frequency inferred from pooled DNA samples (left) Figure 1 The reproducibility of the allele frequency estimates is shown by the scatter plot of repeated estimates of allele frequency inferred from pooled DNA samples (left). The labels "run 1" and "run 2" in the x-and y-axis specify each replication. The accuracy of the allele frequency estimates is shown by the scatter plot of the estimates of allele frequency inferred from pooled DNA samples (y-axis in the right plots) and those computed from individually genotyped samples (x-axis). The analysis of the other chromosomes shows similar results.
large as ~0.12 = 0.04+2×0.06/√2 thus making differences in allele frequencies smaller than 0.24 difficult to detect because of technical errors. However, our analysis shows that less than 5% of the estimates based on pools of DNA samples differ from those based on individually genotyped samples by more than 0.12, and less than 10% differ by more than 0.08. This suggests reducing the minimum detectable allele frequency difference to 0.15 with a 10% chance of error. Furthermore, we have observed that amplifying DNA does not appear to affect either the reproducibility or the accuracy of the analysis.
To infer the effective sample size to be used in the analysis, we also looked at the distribution of the ratio between the two types of allele frequency estimates: say p(Si) = ni/n and q(Si) where ni is the frequency of the minor allele of the SNP Si computed from the samples that were typed individually, n is the overall sample size, and q(Si) is the frequency of the same minor allele computed from the analysis of the pooled DNA samples, in the different sets. The analysis demonstrated that log(q(Si)/p(Si)) has approximately a normal distribution with 0 mean and standard deviation 0.35. From this data, we deduced that about 95% of allele frequency estimates derived from the pooled DNA samples can be assumed to be within the interval p(Si) exp(± 1.95 × 0.35) from which we derive the empirical relation between p(Si) and q(Si): 0.51 p(Si) < q(Si) < 1.98 p(Si) with a range of uncertainty of 1.47 ni/ n. The inequality suggests that when we infer allele frequency from pooled DNA samples, we have a loss of precision approximately equivalent to using 2/3 (= 1/1.47) of the original DNA sample size. We call this the "effective sample size" used in the calculation of the Bayesian test of association.

Experiment 2: specificity
To estimate the false positive rate (FPR) we used real data from pools of DNA samples to create artificial sets of pools. The original pools are described in Table 2 and were generated in duplicates to discover genetic variants associated with exceptional longevity [30], and fetal hemoglobin expression in subjects with sickle cell anemia [31]. The Illumina Sentrix HumanHap300 Genotyping BeadChip was used for all the experiments. We created the artificial sets of pools by mixing replicates of different pool sets. For example, we generated a set of two pools by taking one replicate of the pooled DNA samples from the female centenarians and one replicate of the pooled DNA samples from the younger female controls, and we constructed a second set of two pools by taking the remaining replicates from the two sets (See Figure 2 for an example). Because the two artificial sets of pools are homogeneous relative to the phenotype, the differences in allele frequencies between the two sets can be attributed to chance, and the SNPs with significant differences in allele distribution are false positives. We repeated this analysis by mixing different types of pools of DNA samples and using a BF>3, together with the LD and regional filters, we observed a false positive rate ranging between 0.001 and 4×10-4 with a mean of 0.001, and an average of 300 SNPs selected by chance. Note that this number is substantially smaller than the number of false positive associations that we would expect by chance using a BF>3. This threshold is equivalent to accepting an association when the posterior probability of the association is greater than 0.75, so that we expect 1 in 4 associations to be false. Also the specificity of the selected and significant genes was very high: The number of genes that by chance were selected in two unrelated analyses was 9 and this number was further reduced to 7 when we limited attention to significant genes. These numbers should provide a reference when we examine the reproducibility of findings in different studies, because we expect that, by chance alone, we would have an agreement in about 0.1% of findings. We note that long genes that are tagged by a larger number of SNPs are more likely to be selected by chance in different studies. Figure 3 displays the log10 BF in the 1,114 false positive associations generated in approximately the 106 association tests. The plot shows an exponential decay of the BF so that the chance to observe a very large BF has an exponential decay, and the probability of observing a BF greater than 10 by chance is 6 × 10-4, whereas the probability of observing a BF greater than 100 is 3 × 10-4, and greater than 1000 is 2 × 10-4. This analysis however shows that trying to reduce the false positive rate by imposing a stringent threshold on the BF would likely reduce the power of relatively small association studies and require unrealistically large sample sizes.
We also run experiments to assess the effect of LD and regional filters on the specificity. Using the same simulated sets, we run the analysis by using only a BF>3 to select the significant SNPs, and also examined the effect of adding either the LD filter or the regional filter or both on the false positive rate. Our results suggest that the LD filter reduces the false positive rate by 43%, while the regional filter alone increases the false positive rate by approximately 25%, and both filters decrease the false positive rate by 20%. These results are consistent with the intuition that the regional filter increases the power by finding clusters of SNPs that individually have small effects and would be disregarded by a one-SNP-at-a-time analysis. However, the effect is to slightly increase the false positive rate. This conjecture is confirmed in the next experiments that we conducted to assess the sensitivity.

Experiment 3: sensitivity
In related work we are analyzing pools of DNA samples as a screening tool to discover genetic variants associated with exceptional longevity [30], and fetal hemoglobin (HbF) expression in subjects with sickle cell anemia [31].
As an indication of the sensitivity of technology and analytic method, we searched for SNPs in the Illumina Sentrix HumanHap300 Genotyping BeadChip that have been reported associated with either trait in independent studies, and verified whether an association was found based on the pooled DNA samples.

HbF experiment
We created two pools using DNA samples from 55 patients in the top and 54 patients in the bottom quartile Example of the artificial pool sets that we created to assess the specificity of the procedure Figure 2 Example of the artificial pool sets that we created to assess the specificity of the procedure. As an example, the top four pools were generated to compare the genome of centenarians (pools 1 and 2) with that of younger controls (pools 3 and 4). The two artificial pool sets are obtained by mixing pools of centenarians DNA with those of controls.
Distribution of the log 10 Bayes factor in 1,114 false positive associations generated in approximately 10 6 association tests with an estimated false positive rate 5 10 -6 Figure 3 Distribution of the log 10 Bayes factor in 1,114 false positive associations generated in approximately 10 6 association tests with an estimated false positive rate 5 10 -6 . The analysis shows that the chance to observe a very large Bayes factor has an exponential decay, and the probability of observing a Bayes factor greater than 10 by chance is 6 10 -4 , the probability of observing a Bayes factor greater than 100 is 3 10 -4 , greater than 1000 is 2 10 -4 .
of HbF concentrations. These patients were part of a clinical trial described in [32]. The pools were run in duplicates, and the data analyzed using the method proposed here. We searched the literature and found 36 SNPs with rs numbers that were reported associated with different levels of HbF [31,[33][34][35]. Thirteen of these SNPs are in the Illumina array, and only 3 of these were found associated in our analysis with a BF greater than 1, and 2 with a BF greater than 3. The moderate effect of the other 10 SNPs (odds ratios between 0.55 and 1.76) is consistent with the weak associations reported by other investigators and would not be detectable with our sample size of about 60 subjects per group. In fact, a sample size of 60 subjects would give at most 30% power to detect an odds ratio of 1.75 when the MAF in one group is 0.5. We also found 23 SNPs in the Illumina array that are within 150 kb of the other 23 reported SNPs, and are associated with HbF levels with a BF greater than 1, and 13 of these had a BF greater than 3. Fifteen of these SNPs were typed as part of the HapMap project and ten of these are in strong LD (Bayes D' > 0.8) [36]. Thus the analysis based on pooled DNA samples discovered association of 26 SNPs, and for 13 of them the association was strong. This analysis suggests a sensitivity of 72% and if we limit attention to associations supported by a BF of at least 3 the sensitivity is 36%. The details of the associations are in Table 3.

Longevity experiment
We created pools of DNA samples from unrelated centenarians and younger controls. Because there is evidence of gender effect [37] ---85% of centenarians are female---we created distinct pools for males and females as summarized in Table 2. We searched the literature and found 36 SNPs with rs numbers reported as associated with longevity [38][39][40]. Seven of these 36 SNPs are in the Illumina array, and five of these seven were found associated in our analysis. For 21 of the remaining 29, we found SNPs within 100 kb that were associated with longevity in either the males and female comparisons, or both. These SNPs are reported in Table 4. The analysis suggests approximately 67% sensitivity, and this is consistent with the sensitivity estimated with the HbF experiment. We also noted that the regional filter helped identify some of the associations that would be lost with a tight threshold on the BF. As an example, the SNP rs2227956 on HSPA1A was found associated with longevity in males only when the regional filter is used, and the two SNPs in WRN -a well known longevity gene in mice [41] -were selected by the regional filter. A similar form of sensitivity analysis is to see whether the GSEA analysis can lead to discover sets of functionally similar genes that are known to be associated with longevity. GSEA analysis of the centenarian cohort revealed several enriched GO biological categories (Table 4 and manuscript in preparation). Among the significantly enriched categories were genes associated with immune response (e.g., CSF3) and DNA repair (e.g., XRCC4), see Table 4. Intriguingly, CSF3 (also known as GCSF) is reported to influence migration of stem cells between the bone marrow and blood [42,43] and appears to promote regeneration of myocardial tissue [44][45][46], which has clear relevance to longevity. The gene XRCC4 has a well established role in DNA repair [47], and unrepaired DNA has been reported to accelerate ageing, possibly through dysregulating the IGF/growth axis [48]. Therefore, a comprehensive analysis of these and other genes present with the enriched gene sets that we detected will be essential to fully appreciate pathways engaged that contribute to the longevity phenotype.

Discussion and Conclusion
We have developed a hierarchical and modular approach to the analysis of genome wide genotype data based on pooled DNA samples. The method incorporates quality control data, information about linkage disequilibrium, Bayesian association tests, physical distance and gene ontology to identify associations warranting further investigation. Our evaluation using real data has shown the accuracy, reproducibility, sensitivity and specificity of the method.
Compared to other approaches, the integration of Bayesian tests with information about linkage disequilibrium and other machine learning techniques implies that we do not need to use too stringent thresholds to reduce the number of false positive associations. The implication of this fact is an increased power even with relatively small samples. In fact, our estimate of the sensitivity shows that the method can reach almost 70% sensitivity with samples of only 100 subjects.
Although we developed the approach to analyze pooled DNA samples, the method can also be used for the analysis of individually genotyped samples.

Genotyping
For the HbF study, DNA was obtained from the 60 subjects with HbF levels below the first quartile of the distribution, and the 60 subjects with HbF levels above the third quartile who were enrolled in the Multicenter Study of Hydroxyurea (MSH) study in Sickle Cell Anemia [49]. DNA samples from 260 centenarians and a control group of 230 subjects were obtained from the New England Centenarian Study: a cross-sectional study of individuals aged 97 and older conducted at the Boston Medical Center. CEPH DNA samples of the sixty unrelated parents used for the International HapMap project [20] were obtained from the Coriell Institute, Camden, NY and used to compare the accuracy and reproducibility of the estimates of allele frequency in pooled DNA samples compared to individually genotyped samples. For DNA pool construction and to ensure that each individual contributed equally to the pool, we first measured DNA stock solutions using a fluorimetric method (RNAseP) against a standard curve constructed from known concentrations of human genomic DNA. We then diluted the stock solutions to 10 ng/ul and measured the concentrations of these working solutions by means of PicoGreen. In the case of samples for which the CV of the three measurements was greater than 10%, quantification was repeated in triplicate until the CV was smaller than 10. Measurements were highly reproducible, with a correlation coefficient of 0.97 between the third measurement and the average of the first two. Based on these concentrations, 50 ng of DNA were added to the pool for each individual. The pools of DNA were analyzed on the Sentrix HumanHap300 bead chip (Illumina) according to the manufacture's protocol. The data used in the HbF and longevity studies will be released with companion publications. We make available the data derived from pools of CEPH DNA samples from the supplementary web site [28]. The HbF and longevity studies were approved by the Institutional Review Boards of Boston University.

Association test
The overall analytic strategy is shown in Figure 4. The first module is a statistical procedure to test the allelic associa- Column 1: row number; Column 2: SNP ID; Column 3: Cytogenic band; Column 4: Genes tagged by the SNP; Column 5: SNP in the Illumina array that was used to compare the association as described in the caption of Table 3; Column 6: distance between the two SNPs; Column 7: Bayes D' between the two SNPs; Column 8: Bayes Factor; Column 9-10: estimates of allele frequencies in the pools of DNA from male centenarians and younger controls; Column 11: Odds ratio. Columns 12-18: as columns 5-11 but for the pools comparing female centenarians to younger controls. The SNPs 8,10,14,[16][17][18][19]22 are in the array and SNPs 8, 14, 17-19 were found associated with longevity in females and/or males. Highlighted in bold are the associations confirmed by our analysis.
tion between each individual SNP and the phenotype. The input data are the allele relative frequencies estimated with the "b-allele frequency" value provided by the Illumina Beadstudio genotype module. This value represents the relative proportion of each allele in the DNA sample and is used by the Illumina loss of heterozygosity (LOH) and Copy Number analysis tool [15] to detect chromosomal aberrations and copy numbers by comparing the normalized intensity of the test sample (the pooled DNA samples) to a reference sample. We use the estimate of the allele frequencies θ ij in test and control pools to reconstruct the expected allele frequencies as where is the effective sample size in pool i, the index i = 1 for cases and i = 2 for controls, and the index j = A,B denotes the A or B allele. We then use a Bayesian test of association to compare the distributions of allele frequency in the two different pools. The test is described in [19,50] and assumes that prior probabilities are available for the model of allelic association and the model of no association -say p(M a ), p(M i ) -and then uses the data to update these prior probabilities p(M a ), p(M i ) into the posterior probabilities by using Bayes' theorem. The decision rule is then to select the model of association if its posterior probability is at least 3 times larger than the posterior probability of the model of no association (as suggested in reference [51]). Formally, the ratio of the posterior probabilities is and the ratio of the marginal likelihood functions p(n ij | M a )/p(n ij | M i ) is known as the Bayes factor (BF). When the prior probabilities of the two models are equal, the BF is equivalent to the posterior odds. Assuming the conjugate Beta distribution for the allele frequencies, the BF can be calculated in closed form and the formula for this calculation is reported in the appendix. To take into account the issue of multiple testing, we can use prior information about the number of SNPs that we expected to be associated to make the selection stronger. For example, if we expect 1,500 SNPs associated with the phenotype, then the prior odds for the alternative hypothesis of association are 0.005/0.995 when we test 300,000 SNPs, and the decision rule becomes to accept that a SNP is associated with the phenotype if the posterior odds for the association are at least 3 × 0.995/0.005 = 597. Initial experiments described in the Evaluation section suggest that a robust choice for an effective sample size is 2/3 of the original pool, and this is consistent with a larger sample size needed with the analysis of pooled DNA samples [52]. We note here that one advantage of this modular procedure is that the Bayesian test can be replaced by a standard χ 2 test for allelic association.

Filtering out false positives
Although we can take into account the issue of multiple comparisons by choosing appropriate prior odds for an association, the consequence of this approach is to reduce power and to require large sample sizes to detect associations with a small effect. This consequence can be problematic in studies where cases are relatively rare such as the study of exceptional longevity in which cases are subjects who lives 100 years and older. To fully exploit the power of small scale studies we developed a series of data filters that remove unreliable or suspicious associations (see Figure 4). The first filter is specific for allele association analysis using pooled DNA samples and accounts for the lack of precision of the technology. The other two filters take into account redundancy as well as reciprocal information of SNPs based on the LD structure of the human genome. So, rather than using "SNP pruning" as Schematic summary of the modular approach to the analysis of GWA data Figure 4 Schematic summary of the modular approach to the analysis of GWA data.
in PLINK to remove SNPs that are in LD [53], we leverage on dependencies determined by LD to improve the detection of false positive while reducing the false negative rate.

Quality control (QC) filter
The function of this filter is based on an extensive evaluation of the accuracy and reproducibility of the allele frequency estimates that are computed with the Illumina software. Allele frequencies obtained from genotyping of pooled DNA samples were compared with those derived from genotyping of individual samples (detailed in the below Evaluation section). The results suggest that alleles with minor allele frequency MAF <0.15 as well as differences in allele frequency of less than 0.15 are not reliable.
We therefore filter out all SNPs with these characteristics, as well as those SNPs for which repeated estimates of allele frequencies in replications of the same pool differ by more than 0.15.

Linkage disequilibrium (LD) filter
SNPs in LD with each other would be expected to show similar patterns of association if the signal is authentic whereas a single SNP in a LD block showing association is more likely to represent a spurious association. Therefore, our procedure automatically checks this condition and disregards the associations for SNPs that are not supported by positive associations with other SNPs in the same LD block. To this end, we used genotype data collected within the HapMap project to compute pairwise measures of LD for all consecutive pairs of SNPs in the HumanHap300 platform. The estimation of LD was based on a novel Bayesian version of D' that we introduced in [36]. As the traditional D', our Bayesian estimator is defined in the interval [0;1] regardless of the allele frequency so that it is easier to interpret than other measures of correlation like r2 but it is much less biased toward disequilibrium. We use a Bayesian D' > 0.7 between pairs of consecutive SNPs as suggestive of strong LD and we filter out all the associations of the SNPs whose adjacent SNPs that are in strong LD are not associated with the phenotype. The value 0.7 was chosen based on experiments reported in [36] showing that the Bayesian D' rarely exceeds 0.7 under no LD. The Bayesian D' values for each pair of consecutive SNPs were built for Caucasians using the DNA samples from unrelated parents of thirty trios of the CEPH (Utah residents with ancestry from northern and western Europe, also known as CEU) and similarly for Africans, using Yoruba in Ibadan Nigeria. These data are available from the supplementary material web site [28].

Regional association filter
The rationale of this filter is that a region or gene showing authentic association would be expected to show a greater number of SNPs associated than would be expected by chance. In this filter, we analyze the data using a sliding window of 20 SNPs, and summarize the global measure of association within the window as the product of the posterior probabilities of associations of the 20 SNPs.
Here, we assume that the association tests are independent, so that the product of the posterior probability of association of the individual SNPs becomes a measure of the global association of the region tagged by the 20 SNPs. Windows with a global measure of association exceeding 0.5 [or product of BFs >1] are then selected for further inspection. Although LD between SNPs in a window may introduce dependencies, the global measure of association does not seem to be affected by this approximation. Figure 5 shows some examples.

Hierarchical summary
The list of SNPs that are selected by the association test and the filters are labeled as "significant SNPs". The list is annotated by the SNP physical position, the position relative to known genes, the allele frequencies estimated in different populations, and the cytogenetic band. This information is collected through SNPPer [54] that integrates information from the UCSC human genome browser and dbSNP [55]. As a further level of summary we use those SNPs that are linked to genes to create a set of selected genes and a set of significant genes. The first set consists of genes that are tagged by at least one significant SNP. The set of significant genes is a subset of the selected genes and consists of genes in which the global measure of association given by the product of the posterior probability of association of the gene tagging SNPs is greater than 0.5 [Or equivalently, the product of BFs exceeds 1].

Ranking
We rank the significant genes by the global measure of association. To rank the selected genes we score them by two further measures that weigh the likelihood of selecting a gene by chance. In fact, there are genes that are tagged by a large number of SNPs: for example CSMD1 in chromosome 8 is tagged by 614 SNPs in the HumanHap300 array, and assuming a 5% false positive rate, we would expect about 30 SNPs to be selected from this gene by chance in any analysis. To take this issue into account, each selected gene is assigned 3 scores: the global measure of association, the ratio of selected SNPs relative to the number of tagging SNPs, and the probability of selecting this number by chance using the hyper-geometric distribution. Each score determines a ranking and then the sum of the ranks is defined as a final ranking of selected genes.

Gene set enrichment analysis
To evaluate selected and/or significant genes for enrichment of biological categories associated with a variable phenotype, we implement a stand-alone version of the EASE statistical software [56]. This program computes a modified Fisher's exact probability score for observing the frequency of a biological category associated with a phenotype (e.g., dementia, sickle phenotype, infection), compared with the likelihood of identifying that category by chance given the total number of genes in the data set. An adjusted score is then reported representing the upper bound of the distribution of Jackknife Fisher exact probabilities for observing an enriched biological category. Enriched categories are then inspected for biological trends and overlapping or related categories, based on sig-nificance scores or categories with a p-value << 0.05. For more detail, see Hosack et al. [56].

Authors' contributions
PS developed and implemented the analytic method, conceived the evaluation and drafted the manuscript. ZZ contributed to the implementation and evaluation. MAG, AR, SH, AS, developed the support material for the various filters and hierarchical summary. AD helped to design and conduct the experiments with the CEPH samples. MM contributed to the development of the hierarchical method and the interpretation of the analysis results. EM Relation between the pattern of LD (x-axis) and the global measure of association (y-axis) in the regional filter Figure 5 Relation between the pattern of LD (x-axis) and the global measure of association (y-axis) in the regional filter. The pattern of LD is measured by the average of the Bayes D' between consecutive SNPs in the region, and the global measure of association is the joint probability of association in the region. The two figures in the top half show the relation using data from the study of fetal hemoglobin in the sickle cell anemia subjects. The two figure in the bottom half show the relation using data from the longevity study. The different extent of LD reflect the fact that sickle cell anemia subjects are all African American while centenarians in the longevity study are all Caucasians The correlations in the four sets are 0.03, 0.18, 0.018, -0.10.
and CB carried out the pooling-based GW genotyping. DT, TTP, MS contributed to the development and evaluation of the method using the longevity study and the HbF study. All authors read and approved the final manuscript.

Derivation of the Bayes Factor
This Bayesian association test assumes that the allele frequencies follow a binomial distribution with probabilities θ ij the index i = 1 for cases and i = 2 for controls, and the index j = A,B denotes the A or B allele [See Table 5  Compared to the maximum likelihood that returns the likelihood function evaluated in the estimate of the parameters, the marginal likelihood incorporates the uncertainty about the parameters by averaging the likelihood functions for different parameter values. This conceptual difference is fundamental to understand the different approach to model selection: in the classical framework, model selection is based on the maximized likelihood and its sampling distribution to take into account sampling variability, for fixed parameter values.
In the Bayesian framework, model selection is based on the marginal likelihood which takes into account the parameter variability, for fixed sample values. Therefore, no significance testing is performed when using this approach to model selection. Our experience with the Bayesian procedure to model selection is that it is usually more robust to false associations. The calculation of the marginal likelihood can be done in closed form and it is easy to show that Where G is the gamma function and the ratio produces the Bayes factor.  Cases (one pool) n 1A = p(A|cases)*(2n 1 ) n 1B = p(B|cases)*(2n 1 ) 2 n 1 Controls (one pool) n 2A = p(A|controls)*(2n 2 ) n 2B = p(B|controls)*(2n 2 ) 2 n 2 n A n B