Phenotype-genotype association grid: a convenient method for summarizing multiple association analyses

Levy, Daniel; DePalma, Steven R; Benjamin, Emelia J; O'Donnell, Christopher J; Parise, Helen; Hirschhorn, Joel N; Vasan, Ramachandran S; Izumo, Seigo; Larson, Martin G

doi:10.1186/1471-2156-7-30

Methodology article
Open access
Published: 22 May 2006

BMC Genetics volume 7, Article number: 30 (2006)

Abstract

Background

High-throughput genotyping generates vast amounts of data for analysis; results can be difficult to summarize succinctly. A single project may involve genotyping many genes with multiple variants per gene and analyzing each variant in relation to numerous phenotypes, using several genetic models and population subgroups. Hundreds of statistical tests may be performed for a single SNP, thereby complicating interpretation of results and inhibiting identification of patterns of association.

Results

To facilitate visual display and summary of large numbers of association tests of genetic loci with multiple phenotypes, we developed a Phenotype-Genotype Association (PGA) grid display. A database-backed web server was used to create PGA grids from phenotypic and genotypic data (sample sizes, means and standard errors, P-value for association). HTML pages were generated using Tcl scripts on an AOLserver platform, using an Oracle database, and the ArsDigita Community System web toolkit. The grids are interactive and permit display of summary data for individual cells by a mouse click (i.e. least squares means for a given SNP and phenotype, specified genetic model and study sample). PGA grids can be used to visually summarize results of individual SNP associations, gene-environment associations, or haplotype associations.

Conclusion

The PGA grid, which permits interactive exploration of large numbers of association test results, can serve as an easily adapted, common display format for large-scale genetic studies. Adopting such a format could reduce the problem of publication bias and would simplify the task of summarizing large-scale association studies.

Background

The advent of high-throughput technology is generating unprecedented amounts of genotypic data that are being used in association analyses for multiple phenotypes. A single project may involve genotyping many genes with several variants (such as single nucleotide polymorphisms [SNPs]) per gene and analyzing each variant in relation to numerous phenotypes. In turn, each phenotype-SNP pair may be subjected to multiple genetic models and subgroup analyses. Hundreds of statistical tests may be performed for a single SNP, thereby complicating interpretation of results and inhibiting identification of patterns of association within a vast sea of data. Ultra-dense genome scans using 300,000 to 1,000,000 SNPs [1–3] will require efficient methods for analysis and presentation of results.

We are currently studying common SNPs in 200 candidate genes to test associations with alterations in echocardiographic phenotypes in participants from NHLBI's Framingham Heart Study. For each SNP, 144 statistical tests are performed: genotypes are analyzed with regard to six phenotypes (left ventricular [LV] mass, LV internal dimension, LV wall thickness, LV fractional shortening, left atrial dimension, aortic dimension) through four genetic models (general, dominant, additive, recessive), with two levels of covariate adjustment (age and sex; age, sex and multiple additional covariates) in three samples (pooled sexes, men, women). Planned analyses of 1500 SNPs will generate nearly one quarter of a million statistical tests (1500 × 144 = 216,000). Further details can be found on the CardioGenomics website [4].
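The test count above can be reproduced with a short Python sketch. The dimension values come from the text and the Abbreviations list; the short labels for the covariate-adjustment levels (`age_sex`, `multivariable`) are illustrative names, not identifiers from the PGA grid software.

```python
from itertools import product

# Analysis dimensions as described in the text. Phenotype and sample codes
# follow the Abbreviations list; adjustment labels are illustrative.
PHENOTYPES = ["LVM", "LVID", "LVWT", "FS", "AoR", "LA"]
MODELS = ["general", "dominant", "additive", "recessive"]
ADJUSTMENTS = ["age_sex", "multivariable"]
SAMPLES = ["MF", "M", "F"]
PLANNED_SNPS = 1500


def tests_for_snp():
    """Return every (phenotype, model, adjustment, sample) combination for one SNP."""
    return list(product(PHENOTYPES, MODELS, ADJUSTMENTS, SAMPLES))


tests_per_snp = len(tests_for_snp())
total_tests = tests_per_snp * PLANNED_SNPS
print(tests_per_snp, total_tests)  # 144 216000
```

Each tuple corresponds to one cell of the grid for a given SNP, which is why the display needs a compact encoding of significance rather than a table of numbers.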

As analyses commenced, it became obvious that we needed summary methods of data distillation and presentation to highlight findings of potential importance and to identify patterns of association, such as associations limited to one of multiple phenotypes, or associations limited to one sex. Therefore, we developed an approach that displays strengths of statistical associations at a glance, and that makes supporting data available easily via graphs accessed by a mouse click.

Results

Figure 1 (top panel) presents a Phenotype-Genotype Association (PGA) grid for a single SNP. Color coding denotes levels of statistical significance. In this example, associations having nominal P-values <0.05 were observed for four of six phenotypes and patterns of significance differed by sex. The color/visual aspect of the grid also helps in discerning patterns of association among related phenotypes.
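As a rough illustration of the color coding, the following Python sketch bins P-values into significance levels and renders a small HTML table with one row per phenotype and one column per sample. The cut-offs, the colors, the grid layout, and the function names are all assumptions made for this example; the article does not specify them here, and the published software is written in Tcl, not Python. The P-values are placeholders, not study results.

```python
from html import escape

# Assumed cut-offs and colors; not taken from the PGA grid software.
BINS = [(0.001, "#b30000"), (0.01, "#e34a33"), (0.05, "#fdbb84")]
NOT_SIGNIFICANT = "#f0f0f0"


def cell_color(p_value):
    """Return the color of the first bin whose upper bound exceeds p_value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError(f"P-value out of range: {p_value}")
    for upper, color in BINS:
        if p_value < upper:
            return color
    return NOT_SIGNIFICANT


def render_grid(snp, phenotypes, samples, p_values):
    """Build an HTML table; p_values maps (phenotype, sample) to a P-value.

    Pairs missing from p_values are rendered as empty cells.
    """
    header = ("<tr><th>" + escape(snp) + "</th>"
              + "".join(f"<th>{escape(s)}</th>" for s in samples) + "</tr>")
    rows = [header]
    for phenotype in phenotypes:
        cells = []
        for sample in samples:
            p = p_values.get((phenotype, sample))
            if p is None:
                cells.append("<td></td>")
            else:
                cells.append(f'<td style="background:{cell_color(p)}" '
                             f'title="P = {p:.3g}"></td>')
        rows.append(f"<tr><th>{escape(phenotype)}</th>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Placeholder P-values for illustration only.
example_html = render_grid(
    "snp_1", ["LVM", "FS"], ["MF", "M"],
    {("LVM", "MF"): 0.004, ("FS", "M"): 0.3},
)
```

Keeping the cut-offs in a single list makes it easy to change the palette or add a stricter level without touching the rendering code.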

The PGA grid is interactive. Clicking on a specific cell generates a plot of adjusted least squares means for the trait of interest by genotype for the corresponding genetic model. Figure 1 (bottom panel) displays this plot for the highlighted cell in Figure 1 (LV fractional shortening for pooled sexes, general model, adjusted for age and sex). At the gene level, thumbnail PGA grids for each typed SNP are displayed on a single page with each thumbnail sorted by map position and hyperlinked to its full-sized parent grid. The underlying database can be searched by gene, P-value, or phenotype to facilitate hypothesis generation and pursuit [4].
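A search by gene, P-value, or phenotype could look something like the following Python sketch, which uses an in-memory SQLite database in place of the Oracle back end. The table name `assoc_result`, its columns, and the `search` helper are hypothetical, and the rows are placeholder values, not results from the study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per association test.
conn.execute(
    """CREATE TABLE assoc_result (
           gene TEXT, snp TEXT, phenotype TEXT,
           model TEXT, sample TEXT, p_value REAL)"""
)
# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO assoc_result VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("GENE_A", "snp_1", "LVM", "general", "MF", 0.004),
        ("GENE_A", "snp_1", "FS", "general", "M", 0.300),
        ("GENE_A", "snp_2", "LVM", "additive", "F", 0.040),
        ("GENE_B", "snp_3", "LVM", "general", "MF", 0.020),
    ],
)


def search(conn, gene=None, phenotype=None, max_p=None):
    """Return rows matching the optional filters, ordered by P-value."""
    clauses, params = [], []
    if gene is not None:
        clauses.append("gene = ?")
        params.append(gene)
    if phenotype is not None:
        clauses.append("phenotype = ?")
        params.append(phenotype)
    if max_p is not None:
        clauses.append("p_value <= ?")
        params.append(max_p)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = ("SELECT gene, snp, phenotype, model, sample, p_value "
           "FROM assoc_result" + where + " ORDER BY p_value")
    return conn.execute(sql, params).fetchall()


hits = search(conn, phenotype="LVM", max_p=0.05)
```

The filters are passed as bound parameters rather than interpolated into the SQL string, which matters for any search form exposed on a web page.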

The software to create PGA grids from user-supplied data (sample sizes, means and standard errors, P-value for association) utilizes a database-backed web server. We generate HTML pages using Tcl scripts on an AOLserver platform [5], using an Oracle database [6] and the ArsDigita Community System web toolkit [7]. Source code (see Additional Files 1 and 2) is available for free download [8] and can be adapted for use on other database-backed web platforms. The grid can be modified to display results for gene-environment interactions (Figure 2). In addition, the grid can be used to summarize analyses of qualitative traits or haplotypes [3]. For example, one could display a grid for each gene with cells to indicate block-specific P-values based on a global test of differences in phenotype across all haplotypes within the block (Figure 3).
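One way to picture the user-supplied data behind a single grid cell is the Python sketch below. The class and field names are hypothetical and do not mirror the schema of the published Tcl/Oracle software; the interval uses a normal approximation (mean ± 1.96 standard errors), which is an assumption of this example, and the numbers are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotypeSummary:
    """Summary statistics for one genotype group (hypothetical structure)."""
    genotype: str
    n: int
    mean: float       # adjusted least squares mean
    std_error: float

    def interval(self, z=1.96):
        """Approximate 95% interval: mean +/- z * standard error."""
        half_width = z * self.std_error
        return (self.mean - half_width, self.mean + half_width)


@dataclass(frozen=True)
class CellRecord:
    """Everything needed to draw one grid cell and its detail plot."""
    snp: str
    phenotype: str
    model: str
    sample: str
    p_value: float
    genotypes: tuple

    def __post_init__(self):
        if not 0.0 <= self.p_value <= 1.0:
            raise ValueError(f"P-value out of range: {self.p_value}")

    def total_n(self):
        """Total sample size across genotype groups."""
        return sum(g.n for g in self.genotypes)


# Placeholder values for illustration only.
record = CellRecord(
    snp="snp_1", phenotype="FS", model="general", sample="MF", p_value=0.03,
    genotypes=(
        GenotypeSummary("AA", 100, 10.0, 0.5),
        GenotypeSummary("AG", 80, 11.0, 0.6),
        GenotypeSummary("GG", 20, 12.0, 1.0),
    ),
)
```

The P-value drives the cell color in the grid, while the per-genotype means and standard errors supply the plot shown when a cell is clicked.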

Discussion

The PGA grid was developed to summarize large numbers of phenotype-genotype association tests in a visually useful manner to facilitate interactive exploration of results. This approach could serve as a common format for large-scale association studies. Due to the large number of association tests performed, there will be many nominally significant results. One approach to multiple testing is to indicate P-values deemed statistically significant based on consideration of false discovery rates [9, 10]. Most association tests, however, will yield results that do not achieve significance on their own, but that are valuable in the context of other studies of the same gene [11]. Unfortunately, in most large-scale association studies, negative or inconclusive results are suppressed during publication or at best presented in extremely abridged form.
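As one concrete example of a false discovery rate criterion, the Benjamini-Hochberg step-up procedure [9] can be sketched in a few lines of Python. This is a minimal illustration of that published procedure, not code from the PGA grid software; the function name and the choice of q = 0.05 are assumptions, and the P-values are placeholders.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which P-values are rejected at FDR level q.

    Step-up rule: find the largest rank k (1-based, ascending P-values) with
    p_(k) <= k * q / m, then reject the k smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Placeholder P-values for illustration only.
flags = benjamini_hochberg(
    [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
)
```

In a grid display, cells flagged by such a procedure could be outlined or marked separately from cells that are only nominally significant.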

Conclusion

The PGA grid provides a simple visual method for displaying a large number of results, potentially reduces the problem of publication bias, and simplifies the task of summarizing large-scale association studies.

Abbreviations

LVM: left ventricular mass
LVID: left ventricular internal diameter at end diastole
LVWT: sum of septal and left ventricular posterior wall thickness
FS: left ventricular fractional shortening
AoR: aortic root diameter
LA: left atrial anteroposterior dimension
M: men only
F: women only
MF: men and women

References

1. Olivier M: A haplotype map of the human genome. Physiol Genomics. 2003, 13: 3-9.
2. Cardon LR, Abecasis GR: Using haplotype blocks to map human complex trait loci. Trends Genet. 2003, 19 (3): 135-40. 10.1016/S0168-9525(03)00022-2.
3. Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D: The structure of haplotype blocks in the human genome. Science. 2002, 296 (5576): 2225-9. 10.1126/science.1069424.
4. [http://cardiogenomics.med.harvard.edu/projects/p5/assoc-results]
5. [http://www.aolserver.com]
6. [http://www.oracle.com]
7. [http://www.openacs.org]
8. [http://cardiogenomics.med.harvard.edu/src/pga-grid]
9. Benjamini Y, Hochberg Y: Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society B. 1995, 57: 289-300.
10. Storey JD, Tibshirani R: Statistical significance for genomewide studies. PNAS. 2003, 100: 9440-9445. 10.1073/pnas.1530509100.
11. Lohmueller KE, Pearce CL, Pike M, Lander ES, Hirschhorn JN: Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nature Genet. 2003, 33: 177-82. 10.1038/ng1071.

Acknowledgements

The Framingham Heart Study is funded by NIH/NHLBI contract N01-HC-25195. CardioGenomics is funded by the National Institutes of Health Program for Genomic Applications (PGA).

Author information

Authors and Affiliations

From the National Heart, Lung, and Blood Institute, Bethesda, MD, USA
Daniel Levy & Christopher J O'Donnell
National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
Daniel Levy, Emelia J Benjamin, Christopher J O'Donnell, Ramachandran S Vasan & Martin G Larson
Cardiology Division, Beth Israel-Deaconess Medical Center, Boston, MA, USA
Daniel Levy
Division of Cardiology, USA
Daniel Levy
Department of Preventive Medicine, Boston University School of Medicine, Boston, MA, USA
Daniel Levy
Department of Genetics, Harvard Medical School and Howard Hughes Medical Institute, Boston, MA, USA
Steven R DePalma & Joel N Hirschhorn
Division of Cardiology, Massachusetts General Hospital, Boston, MA, USA
Christopher J O'Donnell
Department of Mathematics and Statistics, Boston University, Boston, MA, USA
Helen Parise & Martin G Larson
Divisions of Genetics and Endocrinology, Children's Hospital, Boston, MA, USA
Joel N Hirschhorn
Broad Center at Harvard and MIT, Cambridge, MA, USA
Joel N Hirschhorn
Novartis Research Institute, Cambridge, MA, USA
Seigo Izumo

Corresponding author

Correspondence to Daniel Levy.

Additional information

Authors' contributions

Daniel Levy: Conception of the display grid, drafting of paper, revisions to manuscript

Steven R. DePalma: Development of source code for display grid, revisions to manuscript

Emelia J. Benjamin: Conception of the display grid, revisions to manuscript

Christopher J. O'Donnell: Conception of the display grid, revisions to manuscript

Helen Parise: Statistical analyses for incorporation into display grid

Joel N. Hirschhorn: Conception of the display grid, revisions to manuscript

Ramachandran S. Vasan: Conception of the display grid, revisions to manuscript

Seigo Izumo: Principal investigator of CardioGenomics, funding of the project

Martin G. Larson: Conception of the display grid, development of statistical methods, revisions to manuscript

Electronic supplementary material

12863_2005_453_MOESM1_ESM.htm

Additional File 1: "pga-grid-v1.01-src.zip" is the source code for PGA Grid, version 1.01, as a .zip archive containing 31 text files (.tcl, .sql, .pl, .js, .css, .htm, .txt) for use with a Linux/AOLserver/Oracle/ACS web server platform. File descriptions are available in Additional File 2, pga-grid-v1.01-readme.htm. The most recent version of this software is available from http://cardiogenomics.med.harvard.edu/src/pga-grid/. (HTM 7 KB)

12863_2005_453_MOESM2_ESM.zip

Additional File 2: "pga-grid-v1.01-readme.htm" is an HTML-format file that lists and describes each of the files contained in Additional File 1, pga-grid-v1.01-src.zip. (ZIP 94 KB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Levy, D., DePalma, S.R., Benjamin, E.J. et al. Phenotype-genotype association grid: a convenient method for summarizing multiple association analyses. BMC Genet 7, 30 (2006). https://doi.org/10.1186/1471-2156-7-30

Received: 21 March 2005
Accepted: 22 May 2006
Published: 22 May 2006
DOI: https://doi.org/10.1186/1471-2156-7-30
