The Genetic and Environmental Risk Factors for Hemorrhagic Stroke (GERFHS) study is a large case–control study of hemorrhagic stroke in the Cincinnati, Ohio region. To maximize participation and representativeness of the cohort, buccal cytobrush collection was performed on the majority of subjects. Buccal brushes for genetic analysis were collected between 1997 and 2005, blood samples were collected for some subjects from 2000 to 2005, and blood was collected on all subjects after 2005. Each study participant included in this analysis had either a buccal sample or a blood sample genotyped, but not both. The GERFHS study employed a matched case–control design, with matching on age (±5 years), sex, and race; controls were also selected to have the same sample type (buccal or blood DNA samples) as their matched case whenever possible. All individuals included in this analysis were non-Hispanic white.
Buccal brushes were collected on each participant using CYTO-PAK Cytosoft Brushes (Medical Packaging Corp., Camarillo, CA). Study research nurses were trained to collect buccal cytobrush samples in a standardized fashion, as previously described [11]. Blood samples were drawn by hospital nursing personnel using two 10 ml purple-top tubes with EDTA solution from subjects during their hospital stay. Controls were recruited from the community using random-digit dialing. This study protocol was approved by the Institutional Review Boards of the University of Cincinnati and all participating hospitals, and all subjects provided written informed consent for genetic testing.
DNA extraction and genotyping
Both blood and buccal cell DNA were extracted using PureGene DNA extraction kits specific to sample type (Gentra Systems, Inc., Minneapolis, MN), according to the manufacturer's directions. Briefly, the extraction for buccal DNA was as follows: buccal brushes were cut and placed into a microfuge tube containing cell lysis solution and proteinase K, and incubated overnight at 55°C. After cooling, protein precipitation solution was added, and the sample was vortexed and centrifuged. The supernatant was transferred to a tube containing isopropanol and glycogen solution, incubated at room temperature, and centrifuged. The DNA pellet was washed with 70% ethanol and air dried. TE buffer was added, and the sample was incubated for 1 hour at 65°C prior to storage at −20°C.
The total concentration and the 260/280 and 260/230 ratios of genomic DNA were measured for all samples using a spectrophotometer (NanoDrop ND-1000, NanoDrop Technologies). Double-stranded DNA (dsDNA) concentration was then measured using the Quant-iT dsDNA BR assay kit and Qubit fluorometer (Invitrogen). All DNA samples were normalized to 50 ng/μl dsDNA in reduced EDTA TE buffer. Samples with a 260/280 ratio below 1.7 or a 260/230 ratio below 1.0 were excluded from genotyping.
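The purity screen and the normalization step can be sketched as follows. This is a minimal illustration, not the laboratory's actual workflow: the function names are hypothetical, the screen is assumed to require both ratios to meet their minimum, and the dilution uses the standard C1·V1 = C2·V2 relation with a final volume chosen by the user.

```python
# Thresholds and target concentration taken from the text above.
MIN_260_280 = 1.7
MIN_260_230 = 1.0
TARGET_NG_PER_UL = 50.0


def passes_purity(ratio_260_280, ratio_260_230):
    """Return True when both absorbance ratios meet their minimum (assumed rule)."""
    return ratio_260_280 >= MIN_260_280 and ratio_260_230 >= MIN_260_230


def dilution_volumes(stock_ng_per_ul, final_volume_ul, target=TARGET_NG_PER_UL):
    """Return (stock volume, buffer volume) in microliters using C1*V1 = C2*V2."""
    if stock_ng_per_ul < target:
        raise ValueError("stock is below the target concentration; cannot dilute")
    stock_ul = target * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # A 200 ng/ul dsDNA stock diluted to 20 ul at 50 ng/ul.
    print(passes_purity(1.8, 1.2))
    print(dilution_volumes(200.0, 20.0))
```

The dsDNA (Qubit) concentration, rather than the total (NanoDrop) concentration, would be the input to `dilution_volumes`, since normalization is stated in terms of dsDNA.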
Cases and their matched controls were arrayed on the same plates, and their genotypes were called at the same time. Genotyping was performed on the Affymetrix GeneChip Scanner 3000 platform using the Human SNP Array 6.0. The recommended protocol as described in the Affymetrix manual was followed. Five μl (250 ng) of dsDNA was digested with StyI and ligated to StyI adapters using T4 DNA ligase. Another 5 μl (250 ng) of dsDNA was digested with NspI and ligated to NspI adapters using T4 DNA ligase. The two digested samples were then PCR amplified individually using the TITANIUM DNA amplification kit (Clontech) on an ABI9700 machine. PCR products were pooled and purified using Agencourt AMPure magnetic beads (Beckman Coulter) and a 96-well filter plate (E&K Scientific), followed by fragmentation and labeling. Samples were then injected into cartridges, hybridized, washed, and stained. Mapping array images were obtained using the GeneChip Scanner 3000 with GCOS software. Image files were uploaded to the Cincinnati Children’s Hospital Medical Center (CCHMC) Genotyping Data Repository in 10 batches of ~96 samples each. Samples were classified as initial QC failures if they failed to meet the default Birdseed (v. 1.12.0) QC thresholds of Contrast Quality Control (cQC) > 0.4 and Dynamic Model (DM) call rate > 83%, or if the recorded and inferred gender did not match. A subset of samples that failed initial QC were re-extracted and/or re-hybridized in an attempt to recover the sample; however, this rarely succeeded. Samples that failed initial QC were excluded from full genotyping with Birdseed clustering. Birdseed genotyping was conducted using standard settings. Genotyping results were evaluated for evidence of batch effects using the median test, and results did not differ by batch.
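The initial QC rule amounts to three independent checks, any one of which marks a sample as a failure. The sketch below encodes that rule using the thresholds stated above; the function name, argument names, and reason labels are hypothetical, and the call rate is assumed to be expressed in percent.

```python
CQC_MIN = 0.4            # sample must have cQC strictly greater than this
DM_CALL_RATE_MIN = 83.0  # percent; sample must exceed this


def initial_qc_failure_reasons(cqc, dm_call_rate_pct, recorded_sex, inferred_sex):
    """Return a list of reasons a sample fails initial QC (empty list = pass)."""
    reasons = []
    if not cqc > CQC_MIN:
        reasons.append("cQC")
    if not dm_call_rate_pct > DM_CALL_RATE_MIN:
        reasons.append("DM call rate")
    if recorded_sex != inferred_sex:
        reasons.append("gender mismatch")
    return reasons


if __name__ == "__main__":
    print(initial_qc_failure_reasons(1.2, 95.0, "F", "F"))
    print(initial_qc_failure_reasons(0.3, 80.0, "F", "M"))
```

Returning the list of reasons rather than a single boolean keeps a record of why each sample was excluded, which is useful when deciding whether re-extraction or re-hybridization is worth attempting.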
Analysis design
All data analysis was conducted using SAS v.9.2 (SAS Institute, Cary, NC). Descriptive characteristics of participants with buccal versus blood samples were compared using parametric t-tests, non-parametric Wilcoxon rank sum tests, χ² analyses, or Fisher’s Exact tests, as appropriate to the distribution of the variable.
The statistical evaluation of DNA performance was partitioned into two phases. The first phase was designed to evaluate time since sample collection; relevant patient lifestyle factors (e.g., smoking or drinking alcohol or caffeinated beverages); and quantitative laboratory-based sample metrics for their ability to identify buccal DNA samples likely to fail pre-genotype-calling quality control (QC) screening. Identification of independent predictors of sample failure was conducted in real time (during the course of active genotyping) using data from the first three batches of samples (n = 270 buccal samples). Thresholds were established and then applied to the remaining samples run after that point. After the conclusion of all genotyping, analyses were conducted to compare QC metrics between buccal and blood samples and to calculate failure rates based on the previously established thresholds. This analysis sample set included results from both successful and failed buccal and blood samples, limited to one result per unique individual (n = 850) to avoid inflating the sample failure rate with failed repeat hybridizations of the same sample.
For this first phase of analysis, the dependent variable was success or failure of buccal DNA to pass pre-genotyping QC metrics. Independent variables for the first phase included the time between sample collection and genotyping; relevant patient lifestyle variables (current cigarette smoking, current cigar smoking, frequency of alcohol use, average daily intake of caffeinated coffee, tea or soda); and the quantity (total DNA concentration, dsDNA concentration, ds/total DNA ratio) and quality (260/280 and 260/230 ratios) of extracted DNA. Each variable was examined for outliers, and variables with excessive skewness or kurtosis (<−1 or >1) were natural log transformed to improve normality; both total DNA and dsDNA concentration were thus analyzed in log-transformed units. Because of differences in measurement method and sensitivity between the NanoDrop and Qubit DNA quantitation techniques, ds/total DNA ratios occasionally exceeded 1.0. Characteristics of buccal DNA samples that passed initial QC were compared with those of samples that failed using Wilcoxon rank sum statistics (for continuous variables) or Fisher’s Exact Test (for categorical variables). Logistic regression was used to construct receiver operating characteristic (ROC) curves and establish thresholds of significant variables to distinguish initial QC success from failure.
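The transformation rule and the threshold search can be illustrated with a short sketch. Several choices here are assumptions rather than details from the analysis: skewness and kurtosis are computed from simple population moments (SAS reports bias-corrected estimates), low values of the predictor are taken to indicate failure, and the cut-off is chosen by maximizing Youden's J (sensitivity + specificity − 1), which the text does not specify. For a single predictor, scanning cut-offs on the raw variable traces the same ROC curve as scanning the fitted probabilities of a one-variable logistic model, because the fitted probability is a monotone function of the predictor.

```python
import math


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("variable has zero variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def log_transform_if_needed(values, limit=1.0):
    """Natural-log transform when skewness or kurtosis falls outside [-limit, limit]."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    if abs(skew) > limit or abs(kurt) > limit:
        return [math.log(v) for v in values], True
    return list(values), False


def best_threshold(values, failed):
    """Scan cut-offs; predict failure when value <= cut-off.

    Returns (Youden's J, cut-off, sensitivity, specificity) for the best cut-off.
    """
    n_fail = sum(1 for f in failed if f)
    n_pass = len(failed) - n_fail
    best = None
    for cut in sorted(set(values)):
        tp = sum(1 for v, f in zip(values, failed) if f and v <= cut)
        fp = sum(1 for v, f in zip(values, failed) if not f and v <= cut)
        sens = tp / n_fail
        spec = 1.0 - fp / n_pass
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best


if __name__ == "__main__":
    # Toy data: a hypothetical ds/total DNA ratio and QC outcome per sample.
    ratios = [0.2, 0.3, 0.4, 0.8, 0.9, 1.0]
    failed = [True, True, True, False, False, False]
    print(best_threshold(ratios, failed))
```

Other cut-off criteria (for example, fixing a minimum sensitivity) would slot into `best_threshold` by replacing the quantity being maximized.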
The second phase of analysis was designed to test the performance of buccal versus blood DNA samples that passed initial QC. Metrics for this analysis included sample call rate, minor allele frequency (MAF) comparisons between buccal and blood samples in reference to Caucasian (CEU) HapMap samples, and average heterozygosity rates. Heterozygosity rates were calculated separately by sex across all chromosomes. Call rates and heterozygosity rates between buccal and blood samples were compared using Wilcoxon Signed Rank tests.
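As an illustration of the two per-sample metrics, the sketch below computes call rate and heterozygosity rate from a list of genotype calls and averages heterozygosity by sex. The encoding is an assumption made for the example: each genotype is 0, 1, or 2 copies of one allele, with `None` for a no-call, and heterozygosity is taken as the fraction of called genotypes that are heterozygous.

```python
from collections import defaultdict


def call_rate(genotypes):
    """Fraction of SNPs with a genotype call (None marks a no-call)."""
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def heterozygosity_rate(genotypes):
    """Fraction of called genotypes that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)


def mean_heterozygosity_by_sex(samples):
    """samples: iterable of (sex, genotypes). Returns {sex: mean heterozygosity}."""
    rates = defaultdict(list)
    for sex, genotypes in samples:
        rates[sex].append(heterozygosity_rate(genotypes))
    return {sex: sum(r) / len(r) for sex, r in rates.items()}


if __name__ == "__main__":
    toy = [("F", [0, 1, 2, None, 1]), ("F", [1, 1, 0, 0]), ("M", [0, 0, 1, 2])]
    print(call_rate(toy[0][1]))
    print(mean_heterozygosity_by_sex(toy))
```

Grouping by sex matters because males are hemizygous for most X-chromosome SNPs, so pooled rates across all chromosomes would differ systematically between men and women.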
Low MAF can influence the performance of the clustering algorithms for calling genotypes. Absolute differences between buccal or blood MAF and the CEU reference were analyzed using Wilcoxon Signed Rank tests. In addition, we explored the effect of specific MAF categories (MAF < 0.5%, 0.5%–1.5%, 1.5%–2%, 2%–5%, 5%–10%, 10%–15% and 15%–40%) on the deviation of buccal or blood sample MAF from the CEU HapMap standard. MAF categories were established based on the CEU HapMap reference, with MAF >40% excluded from analysis to minimize the likelihood that the minor allele in HapMap would be the major allele in our population.
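The binning and deviation calculation can be sketched as follows. The category edges come from the text; treating each lower edge as inclusive is an assumption, as are the function names and the SNP identifiers in the usage example, which are placeholders rather than real data.

```python
from bisect import bisect_right
from collections import defaultdict

# Upper edges of all but the last category, as proportions.
EDGES = [0.005, 0.015, 0.02, 0.05, 0.10, 0.15]
LABELS = ["<0.5%", "0.5%-1.5%", "1.5%-2%", "2%-5%", "5%-10%", "10%-15%", "15%-40%"]
MAX_REF_MAF = 0.40  # SNPs with reference MAF above this are excluded


def maf_category(ref_maf):
    """Return the category label for a reference MAF, or None if excluded."""
    if ref_maf > MAX_REF_MAF:
        return None
    return LABELS[bisect_right(EDGES, ref_maf)]


def mean_abs_deviation_by_category(sample_maf, ref_maf):
    """Mean |sample MAF - reference MAF| per category, keyed by label.

    Both arguments map SNP identifier -> MAF; categories use the reference MAF.
    """
    deviations = defaultdict(list)
    for snp, ref in ref_maf.items():
        label = maf_category(ref)
        if label is None or snp not in sample_maf:
            continue
        deviations[label].append(abs(sample_maf[snp] - ref))
    return {label: sum(d) / len(d) for label, d in deviations.items()}


if __name__ == "__main__":
    # Placeholder SNP names and frequencies, for illustration only.
    ref = {"snp_a": 0.003, "snp_b": 0.03, "snp_c": 0.04, "snp_d": 0.45}
    buccal = {"snp_a": 0.005, "snp_b": 0.05, "snp_c": 0.03, "snp_d": 0.50}
    print(mean_abs_deviation_by_category(buccal, ref))
```

Running the same function once with buccal MAFs and once with blood MAFs gives the paired per-category deviations that the comparison against the CEU reference requires.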