We used information from the 1958 British birth cohort (1958BC, up to n = 5,231) as the discovery sample, and from the Northern Finland Birth Cohort 1966 (NFBC66, up to n = 5,316) and Twins UK (up to n = 3,943) to replicate the initial findings from the 1958BC.
A detailed description of the 1958 British birth cohort (1958BC) has been published previously. In brief, study participants were born in England, Scotland or Wales during one week in March 1958 (n = 17,638). At age 45 years, 11,971 participants were invited to attend a biomedical survey: 9,377 (78%) completed at least one questionnaire. The 1958BC is almost entirely a white European population (98%), and for these analyses, 158 individuals of other ethnic groups and one pregnant participant were excluded. The 45-year biomedical survey was approved by the South-East Multi-Centre Research Ethics Committee (ref. 01/1/44), the ethics approval for genetic work was granted by the Joint UCL/UCLH Committees on the Ethics of Human Research (Committee A) Ref: 08/H0714/40, and written consent [for use of information in medical research studies] was obtained from the participants. For the present study, all the analyses were performed in up to 5,231 individuals.
The Northern Finland Birth Cohort of 1966 (NFBC66) comprises a total of 12,058 live births to mothers living in the two northernmost provinces of Finland, who were invited to participate if they had expected delivery dates during 1966. At age 31, all individuals still living in Northern Finland or the Helsinki area were asked to participate in a detailed biological and medical examination (n = 6,007) as well as a questionnaire. The present study includes up to 5,316 individuals with genotype data and information on WHR, serum triglycerides and LDL cholesterol. Written informed consent was obtained from all the participants, and the Ethics Committee of the Faculty of Medicine at the University of Oulu approved the study.
The Twins registry at St. Thomas' Hospital, King's College London recruited a total sample of 11,000 identical and non-identical, mostly female Caucasian, twins from across the UK through national media campaigns. Their ages range from 16 to 85 years. Over 7,000 twins have attended detailed clinical examinations with a wide range of phenotypes over the last 18 years. Participants were recruited without regard to the presence of, or interest in, any particular disease or trait. We included individuals for whom data on WHR (n = 3,943), serum triglycerides (n = 1,996) or LDL cholesterol (n = 1,992) were available. The Guy’s and St Thomas’ (GSTT) Ethics Committee approved the study and all the study participants gave informed consent.
Weight and standing height, at 45 years of age, were measured without shoes and in light clothing by a trained nurse using standardized protocols and equipment; waist circumference was measured by the nurse midway between the costal margin and iliac crest. BMI was calculated as weight (kg)/height (m)². Blood pressure was measured in a seated position, after 5 min rest, using an Omron 705CP automated sphygmomanometer with a large cuff for participants with a mid-upper arm circumference ≥32 cm; the measurement was repeated three times, and blood pressure was determined as the average of successful measurements.
Venous blood samples were drawn without prior fasting and posted to the collaborating laboratory. Glycosylated haemoglobin (HbA1c) was assayed using high-performance liquid chromatography standardized to the Diabetes Control and Complications Trial. Triglycerides and HDL cholesterol were measured by standard autoanalyzer methodology.
Height and body weight were measured using a standardized height measure and scale. The participants were asked to fast overnight before a blood sample was taken. Serum HDL cholesterol and triglycerides were determined by enzymatic methods using a Hitachi 911 Clinical Chemistry Analyzer (Boehringer Mannheim). Serum LDL was calculated by the Friedewald formula if the serum triglyceride level was <354 mg/dl; otherwise, LDL was determined by precipitating LD-lipoproteins with heparin, measuring cholesterol in the liquid phase and subtracting it from total cholesterol.
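As an illustration of the Friedewald calculation mentioned above, the following is a minimal sketch in Python. It assumes all concentrations are in mg/dl, where the formula approximates VLDL cholesterol as triglycerides divided by 5. The function name, the constant name and the choice to return `None` at or above the 354 mg/dl threshold (where a direct assay would be used instead) are illustrative assumptions, not part of the study's analysis code.

```python
from typing import Optional

# Threshold quoted in the text; treating exactly 354 mg/dl as "too high"
# is an assumption made for this sketch.
FRIEDEWALD_TG_LIMIT_MG_DL = 354.0


def friedewald_ldl_mg_dl(total_chol: float, hdl: float,
                         triglycerides: float) -> Optional[float]:
    """Estimate LDL cholesterol (mg/dl) with the Friedewald formula.

    Returns None when triglycerides are at or above the limit, signalling
    that LDL should come from a direct measurement instead.
    """
    if triglycerides >= FRIEDEWALD_TG_LIMIT_MG_DL:
        return None
    # In mg/dl units, VLDL cholesterol is approximated as triglycerides / 5.
    return total_chol - hdl - triglycerides / 5.0
```

For example, a participant with total cholesterol of 200 mg/dl, HDL of 50 mg/dl and triglycerides of 150 mg/dl would have an estimated LDL of 120 mg/dl.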
Weight and standing height were measured without shoes and in light clothing by a trained nurse. Blood samples for determination of fasting lipids were drawn from most subjects after a minimum 8-h overnight fast. Serum was stored at −45°C until analyzed using a Cobas Fara machine (Roche Diagnostics, Lewes, UK). A colourimetric enzymatic method was used to determine total cholesterol, triglycerides and HDL cholesterol levels. The latter was measured after precipitation of chylomicron, LDL and VLDL particles by magnesium and dextran sulphate.
Tag SNP selection
Tag SNPs for the VDR and RXRG genes were chosen using the genotype data from the International HapMap collected in individuals of Northern and Western European ancestry (CEU) (HapMap data release 24/phase II Nov08, on NCBI B36 assembly, dbSNP b126). The Haploview software V3.3 (http://www.broadinstitute.org/haploview/haploview-downloads) was used to assess the linkage disequilibrium (LD) structure between SNPs. Tagger software was used to select tag SNPs with the ‘pairwise tagging only’ option and an r² threshold of >0.8 (±10 kb upstream and downstream of the genes). In the tag SNP selection, we force-included the functional SNPs previously studied [15–18] (VDR SNPs: rs731236 and rs2228570; RXRG SNP: rs2134095) before running Tagger. There were 30 VDR and 31 RXRG tag SNPs; however, after applying the quality control criteria [call rate >99% for genotyped SNPs, average genotype probability across all individuals in the sample >90% for imputed SNPs and minor allele frequency >5%], only 22 VDR and 23 RXRG tag SNPs remained.
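The idea behind pairwise tagging with force-included SNPs can be sketched as a greedy selection over a table of pairwise r² values. This is only an illustration of the general technique under stated assumptions: it is not Tagger's actual implementation, and the function name, the `r2` dictionary layout and the tie-breaking rule are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence


def greedy_pairwise_tags(snps: Sequence[str],
                         r2: Dict[FrozenSet[str], float],
                         threshold: float = 0.8,
                         forced: Sequence[str] = ()) -> List[str]:
    """Pick tag SNPs so every SNP is itself a tag or has r2 > threshold with one.

    `r2` maps frozenset({snp_a, snp_b}) to their pairwise r2; missing pairs
    are treated as r2 = 0 (an assumption of this sketch).
    """
    def captures(tag: str, snp: str) -> bool:
        return tag == snp or r2.get(frozenset((tag, snp)), 0.0) > threshold

    tags = list(forced)  # force-included SNPs are tags from the start
    uncaptured = {s for s in snps if not any(captures(t, s) for t in tags)}
    while uncaptured:
        # Choose the SNP that captures the most still-uncaptured SNPs;
        # sorting first makes ties resolve alphabetically (deterministic).
        best = max(sorted(snps),
                   key=lambda c: sum(captures(c, s) for s in uncaptured))
        tags.append(best)
        uncaptured = {s for s in uncaptured if not captures(best, s)}
    return tags
```

Because every uncaptured SNP captures itself, each round removes at least one SNP from the uncaptured set, so the loop always terminates.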
Genome-wide data for the 1958BC were obtained through two sub-studies, both using the 1958BC participants as population controls. The first sub-study included 3,000 DNA samples randomly selected as part of the Wellcome Trust Case Control Consortium (WTCCC2) and genotyped on the Affymetrix 6.0 platform. The second sub-study was the Type 1 diabetes case–control study (T1DGC), which used 2,500 DNA samples genotyped using the Illumina Infinium 550 K chip through the JDRF/WT Diabetes and Inflammation Laboratory (DIL). IMPUTE was used for the imputations in the 1958BC.
For the NFBC66, genomic DNA was extracted from whole blood using standard methods. All DNA samples for the Illumina Infinium 370cnvDuo array were prepared for genotyping by the Broad Institute Biological Sample Repository (BSP). The 1000 Genomes imputation was carried out for the NFBC66 samples using IMPUTE2.
Genotyping of the TwinsUK dataset was done with a combination of Illumina arrays (HumanHap300, HumanHap610Q, 1 M-Duo and 1.2 M Duo 1 M). The normalised intensity data for each of the three arrays were pooled separately (with 1 M-Duo and 1.2 M Duo 1 M pooled together). For each dataset, the Illuminus calling algorithm was used to assign genotypes in the pooled data. No calls were assigned if an individual's most likely genotype was called with a posterior probability below the threshold of 0.95. Prior to merging, pairwise comparison was performed among the three datasets. Further exclusion of SNPs and samples was done to avoid spurious genotyping effects, identified as follows: (i) concordance at duplicate samples <99% (i.e., only samples with ≥99% concordance were included in the study); (ii) concordance at duplicate SNPs <99% (i.e., only SNPs with ≥99% concordance were included in the analysis); (iii) visual inspection of QQ plots for logistic regression applied to all pairwise dataset comparisons; (iv) Hardy-Weinberg p-value <10⁻⁶, assessed in a set of unrelated samples; (v) observed pairwise IBD probabilities (samples excluded if the IBD threshold was less than 0.30) suggestive of sample identity errors.
The natural logarithm was used to transform slightly skewed metabolic measures (BMI, WC, WHR, HbA1c and serum triglycerides) to approximate a normal distribution. All the SNPs were coded additively, with the minor allele as the effect allele. Linear regression models were used to evaluate the interaction between the VDR and RXRG tag SNPs on the following outcomes: BMI, WC, WHR, SBP, DBP, HDL and LDL cholesterol, serum triglycerides and HbA1c. The Friedewald equation was used to calculate LDL cholesterol levels in subjects with triglycerides ≤4.52 mmol/L. In the 1958BC, linear regression models were adjusted for gender, geographical region (coded as Scotland, North, Middle, and South of England including Wales, and London) and genotyping platform. The use of medication was adjusted for in the models of HbA1c, serum triglycerides, LDL and HDL cholesterol. Serum triglyceride measures were further adjusted for time since eating prior to blood sample. Blood pressure of individuals who were on blood pressure medication was adjusted by adding 15 mm Hg to the SBP and 10 mm Hg to the DBP. Models for WHR were adjusted for BMI to test whether the effects of the SNP-SNP interactions on WHR are independent of BMI.
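The preprocessing steps described above (log transformation, additive SNP coding and the blood pressure medication offset) can be sketched as follows. The function names and the two-character genotype string format are assumptions made for illustration; the study itself reports using STATA.

```python
import math
from typing import Tuple


def log_transform(value: float) -> float:
    """Natural-log transform for a slightly skewed, strictly positive trait."""
    return math.log(value)


def additive_code(genotype: str, minor_allele: str) -> int:
    """Count copies of the minor (effect) allele in a genotype such as 'AG'."""
    return genotype.count(minor_allele)


def adjust_blood_pressure(sbp: float, dbp: float,
                          on_medication: bool) -> Tuple[float, float]:
    """Add 15 mm Hg to SBP and 10 mm Hg to DBP for treated individuals."""
    if on_medication:
        return sbp + 15.0, dbp + 10.0
    return sbp, dbp
```

Under this coding, a genotype of `"AG"` with minor allele `"G"` contributes 1 to the additive term, and a treated participant measured at 130/80 mm Hg enters the models as 145/90 mm Hg.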
A joint likelihood ratio test (LRT) of the main SNP effects and the SNP-SNP interaction effects was used in the linear regression analyses to maximise statistical power ($H_0\colon \beta_{S1} = \beta_{S2} = \beta_{S1 \times S2} = 0$). In comparison to the joint LRT of the main and the interaction effects, we also performed direct LRTs for interaction (one degree of freedom test, $H_0\colon \beta_{S1 \times S2} = 0$). This was done by comparing the model with the SNP-SNP interaction term and the marginal effects of both SNPs, with a model including the marginal effects of both SNPs only. Bonferroni correction was applied to p-values in order to account for multiple testing (22 × 23 = 506 SNP-SNP combinations assessed). Combinations with a corrected p-value <0.05 (uncorrected P < 0.05/506 = 9.9 × 10⁻⁵) were selected for replication.
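The two likelihood ratio tests can be illustrated with a small standard-library sketch. It rests on several assumptions: covariates are omitted, the models are Gaussian linear regressions so that the LRT statistic is n·ln(RSS_null/RSS_full), and the chi-square tail probabilities use closed forms valid only for 1 and 3 degrees of freedom. All function names are hypothetical, and this is not the STATA code used in the study.

```python
import math
from typing import List, Sequence, Tuple


def ols_rss(X: List[List[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of a least-squares fit (normal equations)."""
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    beta = [A[j][p] / A[j][j] for j in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))


def chi2_sf(x: float, df: int) -> float:
    """Upper tail of the chi-square distribution for df = 1 or df = 3."""
    if x <= 0:
        return 1.0
    tail = math.erfc(math.sqrt(x / 2))
    if df == 1:
        return tail
    if df == 3:
        return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1 and df = 3 are supported in this sketch")


def lrt(rss_null: float, rss_full: float, n: int, df: int) -> Tuple[float, float]:
    """LRT statistic and p-value for nested Gaussian linear models."""
    stat = n * math.log(rss_null / rss_full)
    return stat, chi2_sf(stat, df)


def interaction_tests(s1: Sequence[float], s2: Sequence[float],
                      y: Sequence[float]):
    """Return (joint 3-df test, 1-df interaction test) as (stat, p) pairs."""
    n = len(y)
    null = [[1.0] for _ in y]
    main = [[1.0, a, b] for a, b in zip(s1, s2)]
    full = [[1.0, a, b, a * b] for a, b in zip(s1, s2)]
    rss_null, rss_main, rss_full = ols_rss(null, y), ols_rss(main, y), ols_rss(full, y)
    return lrt(rss_null, rss_full, n, 3), lrt(rss_main, rss_full, n, 1)


# Bonferroni threshold for the 22 x 23 SNP-SNP combinations in the text.
BONFERRONI_THRESHOLD = 0.05 / (22 * 23)
```

A SNP pair would be carried forward to replication when its uncorrected p-value falls below `BONFERRONI_THRESHOLD`, which is approximately 9.9 × 10⁻⁵.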
At the discovery stage, we also used the Multifactor Dimensionality Reduction (MDR) program (version 3.0.3) [24, 25] as a non-parametric test to scan for potential interactions (one- to four-way combinations) between the VDR and RXRG tag SNPs on all the metabolic traits in the 1958BC. The MDR program is a genetic model-free approach [24, 25] and includes a combined cross-validation and permutation testing procedure. With 10-fold cross-validation, the data are divided into 10 equal parts, and the model is developed on 9/10 of the data (training set) and then tested on the remaining 1/10 (testing set). Cross-validation consistency was used as a measure of how many times out of the 10 divisions of the data MDR finds the same best model; hence, the higher the consistency, the better the model. Permutation testing was performed to assess the probability of obtaining a testing accuracy as large as or larger than that observed in the original data, given that the null hypothesis of no association is true. This is carried out by randomizing the samples 1000 times and repeating the MDR analysis on each randomized dataset. This process yields an empirical distribution of testing accuracies under the null hypothesis, which is in turn used to calculate a p-value. For the MDR analysis in the 1958BC (up to 5,231 individuals), trait values were standardised for covariates using the same adjustments as in linear regression analyses (see above), as MDR analysis does not take account of these covariates. This was done by regressing the trait on the relevant covariates and using the standardised residuals as the new “trait” outcome variables for use in the MDR analyses.
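The permutation step can be illustrated generically. The sketch below is not the MDR program: it takes any user-supplied statistic (standing in for MDR's testing accuracy), shuffles the outcome labels 1000 times and reports the fraction of permuted statistics at least as large as the observed one. The function name, the fixed seed and the add-one correction in the p-value are assumptions of this sketch.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(labels: Sequence[int],
                        features: Sequence[int],
                        statistic: Callable[[Sequence[int], Sequence[int]], float],
                        n_permutations: int = 1000,
                        seed: int = 0) -> float:
    """Empirical p-value for statistic(labels, features) under label shuffling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(labels, features)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any label-feature association
        if statistic(shuffled, features) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero
    # (a common convention, assumed here rather than taken from MDR).
    return (hits + 1) / (n_permutations + 1)
```

With 1000 permutations and this correction, the smallest attainable p-value is 1/1001.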
For replication analyses, a one degree of freedom test for the interaction term was used. In the NFBC66, models were adjusted for gender, population substructure (using the first two principal components) and the use of lipid lowering medications, as described above. Available covariates in Twins UK were gender and age. The cluster function for familial relatedness was used to account for non-independence of the twin pairs in analyses for Twins UK. Results from the two replication cohorts (NFBC66 and Twins UK) were then meta-analysed using the inverse-variance method for a fixed effects model. Bonferroni correction was also applied in order to account for multiple testing [P < 0.05/8 for 8 combinations assessed]. All analyses were carried out using STATA, version 12, except for the analyses in Twins UK, where STATA, version 10 was used.
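The inverse-variance fixed-effects pooling of the two replication cohorts can be sketched as below. The function name and the input values in the usage note are hypothetical, and the two-sided p-value relies on a normal approximation; the study's own meta-analysis was run in STATA.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float]:
    """Pool per-cohort estimates with inverse-variance weights.

    Returns the pooled estimate, its standard error and a two-sided
    p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Replication threshold from the text: 8 combinations assessed.
REPLICATION_THRESHOLD = 0.05 / 8
```

As a made-up example, two cohorts with interaction estimates of 0.1 and 0.3 and equal standard errors of 0.1 pool to an estimate of 0.2, with the pooled standard error equal to the square root of 0.005.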
The power to detect SNP-SNP interactions in the 1958BC (n = 5,231) was calculated using the Quanto software (version 1.2.4). The power to detect SNP-SNP interactions for a standard normal outcome was calculated with different combinations of minor allele frequency (MAF) and an interaction beta of up to 0.25 (marginal SNP effects were set to 0.02). There was 80% power to detect an interaction β as small as 0.08 when both SNPs had a MAF of 0.40. However, when both SNPs had a MAF of 0.1, there was 80% power only for interaction effect sizes of 0.22 or larger (Additional file 1: Figure S1). Despite the large sample size in the 1958BC, we lacked the power to detect smaller interactions, particularly for combinations of SNPs with lower MAF.