Data
The GAW20 data sets are drawn from the GOLDN study, which comprises 1105 participants [2]. They include GWAS and EWAS data collected before and after intervention with fenofibrate, a blood lipid-lowering drug. The EWAS data set contains 2 triglyceride (TG) measurements and methylation levels at 463,995 cytosine-phosphate-guanine (CpG) sites, available for 995 individuals pretreatment and 530 individuals posttreatment. The log-transformed mean pre- and posttreatment TG levels were used as the outcome variable in our model. Control variables include age, sex, study center, and family pedigree.
EWAS model
We applied mixed-effects models to the two repeated measures of log TG levels, with fixed effects for time (0 = pre, 1 = post), methylation level, and their interaction, adjusting for age (range, approximately 18 to 87 years), sex, study site, and the top 4 methylation principal components. Pedigree and subject ID were modeled as nested random effects (subjects within pedigrees). Together, these fixed effects quantify the associations between methylation and TG in the pre- and posttreatment periods and the treatment modification effect.
Let $Y_{ijk}$ denote the log TG measurement at time $k$ (0 = pre, 1 = post) for the $i$th individual in the $j$th pedigree, $X_{ijk}$ the corresponding methylation level, and $t_k$ the treatment indicator, with $t_0 = 0$ and $t_1 = 1$. The model equation can be written as:
$$
\begin{aligned}
Y_{ijk} ={}& \beta_0 + \beta_1 X_{ijk} + \gamma t_k + \delta \left( X_{ijk} t_k \right) + \beta_2\,\mathrm{Age}_i + \beta_3\,\mathrm{Site}_i \\
&+ \beta_4\,\mathrm{PC1}_{ik} + \beta_5\,\mathrm{PC2}_{ik} + \beta_6\,\mathrm{PC3}_{ik} + \beta_7\,\mathrm{PC4}_{ik} \\
&+ S_{ij} + \varepsilon_{ijk}
\end{aligned}
$$
where the main effect $\beta_1$ is the pretreatment methylation effect on log TG; $\gamma$ is the main treatment effect; $\delta$ is the interaction effect between methylation and treatment (i.e., the treatment modification effect); $S_{ij}$ is the random effect of the $i$th individual nested within the $j$th pedigree; and $\varepsilon_{ijk}$ is the residual error. General linear hypothesis tests were applied to estimate the posttreatment methylation effect ($\beta_1 + \delta$) together with its standard error and p value. The model was fitted separately to each of the 463,995 CpG sites across the genome.
Mixed-effects models for repeated measures allow us to examine individual patterns of change by separating within-individual change from between-individual variability, and they provide more efficient estimators of treatment effects. Together, the main effects and the interaction identify epigenetic risk factors for pretreatment TG levels, posttreatment TG levels, and potential gene–drug interactions simultaneously [3, 4]. Compared with a cross-sectional design, repeated-measures analysis supports more reliable inference by capturing systematic changes within individuals, thereby achieving more sensitive tests and higher statistical power for a fixed number of individuals [5, 6].
All analyses were performed in R (version 3.2.3), using the nlme package for mixed-effects modeling [7], car for linear hypothesis tests [8], and qqman for Manhattan plots [9]. We applied a relatively loose significance threshold ($p < 10^{-5}$) to modification effects and posttreatment associations because of the exploratory nature of the proposed method and the moderate sample size (N = 536 posttreatment measures). A less stringent threshold may help flag potential drug modification effects, and empirical evaluations suggest that the current GWAS threshold could be relaxed for replication studies [10].
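Because $t_0 = 0$ and $t_1 = 1$, the methylation slope in the model is $\beta_1$ before treatment and $\beta_1 + \delta$ after treatment, and the variance of the estimated posttreatment effect is $\mathrm{Var}(\hat\beta_1) + \mathrm{Var}(\hat\delta) + 2\,\mathrm{Cov}(\hat\beta_1, \hat\delta)$. The following Python sketch shows that arithmetic as a single-degree-of-freedom Wald test with a standard normal reference distribution. It is an illustrative assumption, not a reproduction of the nlme/car computation used in the analysis; the function name and the coefficient values are hypothetical.

```python
import math


def linear_combination_test(beta, cov, weights):
    """Single-df Wald test of the linear combination sum(w * b) = 0.

    beta    : coefficient estimates, e.g. [beta1_hat, delta_hat]
    cov     : covariance matrix of those estimates (list of lists)
    weights : contrast vector, e.g. [1, 1] for beta1 + delta
    Returns (estimate, standard_error, two_sided_p).
    """
    est = sum(w * b for w, b in zip(weights, beta))
    # Var(w'b) = w' V w; for w = [1, 1] this is
    # Var(beta1) + Var(delta) + 2 Cov(beta1, delta)
    var = sum(
        weights[r] * cov[r][c] * weights[c]
        for r in range(len(weights))
        for c in range(len(weights))
    )
    se = math.sqrt(var)
    z = est / se
    # Two-sided normal p value: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return est, se, p


# Hypothetical estimates for a single CpG site (not GAW20 results)
beta = [0.10, 0.05]                 # [beta1_hat, delta_hat]
cov = [[0.0004, -0.0001],
       [-0.0001, 0.0009]]
est, se, p = linear_combination_test(beta, cov, [1, 1])
print(f"posttreatment effect = {est:.3f}, SE = {se:.4f}, p = {p:.2e}")
print("passes p < 1e-5 screen:", p < 1e-5)
```

In practice the estimates and their covariance would come from each fitted per-CpG model; only the 2 × 2 block for $(\hat\beta_1, \hat\delta)$ is needed for this contrast, and the normal reference distribution is a large-sample approximation.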
Pathway-enrichment analysis
After the EWAS analyses of pretreatment, posttreatment, and interaction effects, we mapped the CpG sites to their corresponding genes. To gain functional insight into the results, we applied the gene set enrichment analysis (GSEA) [11] preranked test to each of the 3 gene lists, ranked by log-transformed p values. Empirical p values and false discovery rates (FDRs) for pathways were computed from 1000 permutations. Pathways were taken from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [12].
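As a hedged illustration of preparing the input for a preranked test, the Python sketch below collapses CpG-level results to one score per gene and formats them as two-column, tab-separated lines in the style of a GSEA .rnk file. The text does not say how multiple CpG sites mapping to the same gene were combined or which logarithm base was used; scoring each gene by -log10 of its smallest CpG p value is an assumption made here, and the function names, CpG IDs, and gene names are hypothetical.

```python
import math


def build_ranked_list(cpg_results):
    """Collapse CpG-level p values to one ranking score per gene.

    cpg_results: iterable of (cpg_id, gene, p_value) tuples; gene is None
    for CpG sites without a gene annotation.
    ASSUMPTION (not stated in the original analysis): each gene is scored
    by -log10 of the smallest p value among its CpG sites.
    """
    best_p = {}
    for _cpg_id, gene, p in cpg_results:
        if gene is None:          # unannotated CpG site: no gene to map to
            continue
        if gene not in best_p or p < best_p[gene]:
            best_p[gene] = p
    # Floor guards against a p value that underflowed to exactly 0
    scores = {gene: -math.log10(max(p, 1e-300)) for gene, p in best_p.items()}
    # Most significant genes (largest -log10 p) at the top of the list
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def to_rnk(ranked):
    """Format (gene, score) pairs as tab-separated lines for a .rnk file."""
    return "".join(f"{gene}\t{score:.4f}\n" for gene, score in ranked)


# Hypothetical CpG-level results for one of the three analyses
results = [
    ("cg_0001", "GENE_A", 1e-6),
    ("cg_0002", "GENE_A", 0.03),
    ("cg_0003", "GENE_B", 0.001),
    ("cg_0004", None, 1e-8),
]
print(to_rnk(build_ranked_list(results)), end="")
```

With this scoring, pathways whose genes cluster at the top of the list are those containing many highly significant genes.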
GSEA is a robust technique that identifies pathways (gene sets) enriched for highly significant genes (CpG sites) by means of a Kolmogorov-Smirnov-like running-sum statistic [11], revealing biological insights from genome and epigenome data.
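The running-sum statistic behind the preranked test can be illustrated with a short Python sketch, following the published GSEA definition as we understand it: walking down the ranked list, the sum steps up by $|r_j|^p / N_R$ at each pathway gene ($N_R$ being the sum of $|r|^p$ over pathway genes) and down by $1/(N - N_H)$ at every other gene, and the enrichment score (ES) is the largest deviation from zero. Significance here is assessed by drawing random gene sets of the same size, which is an assumption about how the 1000 permutations were generated in the preranked setting. The function names, the add-one p-value correction, and the toy data are illustrative choices, and the FDR step (which normalizes scores across all pathways) is omitted.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """GSEA running-sum enrichment score (weighted KS-like statistic).

    ranked_genes: genes sorted from highest to lowest score
    scores:       dict gene -> ranking score (e.g. -log10 p)
    gene_set:     collection of genes in the pathway
    weight:       exponent p applied to |score| for pathway genes
    """
    gene_set = set(gene_set)
    hits = [g for g in ranked_genes if g in gene_set]
    n_miss = len(ranked_genes) - len(hits)
    if not hits or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering all of it")
    hit_norm = sum(abs(scores[g]) ** weight for g in hits)
    if hit_norm == 0:
        raise ValueError("all pathway genes have a zero score")
    running, best = 0.0, 0.0
    for g in ranked_genes:
        if g in gene_set:
            running += abs(scores[g]) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss                         # step down on a miss
        if abs(running) > abs(best):
            best = running  # ES = largest deviation from zero
    return best


def gene_set_permutation_p(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Observed ES and an empirical p value from random same-size gene sets.

    Counts permuted ES values of the same sign that are at least as extreme
    as the observed ES; (count + 1) / (n_perm + 1) avoids reporting p = 0.
    """
    rng = random.Random(seed)
    gene_set = set(gene_set)
    size = len(gene_set & set(ranked_genes))
    observed = enrichment_score(ranked_genes, scores, gene_set)
    count = 0
    for _ in range(n_perm):
        perm = enrichment_score(ranked_genes, scores, rng.sample(ranked_genes, size))
        extreme = perm >= observed if observed >= 0 else perm <= observed
        count += extreme
    return observed, (count + 1) / (n_perm + 1)


ranked = [f"G{i:02d}" for i in range(1, 21)]                  # hypothetical genes
scores = {g: float(21 - i) for i, g in enumerate(ranked, start=1)}
pathway = {"G01", "G03", "G07"}                                 # hypothetical pathway
es, p = gene_set_permutation_p(ranked, scores, pathway, n_perm=1000)
print(f"ES = {es:.3f}, empirical p = {p:.3f}")
```

A pathway whose genes sit near the top of the ranked list produces a large positive ES that few random gene sets of the same size can match, which yields a small empirical p value.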