
Table 2. Applied filters, possibility of adjustment for familial correlation, and strengths and limitations of the methods applied in the contributions from the Data Mining and Machine Learning group at GAW20

From: Data mining and machine learning approaches for the integration of genome-wide association and methylation data: methodology and main conclusions from GAW20

Contribution | Applied filters | Potential correlation adjustment | Strengths | Limitations
Random forest (Darst) | None | Yes | Model-free; adequate for high-dimensional data | Does not work well with highly correlated variables
Deep learning (Islam) | Methylation variability | Yes | Robust; adequate for high-dimensional data | Difficult result interpretation; difficult parameter setup; large sample sizes needed
Cluster analysis (Kapusta) | Reported genome-wide association studies on metabolic syndrome and fenofibrate treatment; principal component analysis; random forest | Yes | Intuitive cluster interpretation | Prior dimension reduction may be indicated
Mixed models (Datta) | Mixed models modification | Yes | Simple regression framework | Not indicated for low-dimensional data
Gene-set enrichment (Piette) | T-tests and linear regression | No | Circumvents multiple testing | Requires biological insight
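
The "methylation variability" filter listed for the deep learning contribution can be illustrated with a minimal sketch. The table does not say how variability was measured or which cutoff was used, so the variance criterion, the `top_k` cutoff, the function name `filter_by_variability`, the data layout, and the toy values below are all assumptions made for illustration, not the contributors' procedure.

```python
from statistics import pvariance


def filter_by_variability(methylation, top_k):
    """Return the IDs of the top_k most variable CpG sites.

    methylation: dict mapping a CpG site ID to a list of per-sample
    methylation values (hypothetical layout, one value per sample).
    """
    if top_k < 0:
        raise ValueError("top_k must be non-negative")
    # Population variance across samples, used here as the variability measure
    # (an assumption; the table does not name the measure).
    variances = {site: pvariance(values) for site, values in methylation.items()}
    # Sort by decreasing variance; ties are broken by site ID for determinism.
    ranked = sorted(variances, key=lambda site: (-variances[site], site))
    return ranked[:top_k]


# Toy data: three hypothetical CpG sites measured in four samples.
toy = {
    "cg_a": [0.10, 0.12, 0.11, 0.09],
    "cg_b": [0.20, 0.80, 0.35, 0.65],
    "cg_c": [0.50, 0.55, 0.45, 0.60],
}
selected = filter_by_variability(toy, top_k=2)
print(selected)
```

A filter of this kind would typically be applied before model fitting to reduce the number of input features; the appropriate measure and cutoff depend on the study.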