Table 2 Applied filters, possibility of adjusting for familial correlation, and strengths and limitations of the methods applied in the contributions from the Data Mining and Machine Learning group at GAW20

From: Data mining and machine learning approaches for the integration of genome-wide association and methylation data: methodology and main conclusions from GAW20

| Contribution | Applied filters | Adjustment for familial correlation possible | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| Random forest (Darst) | None | Yes | Model free; adequate for high-dimensional data | Does not work well with highly correlated variables |
| Deep learning (Islam) | Methylation variability | Yes | Robust; adequate for high-dimensional data | Results are difficult to interpret; parameter setup is difficult; large sample sizes are needed |
| Cluster analysis (Kapusta) | Reported genome-wide association studies on metabolic syndrome and fenofibrate treatment, principal component analysis, random forest | Yes | Intuitive cluster interpretation | Prior dimension reduction may be required |
| Mixed models (Datta) | Mixed models modification | Yes | Simple regression framework | Not indicated for low-dimensional data |
| Gene-set enrichment (Piette) | T-tests and linear regression | No | Circumvents multiple testing | Requires biological insight |