Letter to the editor : on the term 'interaction' and related phrases in the literature on random forests
- Source :
- Briefings in Bioinformatics
- Publication Year :
- 2015
Abstract
- In the Life Sciences, ‘omics’ data are increasingly generated by different high-throughput technologies. Often only the integration of these data uncovers biological insights that can be experimentally validated or mechanistically modelled; that is, sophisticated computational approaches are required to extract the complex non-linear trends present in omics data. Classification techniques train a model on variables (e.g. SNPs in genetic association studies) to separate different classes (e.g. healthy subjects versus patients). Random Forest (RF) is a versatile classification algorithm suited to the analysis of these large data sets. In the Life Sciences, RF is popular because RF classification models have high prediction accuracy and provide information on the importance of variables for classification. For omics data, variables or conditional relations between variables are typically important for only a subset of samples of the same class. For example, within a class of cancer patients, certain SNP combinations may be important for a subset of patients who have a specific subtype of cancer but not for a different subset of patients. These conditional relationships can in principle be uncovered from the data with RF, as the algorithm implicitly takes them into account when it builds the classification model. This review details some RF properties that, to the best of our knowledge, are rarely or never used and that allow maximizing the biological insights that can be extracted from complex omics data sets.
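The idea that a forest-style classifier reports the importance of variables can be illustrated with a small sketch. This is not the method or code from the paper: it is a toy ensemble of one-split trees (decision stumps) trained on bootstrap samples with a random subset of features per tree, scored by permutation importance on the training data. A full RF grows deeper trees and usually measures importance on out-of-bag samples. The data are synthetic, and all function names (`best_stump`, `fit_forest`, `permutation_importance`) are hypothetical, chosen only for this illustration.

```python
import random

def majority(labels):
    """Most frequent label (ties broken arbitrarily)."""
    return max(set(labels), key=labels.count)

def best_stump(X, y, features):
    """Pick the one-split rule with the fewest training errors among `features`."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            ll, rl = majority(left), majority(right)
            errors = sum(lab != ll for lab in left) + sum(lab != rl for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, ll, rl)
    if best is None:  # no valid split: fall back to a constant prediction
        m = majority(y)
        return (features[0], 0.0, m, m)
    return best[1:]

def fit_forest(X, y, n_trees, mtry, rng):
    """Train stumps on bootstrap samples, each seeing `mtry` random features."""
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        features = rng.sample(range(p), mtry)        # random feature subset
        forest.append(best_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    return majority([ll if row[f] <= t else rl for f, t, ll, rl in forest])

def accuracy(forest, X, y):
    return sum(predict(forest, row) == lab for row, lab in zip(X, y)) / len(y)

def permutation_importance(forest, X, y, rng):
    """Drop in accuracy after shuffling one column at a time."""
    base = accuracy(forest, X, y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        scores.append(base - accuracy(forest, X_perm, y))
    return scores

# Synthetic data (an assumption for illustration): only variable 0 carries signal.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(120)]
y = [1 if row[0] > 0.5 else 0 for row in X]

forest = fit_forest(X, y, n_trees=51, mtry=2, rng=rng)
importance = permutation_importance(forest, X, y, rng)
print([round(s, 3) for s in importance])
```

In this setup the informative variable should receive a clearly larger score than the two noise variables. Because each stump uses a single split, the sketch cannot capture the conditional relationships between variables that the abstract describes; those require deeper trees.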
- Subjects :
- Letter to the editor
Computer science
computer.software_genre
1710 Information Systems
Biological Science Disciplines
1312 Molecular Biology
Data Mining
Humans
Molecular Biology
Conditional dependence
Random Forest
Point (typography)
business.industry
10093 Institute of Psychology
proximity
Data science
conditional relationships
Term (time)
Random forest
Papers
variable importance
Artificial intelligence
local importance
variable interaction
business
150 Psychology
computer
Natural language processing
Algorithms
Information Systems
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Briefings in Bioinformatics
- Accession number :
- edsair.doi.dedup.....506294c609674a92bd0fb1c97c01e98a