Surrogate minimal depth as an importance measure for variables in random forests
- Source :
- Bioinformatics
- Publication Year :
- 2019
- Publisher :
- Oxford University Press (OUP), 2019.
Abstract
- Motivation: It has been shown that the machine learning approach random forest can be successfully applied to omics data, such as gene expression data, for classification or regression and to select variables that are important for prediction. However, the complex relationships between predictor variables, in particular between causal predictor variables, make the interpretation of currently applied variable selection techniques difficult.
Results: Here we propose a new variable selection approach called surrogate minimal depth (SMD) that incorporates surrogate variables into the concept of minimal depth (MD) variable importance. Applying SMD, we show that simulated correlation patterns can be reconstructed and that the increased consideration of variable relationships improves variable selection. When compared with existing state-of-the-art methods and MD, SMD has higher empirical power to identify causal variables while the resulting variable lists are equally stable. In conclusion, SMD is a promising approach to get more insight into the complex interplay of predictor variables and outcome in a high-dimensional data setting.
Availability and implementation: https://github.com/StephanSeifert/SurrogateMinimalDepth
Supplementary information: Supplementary data are available at Bioinformatics online.
- Subjects :
- Statistics and Probability
Computer science
Gene Expression
Feature selection
Predictor variables
Machine learning
01 natural sciences
Biochemistry
010104 statistics & probability
03 medical and health sciences
0101 mathematics
Molecular Biology
030304 developmental biology
0303 health sciences
Measure (data warehouse)
Original Papers
Outcome (probability)
Regression
Computer Science Applications
Random forest
Computational Mathematics
Variable (computer science)
Computational Theory and Mathematics
Artificial intelligence
Details
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 35
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....256c1e5cc06dbe692a386f517aefb9b4