Identifying Informative Predictor Variables with Random Forests

Authors :
Yannick Rothacher
Carolin Strobl
Source :
Journal of Educational and Behavioral Statistics. 2024 49(4):595-629.
Publication Year :
2024

Abstract

Random forests are a nonparametric machine learning method, which is currently gaining popularity in the behavioral sciences. Despite random forests' potential advantages over more conventional statistical methods, a remaining question is how reliably informative predictor variables can be identified by means of random forests. The present study aims at giving a comprehensible introduction to the topic of variable selection with random forests and providing an overview of the currently proposed selection methods. Using simulation studies, the variable selection methods are examined regarding their statistical properties, and comparisons between their performances and the performance of a conventional linear model are drawn. Advantages and disadvantages of the examined methods are discussed, and practical recommendations for the use of random forests for variable selection are given.

Details

Language :
English
ISSN :
1076-9986 and 1935-1054
Volume :
49
Issue :
4
Database :
ERIC
Journal :
Journal of Educational and Behavioral Statistics
Notes :
https://osf.io/5m946
Publication Type :
Academic Journal
Accession number :
EJ1434015
Document Type :
Journal Articles; Reports - Research
Full Text :
https://doi.org/10.3102/10769986231193327