1. A tree approach for variable selection and its random forest.
- Author
- Liu, Yu; Qin, Xu; and Cai, Zhibo
- Abstract
Sure Independence Screening (SIS) provides a fast and efficient ranking of variable importance in ultra-high-dimensional regression. However, classical SIS cannot eliminate false importance in the ranking, a problem that is exacerbated in nonparametric settings. To address this problem, a novel screening approach is proposed that partitions the sample into subsets sequentially, creating a tree-like structure of sub-samples called SIS-tree. SIS-tree is straightforward to implement and can be integrated with various measures of dependence. Theoretical results are established to support this approach, including its "sure screening property". Additionally, SIS-tree is extended to a forest with improved performance. Simulations show that the proposed methods improve substantially on existing SIS methods. The selection of a cutoff for the screening is also investigated through theoretical justification and experimental study. As a direct application, classification of high-dimensional data is considered, and it is found that the screening and cutoff can substantially improve the performance of existing classifiers. The proposed approaches can be implemented using the R package "SIStree" at https://github.com/liuyu-star/SIStree.
• A model-free variable selection method with tree and forest structures is introduced that works with any dependence measure.
• The proposed methods possess the sure screening property and can mitigate the selection of unimportant variables.
• A cross-validation procedure is proposed to determine the optimal cutoff for classifications and regressions.
• The new method is compared with other variable selection methods through a variety of simulations and real data examples.
[ABSTRACT FROM AUTHOR]
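For orientation, the classical SIS ranking that the abstract takes as its starting point can be sketched in a few lines. This is a minimal illustration of marginal-correlation screening only, not the SIS-tree or forest procedure, and it does not use the "SIStree" R package. The function names `sis_rank` and `sis_select`, the cutoff `d`, and the simulated data are all assumptions made for the example.

```python
import numpy as np


def sis_rank(X, y):
    """Rank the columns of X by absolute marginal Pearson correlation with y.

    Returns (order, scores): column indices from most to least important,
    and the absolute correlation of every column.
    """
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the response
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom = np.where(denom == 0, np.inf, denom)  # constant columns score 0
    scores = np.abs(Xc.T @ yc) / denom
    order = np.argsort(-scores)      # descending by score
    return order, scores


def sis_select(X, y, d):
    """Keep the d top-ranked predictors (d is a user-chosen cutoff)."""
    order, _ = sis_rank(X, y)
    return order[:d]


# Simulated example (hypothetical data): only predictors 0 and 1 matter.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

selected = sis_select(X, y, d=20)
print(selected[:5])
```

Because each predictor is scored in isolation, a predictor that is merely correlated with an important one can also rank highly, which is the false-importance issue the abstract describes.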
- Published
- 2025