Classification Trees for Imbalanced Data: Surface-to-Volume Regularization.

Authors :
Zhu, Yichen
Li, Cheng
Dunson, David B.
Source :
Journal of the American Statistical Association; Sep 2023, Vol. 118, Issue 543, pp. 1707-1717, 11 pp.
Publication Year :
2023

Abstract

Classification algorithms face difficulties when one or more classes have limited training data. We are particularly interested in classification trees, due to their interpretability and flexibility. When data are limited in one or more of the classes, the estimated decision boundaries are often irregularly shaped due to the limited sample size, leading to poor generalization error. We propose a novel approach that penalizes the Surface-to-Volume Ratio (SVR) of the decision set, obtaining a new class of SVR-Tree algorithms. We develop a simple and computationally efficient implementation while proving estimation consistency for SVR-Tree and rate of convergence for an idealized empirical risk minimizer of SVR-Tree. SVR-Tree is compared with multiple algorithms that are designed to deal with imbalance through real data applications. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
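
To give a rough sense of what penalizing a surface-to-volume ratio can mean, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the decision set is a single axis-aligned box, whereas a tree's decision set is in general a union of such boxes. The function names `box_surface_to_volume` and `penalized_risk`, the weight `lam`, and the additive form of the objective are all hypothetical choices made for illustration.

```python
from math import prod


def box_surface_to_volume(lower, upper):
    """Surface-to-volume ratio of one axis-aligned box (illustrative only).

    lower, upper: sequences giving the box corners in each dimension.
    """
    sides = [u - l for l, u in zip(lower, upper)]
    if any(s <= 0 for s in sides):
        raise ValueError("every side length must be positive")
    volume = prod(sides)
    # Each dimension contributes two opposite faces of area volume / side.
    surface = 2 * sum(volume / s for s in sides)
    return surface / volume


def penalized_risk(empirical_risk, svr, lam):
    """Hypothetical objective: empirical risk plus lam times the SVR penalty."""
    return empirical_risk + lam * svr


# Two boxes with the same area: the thin one has a much larger ratio,
# so it is penalized more at equal empirical risk.
square = box_surface_to_volume([0, 0], [1, 1])
sliver = box_surface_to_volume([0.0, 0.0], [0.1, 10.0])
print(square, sliver)
print(penalized_risk(0.2, square, lam=0.01), penalized_risk(0.2, sliver, lam=0.01))
```

Under these assumptions, a compact region scores lower than a thin or irregular one of equal volume, which matches the abstract's motivation of discouraging irregularly shaped decision boundaries.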

Details

Language :
English
ISSN :
01621459
Volume :
118
Issue :
543
Database :
Complementary Index
Journal :
Journal of the American Statistical Association
Publication Type :
Academic Journal
Accession number :
172404588
Full Text :
https://doi.org/10.1080/01621459.2021.2005609