
Complete Random Forest Based Class Noise Filtering Learning for Improving the Generalizability of Classifiers.

Authors :
Xia, Shuyin
Wang, Guoyin
Chen, Zizhong
Duan, Yanlin
Liu, Qun
Source :
IEEE Transactions on Knowledge & Data Engineering; Nov 2019, Vol. 31 Issue 11, p2063-2078, 16p
Publication Year :
2019

Abstract

Existing noise detection methods rely on classifiers, distance measurements, or the overall data distribution, and the 'curse of dimensionality' and other restrictions make them insufficiently effective on complex data, e.g., data with different attribute weights, high dimensionality, feature noise, or nonlinearity. This is also the main reason that existing noise filtering methods have not been widely applied and have not formed an effective learning framework. To address this problem, we propose a complete and efficient random forest method (CRF) specifically for class noise detection by simulating grid generation and expansion. The CRF is not based on distance measures, overall distribution, or classifiers; in addition, its voting mechanism enables it to effectively process datasets containing feature noise. Furthermore, we introduce a CRF-based class noise filtering learning framework (CRF-NFL) and derive its mathematical model. The framework is then applied to many widely used classifiers, including some state-of-the-art algorithms, e.g., k-means tree, GBDT, and XGBoost. Moreover, a parallelized version is designed for large-scale data. CRF-NFL shows much better generalizability than the conventional classifiers and the relative density-based method, which is the most effective noise filtering method as far as we know. All of this research has been released as an open source library, called CRF-NFL: http://www.cquptshuyinxia.com/CRF-NFL.html. [ABSTRACT FROM AUTHOR]
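
The abstract describes filtering class noise by voting across randomly built trees before a classifier is trained. This record does not give the CRF algorithm itself, so the following is only a minimal sketch of that general idea, not the authors' method: a generic majority-vote filter over randomly partitioned trees. The function names (`random_leaves`, `flag_suspects`), the parameters (`n_trees`, `max_leaf`, `min_leaf`, `seed`), and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_leaves(points, idx, rng, max_leaf):
    """Split the index set with random axis-aligned cuts; return the leaves."""
    if len(idx) <= max_leaf:
        return [idx]
    dims = list(range(len(points[0])))
    rng.shuffle(dims)
    for d in dims:
        lo = min(points[i][d] for i in idx)
        hi = max(points[i][d] for i in idx)
        if lo < hi:
            cut = rng.uniform(lo, hi)
            left = [i for i in idx if points[i][d] <= cut]
            right = [i for i in idx if points[i][d] > cut]
            if left and right:
                return (random_leaves(points, left, rng, max_leaf)
                        + random_leaves(points, right, rng, max_leaf))
    return [idx]  # identical points or a degenerate cut: stop splitting


def flag_suspects(points, labels, n_trees=50, max_leaf=6, min_leaf=3, seed=0):
    """Flag points whose label is outvoted by leaf-mates in most trees.

    Hypothetical voting rule, chosen for illustration only: within a leaf,
    a point is voted against when some other label has strictly more
    members than the point's own label has among the remaining points.
    """
    rng = random.Random(seed)
    n = len(points)
    against = [0] * n  # trees that voted against the given label
    counted = [0] * n  # trees in which the point sat in a usable leaf
    for _ in range(n_trees):
        for leaf in random_leaves(points, list(range(n)), rng, max_leaf):
            if len(leaf) < min_leaf:
                continue  # too few neighbours to count as evidence
            counts = Counter(labels[i] for i in leaf)
            for i in leaf:
                own = counts[labels[i]] - 1  # exclude the point itself
                rival = max((c for lab, c in counts.items()
                             if lab != labels[i]), default=0)
                counted[i] += 1
                if rival > own:
                    against[i] += 1
    return [i for i in range(n) if counted[i] and 2 * against[i] > counted[i]]


# Toy data (an assumption): two well-separated 5x5 grids, one label flipped.
points = [(x, y) for x in range(5) for y in range(5)]
points += [(x + 100, y) for x in range(5) for y in range(5)]
labels = [0] * 25 + [1] * 25
labels[12] = 1  # inject class noise at the point (2, 2)

flagged = flag_suspects(points, labels)
kept = [i for i in range(len(points)) if i not in flagged]
print("flagged indices:", flagged)
```

In a filtering framework of the kind the abstract outlines, only the `kept` samples would then be passed to whichever classifier is being trained.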

Details

Language :
English
ISSN :
1041-4347
Volume :
31
Issue :
11
Database :
Complementary Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
139076918
Full Text :
https://doi.org/10.1109/TKDE.2018.2873791