A Robust Classifier for Imbalanced Datasets

Authors :
Kotagiri Ramamohanarao
Sori Kang
Source :
Advances in Knowledge Discovery and Data Mining ISBN: 9783319066073, PAKDD (1)
Publication Year :
2014
Publisher :
Springer International Publishing, 2014.

Abstract

Classifying imbalanced datasets is a challenging problem because many classifiers are sensitive to the class distribution, which biases their predictions towards the majority class. Hellinger Distance has been shown to be skew-insensitive, and decision trees that use it as a splitting criterion have performed better than decision trees based on Information Gain. We propose a new decision tree induction classifier based on Hellinger Distance (HeDEx), an ensemble of randomized trees that select both the attribute and the split point at random. We also propose using a hyperplane as the decision surface for HeDEx to improve performance. In addition, we propose a new pattern-based oversampling method to reduce the bias towards the majority class. The patterns are detected from HeDEx, and the generated instances are added after a verification process that uses Hellinger Distance Decision Trees. Our experiments show that the proposed methods improve performance on imbalanced datasets over the state-of-the-art Hellinger Distance Decision Trees.
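
To illustrate why Hellinger Distance is described as skew-insensitive, the following is a minimal Python sketch of the commonly used Hellinger split value for a binary split in a two-class problem. It is not the HeDEx implementation from the paper; the function name and its parameters are hypothetical and chosen only for this example. The value depends only on the fraction of each class sent to each branch, so rescaling the majority class leaves it unchanged.

```python
import math


def hellinger_split_value(pos_left, pos_total, neg_left, neg_total):
    """Hellinger distance between the two class-conditional branch
    distributions induced by a binary split (hypothetical helper).

    pos_left / neg_left: positive / negative instances sent to the left branch.
    pos_total / neg_total: positive / negative instances at the node.
    """
    if pos_total <= 0 or neg_total <= 0:
        raise ValueError("both classes need at least one instance at the node")

    # Instances not sent left go to the right branch.
    pos_right = pos_total - pos_left
    neg_right = neg_total - neg_left

    # Only within-class fractions appear, so class priors cancel out.
    left = (math.sqrt(pos_left / pos_total) - math.sqrt(neg_left / neg_total)) ** 2
    right = (math.sqrt(pos_right / pos_total) - math.sqrt(neg_right / neg_total)) ** 2
    return math.sqrt(left + right)


# Same within-class fractions, different imbalance ratios (1:100 vs 1:10).
skewed = hellinger_split_value(5, 10, 100, 1000)
milder = hellinger_split_value(5, 10, 10, 100)
```

In this sketch `skewed` and `milder` are equal, since both splits send half of the positives and a tenth of the negatives to the left branch. A split that separates the classes perfectly reaches the maximum value of sqrt(2), and a split that sends the same fraction of each class to each branch scores 0.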

Details

ISBN :
978-3-319-06607-3
ISBNs :
9783319066073
Database :
OpenAIRE
Journal :
Advances in Knowledge Discovery and Data Mining ISBN: 9783319066073, PAKDD (1)
Accession number :
edsair.doi...........df8b62f3fbd34731edc1c158032a1462
Full Text :
https://doi.org/10.1007/978-3-319-06608-0_18