A Robust Classifier for Imbalanced Datasets
- Source :
- Advances in Knowledge Discovery and Data Mining ISBN: 9783319066073, PAKDD (1)
- Publication Year :
- 2014
- Publisher :
- Springer International Publishing, 2014.
Abstract
- Imbalanced dataset classification is a challenging problem because many classifiers are sensitive to the class distribution, which biases their predictions towards the majority class. Hellinger Distance has been shown to be skew-insensitive, and decision trees that use it as a splitting criterion have outperformed decision trees based on Information Gain. We propose HeDEx, a new decision tree induction classifier based on Hellinger Distance: an ensemble of randomized trees in which both the attribute and the split point are selected at random. We also propose using a hyperplane as the decision surface of HeDEx to improve its performance. In addition, this paper proposes a new pattern-based oversampling method to reduce the bias towards the majority class: patterns are detected with HeDEx, and the generated instances are used only after a verification step based on Hellinger Distance Decision Trees. Our experiments show that the proposed methods improve performance on imbalanced datasets over the state-of-the-art Hellinger Distance Decision Trees.
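The abstract rests on two ideas: Hellinger Distance as a skew-insensitive splitting criterion, and randomized selection of the attribute and split point. The sketch below illustrates both under stated assumptions; it is not the authors' HeDEx implementation. It uses the binary-split Hellinger criterion known from Hellinger Distance Decision Trees, and, as an assumption borrowed from Extremely Randomized Trees, draws a fixed number of random (attribute, split point) candidates and keeps the one with the largest Hellinger distance. The oblique variant, which scores random hyperplanes `w . x = b`, is likewise only a guess at what a hyperplane decision surface could look like. The names `hellinger_split_value`, `random_axis_split`, `random_hyperplane_split` and the parameter `n_candidates` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def hellinger_split_value(left: Sequence[int], right: Sequence[int]) -> float:
    """Hellinger distance between the positive- and negative-class distributions
    over the two branches of a binary split (HDDT-style criterion).
    Labels: 1 = positive (minority) class, 0 = negative (majority) class."""
    pos_total = sum(left) + sum(right)
    neg_total = len(left) + len(right) - pos_total
    if pos_total == 0 or neg_total == 0:
        return 0.0  # only one class is present, so no split can separate classes
    total = 0.0
    for branch in (left, right):
        pos = sum(branch)
        neg = len(branch) - pos
        # Only within-class proportions enter, so the class priors (skew) cancel out.
        total += (math.sqrt(pos / pos_total) - math.sqrt(neg / neg_total)) ** 2
    return math.sqrt(total)


def _partition(values: Sequence[float], labels: Sequence[int],
               threshold: float) -> Tuple[List[int], List[int]]:
    """Send instances with value < threshold to the left branch, the rest right."""
    left = [lab for v, lab in zip(values, labels) if v < threshold]
    right = [lab for v, lab in zip(values, labels) if v >= threshold]
    return left, right


def random_axis_split(X: Matrix, y: Sequence[int], n_candidates: int,
                      rng: random.Random) -> Optional[Tuple[int, float, float]]:
    """Hypothetical: draw random (attribute, split point) pairs and keep the one
    with the highest Hellinger distance. Returns (attribute, threshold, score)."""
    best = None
    for _ in range(n_candidates):
        f = rng.randrange(len(X[0]))
        column = [row[f] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant attribute: nothing to split on
        threshold = rng.uniform(lo, hi)
        score = hellinger_split_value(*_partition(column, y, threshold))
        if best is None or score > best[2]:
            best = (f, threshold, score)
    return best


def random_hyperplane_split(X: Matrix, y: Sequence[int], n_candidates: int,
                            rng: random.Random) -> Optional[Tuple[List[float], float, float]]:
    """Hypothetical oblique variant: the decision surface is a random hyperplane
    w . x = b. Returns (weights, offset, score)."""
    best = None
    for _ in range(n_candidates):
        w = [rng.gauss(0.0, 1.0) for _ in range(len(X[0]))]
        proj = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
        lo, hi = min(proj), max(proj)
        if lo == hi:
            continue  # degenerate projection: all instances coincide
        b = rng.uniform(lo, hi)
        score = hellinger_split_value(*_partition(proj, y, b))
        if best is None or score > best[2]:
            best = (w, b, score)
    return best


if __name__ == "__main__":
    # A split that perfectly separates the classes reaches the maximum, sqrt(2).
    print(hellinger_split_value([1, 1], [0, 0, 0]))  # 1.4142135623730951
    # Replicating every negative instance 10x leaves the score unchanged.
    print(hellinger_split_value([1, 1, 0], [1, 0, 0, 0])
          == hellinger_split_value([1, 1] + [0] * 10, [1] + [0] * 30))  # True
```

The second check is what skew insensitivity means in practice: because the criterion compares the fraction of each class sent down each branch rather than raw counts, multiplying the size of the majority class by a constant does not change which split wins. An Information Gain score, by contrast, would shift as the class ratio changes.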
Details
- ISBN :
- 978-3-319-06607-3
- ISBNs :
- 9783319066073
- Database :
- OpenAIRE
- Accession number :
- edsair.doi...........df8b62f3fbd34731edc1c158032a1462
- Full Text :
- https://doi.org/10.1007/978-3-319-06608-0_18