SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory.

Authors :
Ramentol, Enislay
Caballero, Yailé
Bello, Rafael
Herrera, Francisco
Source :
Knowledge & Information Systems; Nov 2012, Vol. 33 Issue 2, p245-265, 21p, 1 Diagram, 9 Charts, 2 Graphs
Publication Year :
2012

Abstract

Imbalanced data is a common problem in classification. This phenomenon is growing in importance since it appears in most real domains. It has special relevance to highly imbalanced data-sets (when the ratio between classes is high). Many techniques have been developed to tackle the problem of imbalanced training sets in supervised learning. Such techniques have been divided into two large groups: those at the algorithm level and those at the data level. Data level groups that have been emphasized are those that try to balance the training sets by reducing the larger class through the elimination of samples or increasing the smaller one by constructing new samples, known as undersampling and oversampling, respectively. This paper proposes a new hybrid method for preprocessing imbalanced data-sets through the construction of new samples, using the Synthetic Minority Oversampling Technique together with the application of an editing technique based on the Rough Set Theory and the lower approximation of a subset. The proposed method has been validated by an experimental study showing good results using C4.5 as the learning algorithm. [ABSTRACT FROM AUTHOR]
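
The two-stage idea in the abstract (oversample with SMOTE, then edit the synthetic samples using a rough-set lower approximation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes numeric features, Euclidean distance, and a hypothetical similarity relation in which two samples are similar when they lie within a fixed `radius`. Under that assumption, a synthetic sample belongs to the lower approximation of the minority class only if no majority sample is similar to it. All function and parameter names are illustrative.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base sample
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def in_lower_approximation(sample, majority, radius):
    """Assumed similarity relation: two points are similar if their distance
    is at most radius. The sample is in the lower approximation of the
    minority class when no majority point is similar to it."""
    return all(math.dist(sample, m) > radius for m in majority)


def smote_with_rough_set_editing(minority, majority, n_new, radius, k=3, seed=0):
    """Oversample with SMOTE, then keep only the synthetic samples that fall
    in the lower approximation of the minority class."""
    candidates = smote(minority, n_new, k, seed)
    return [s for s in candidates if in_lower_approximation(s, majority, radius)]
```

A call such as `smote_with_rough_set_editing(minority, majority, n_new=20, radius=0.2)` returns at most 20 synthetic minority samples, discarding any that land close to a majority sample. The fixed `radius` is a simplification chosen for this sketch; the abstract does not specify how the similarity relation is defined or tuned.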

Details

Language :
English
ISSN :
0219-1377
Volume :
33
Issue :
2
Database :
Complementary Index
Journal :
Knowledge & Information Systems
Publication Type :
Academic Journal
Accession number :
82504772
Full Text :
https://doi.org/10.1007/s10115-011-0465-6