SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory.
- Source :
- Knowledge & Information Systems; Nov2012, Vol. 33 Issue 2, p245-265, 21p, 1 Diagram, 9 Charts, 2 Graphs
- Publication Year :
- 2012
Abstract
- Imbalanced data is a common problem in classification. This phenomenon is growing in importance since it appears in most real domains. It has special relevance to highly imbalanced data-sets (when the ratio between classes is high). Many techniques have been developed to tackle the problem of imbalanced training sets in supervised learning. Such techniques have been divided into two large groups: those at the algorithm level and those at the data level. Data level groups that have been emphasized are those that try to balance the training sets by reducing the larger class through the elimination of samples or increasing the smaller one by constructing new samples, known as undersampling and oversampling, respectively. This paper proposes a new hybrid method for preprocessing imbalanced data-sets through the construction of new samples, using the Synthetic Minority Oversampling Technique together with the application of an editing technique based on the Rough Set Theory and the lower approximation of a subset. The proposed method has been validated by an experimental study showing good results using C4.5 as the learning algorithm. [ABSTRACT FROM AUTHOR]
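The two stages named in the abstract (constructing synthetic minority samples, then editing them with a rough-set lower approximation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote` and `in_lower_approximation`, the range-normalised similarity measure, and the `threshold` value are assumptions chosen for illustration only. A synthetic sample is kept here only if every original sample that is sufficiently similar to it belongs to the minority class.

```python
import numpy as np

def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    k = min(k, len(minority) - 1)  # needs at least 2 minority samples
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(minority), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[picked] - minority[base])

def in_lower_approximation(candidates, X, y, minority_label, threshold=0.9):
    """Boolean mask: True where all original samples similar to the candidate
    are minority samples. The similarity measure is an assumption:
    1 - mean range-normalised absolute difference per feature."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero on constant features
    keep = []
    for c in np.asarray(candidates, dtype=float):
        sim = 1.0 - np.mean(np.abs(X - c) / span, axis=1)
        similar = sim >= threshold
        keep.append(bool(np.all(y[similar] == minority_label)))
    return np.array(keep)

# Toy data (hypothetical): 4 majority points (label 0), 3 minority points (label 1).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1])
synthetic = smote(X[y == 1], n_new=10, k=2, rng=0)
mask = in_lower_approximation(synthetic, X, y, minority_label=1)
edited = synthetic[mask]  # synthetic samples that survive the editing step
```

The edited synthetic samples would then be appended to the training set before fitting a classifier such as a decision tree; how the similarity relation and threshold are actually defined is specified in the full paper, not in this record.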
Details
- Language :
- English
- ISSN :
- 02191377
- Volume :
- 33
- Issue :
- 2
- Database :
- Complementary Index
- Journal :
- Knowledge & Information Systems
- Publication Type :
- Academic Journal
- Accession number :
- 82504772
- Full Text :
- https://doi.org/10.1007/s10115-011-0465-6