Noise-free sampling with majority framework for an imbalanced classification problem.
- Source :
- Knowledge & Information Systems; Jul2024, Vol. 66 Issue 7, p4011-4042, 32p
- Publication Year :
- 2024
Abstract
- Class imbalance has been widely accepted as a significant factor that negatively impacts a machine learning classifier's performance. One of the techniques to avoid this problem is to balance the data distribution by using sampling-based approaches, in which synthetic data is generated using the probability distribution of the classes. However, this process is sensitive to the presence of noise in the data, which blurs the boundaries between the majority class and the minority class. Such phenomena shift the algorithm's decision boundary away from the ideal outcome. In this work, we propose a hybrid framework with two primary objectives. The first objective is to address class distribution imbalance by synthetically increasing the data of a minority class, and the second objective is to devise an efficient noise reduction technique that improves the class balance algorithm. The proposed framework focuses on removing noisy elements from the majority class and, by doing so, provides more accurate information to the subsequent synthetic data generator algorithm. To evaluate the effectiveness of our framework, we employ the geometric mean (G-mean) as the evaluation metric. The experimental results show that our framework is capable of improving the prediction G-mean for eight classifiers across eleven datasets. The range of improvements varies from 7.78% on the Loan dataset to 67.45% on the Abalone19_vs_10-11-12-13 dataset. [ABSTRACT FROM AUTHOR]
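Note on the evaluation metric: the abstract names G-mean but does not give its formula. The following is a minimal sketch, assuming the common binary definition (the square root of sensitivity times specificity); whether the article uses exactly this form is an assumption. The function name `g_mean` and the toy labels are hypothetical and do not come from the article.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed definition: sqrt(TPR * TNR). `positive` marks the minority class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against an empty class so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: 8 majority samples (0) and 2 minority samples (1).
y_true = [0] * 8 + [1] * 2

# A classifier that always predicts the majority class is 80% accurate
# but never finds a minority sample.
always_majority = [0] * 10

# A classifier that finds one of the two minority samples at the cost
# of two false alarms.
mixed = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

score_majority = g_mean(y_true, always_majority)
score_mixed = g_mean(y_true, mixed)
```

In this toy case the always-majority predictor scores a G-mean of 0 despite its 80% accuracy, because its sensitivity is 0. This is why G-mean is often preferred over accuracy on imbalanced data.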
- Subjects :
- DISTRIBUTION (Probability theory)
CLASSIFICATION
DATA distribution
Details
- Language :
- English
- ISSN :
- 0219-1377
- Volume :
- 66
- Issue :
- 7
- Database :
- Complementary Index
- Journal :
- Knowledge & Information Systems
- Publication Type :
- Academic Journal
- Accession number :
- 178029260
- Full Text :
- https://doi.org/10.1007/s10115-024-02079-6