Noise-adaptive synthetic oversampling technique
- Source :
- Applied Intelligence. 51:7827-7836
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
Abstract
- In supervised learning, class imbalance is one of the most difficult problems and has attracted a great deal of research attention in recent years. In an imbalanced dataset, the minority classes contain very few data samples, while the remaining classes contain very many. This imbalance reduces the predictive performance of machine learning models. There are currently three approaches to the class imbalance problem: algorithm-level, data-level, and ensemble-based. Of these, data-level approaches are the most widely used and fall into three sub-categories: undersampling, oversampling, and hybrid techniques. Oversampling techniques generate synthetic samples for the minority class to balance an imbalanced dataset. However, existing oversampling approaches have no strategy for handling noisy samples in imbalanced and noisy datasets, which reduces the predictive performance of machine learning models. This study therefore proposes a noise-adaptive synthetic oversampling technique (NASOTECH) for imbalanced and noisy datasets. First, the noise-adaptive synthetic oversampling (NASO) strategy is introduced; it determines the number of synthetic samples generated for each minority-class sample, based on the concept of the noise ratio. Next, the NASOTECH algorithm is built on the NASO strategy. Finally, empirical experiments on several synthetic and real datasets verify the effectiveness of the proposed approach. The experimental results confirm that NASOTECH outperforms three state-of-the-art oversampling techniques in terms of accuracy and geometric mean (G-mean) on imbalanced and noisy datasets.
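The abstract reports results in terms of accuracy and G-mean but does not define them. As a minimal sketch, assuming the common binary-classification definition (G-mean as the square root of sensitivity times specificity), the metric can be computed as below. The function name `g_mean` and the label convention (`positive=1` for the minority class) are illustrative assumptions, not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (binary case).

    Assumption: `positive` marks the minority class; every other label
    is treated as the majority (negative) class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Two minority samples (label 1) and four majority samples (label 0).
y_true = [1, 1, 0, 0, 0, 0]
# A classifier that always predicts the majority class.
always_majority = [0, 0, 0, 0, 0, 0]
# A classifier that finds one of the two minority samples.
mixed = [1, 0, 0, 0, 0, 1]

score_majority = g_mean(y_true, always_majority)
score_mixed = g_mean(y_true, mixed)
```

Under this definition, a classifier that always predicts the majority class scores a G-mean of zero however high its accuracy is, which is one reason the metric is commonly reported alongside accuracy on imbalanced data.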
- Subjects :
- Computer science
business.industry
Supervised learning
Large numbers
Sample (statistics)
02 engineering and technology
Minority class
Machine learning
computer.software_genre
Field (computer science)
Reduction (complexity)
ComputingMethodologies_PATTERNRECOGNITION
Artificial Intelligence
0202 electrical engineering, electronic engineering, information engineering
Oversampling
020201 artificial intelligence & image processing
Noise (video)
Artificial intelligence
business
computer
Details
- ISSN :
- 1573-7497 and 0924-669X
- Volume :
- 51
- Database :
- OpenAIRE
- Journal :
- Applied Intelligence
- Accession number :
- edsair.doi...........37557cbbc2e553f81efcf0775528ea24
- Full Text :
- https://doi.org/10.1007/s10489-021-02341-2