Oversampling method based on GAN for tabular binary classification problems.

Authors :
Yang, Jie
Jiang, Zhenhao
Pan, Tingting
Chen, Yueqi
Pedrycz, Witold
Source :
Intelligent Data Analysis; 2023, Vol. 27 Issue 5, p1287-1308, 22p
Publication Year :
2023

Abstract

Class imbalance arises in many applications. A large gap between the number of samples in different classes skews classifiers toward the majority class and thus degrades learning performance and the quality of the results. Most data-level imbalanced learning approaches generate new samples using only the information associated with the minority samples, through linear generation or data distribution fitting. In contrast to these algorithms, we propose a novel oversampling method based on generative adversarial networks (GANs), named OS-GAN. In this method, the GAN learns the distribution characteristics of the minority class from selected majority samples rather than from random noise. As a result, samples produced by the trained generator carry information from both the majority and minority classes. Furthermore, central regularization keeps the distribution of the synthetic samples from being restricted to the domain of the minority class, which can improve the generalization of learning models or algorithms. Experimental results on 14 datasets and one high-dimensional dataset show that OS-GAN outperforms 14 commonly used resampling techniques in terms of G-mean, accuracy, and F1-score. [ABSTRACT FROM AUTHOR]
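
The abstract reports results in terms of G-mean, accuracy, and F1-score. As a rough illustration of why accuracy alone can mislead on imbalanced data, the following minimal Python sketch computes all three from binary labels. It assumes the common binary definition of G-mean (the geometric mean of sensitivity and specificity) and treats label 1 as the minority class; this record does not state the exact definitions used in the paper, and the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class (assumption)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy, G-mean, and F1 for a binary problem.

    G-mean is taken here as sqrt(sensitivity * specificity), the usual
    binary definition; the paper's exact definition is not given in this record.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = safe_div(tp, tp + fn)         # sensitivity (minority-class recall)
    specificity = safe_div(tn, tn + fp)    # majority-class recall
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "g_mean": math.sqrt(recall * specificity),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Illustrative labels only (not data from the paper): 2 minority, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10                       # ignores the minority class entirely
one_hit = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]         # finds one minority sample, one false alarm

print(imbalance_metrics(y_true, always_majority))
print(imbalance_metrics(y_true, one_hit))
```

In this toy case both predictors reach the same accuracy, while G-mean and F1 are zero for the predictor that never outputs the minority class, which is why such metrics are typically reported alongside accuracy for imbalanced problems.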

Details

Language :
English
ISSN :
1088467X
Volume :
27
Issue :
5
Database :
Complementary Index
Journal :
Intelligent Data Analysis
Publication Type :
Academic Journal
Accession number :
172806183
Full Text :
https://doi.org/10.3233/IDA-220383