Ad-RuLer: A novel rule-driven data synthesis technique for imbalanced classification

Authors :
Universitat Politècnica de Catalunya. Doctorat en Intel·ligència Artificial
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group
Zhang, Xiao
Paz Ortiz, Alejandro Iván
Nebot Castells, M. Àngela
Múgica Álvarez, Francisco
Romero Merino, Enrique
Publication Year :
2023

Abstract

When classifiers face imbalanced class distributions, they often misclassify minority class samples, consequently diminishing the predictive performance of machine learning models. Existing oversampling techniques predominantly rely on the selection of neighboring data via interpolation, with less emphasis on uncovering the intrinsic patterns and relationships within the data. In this research, we present the usefulness of an algorithm named RuLer to deal with the problem of classification with imbalanced data. RuLer is a learning algorithm initially designed to recognize new sound patterns within the context of the performative artistic practice known as live coding. This paper demonstrates that this algorithm, once adapted (Ad-RuLer), has great potential to address the problem of oversampling imbalanced data. An extensive comparison with other mainstream oversampling algorithms (SMOTE, ADASYN, Tomek-links, Borderline-SMOTE, and KmeansSMOTE), using different classifiers (logistic regression, random forest, and XGBoost), is performed on several real-world datasets with different degrees of data imbalance. The experimental results indicate that Ad-RuLer serves as an effective oversampling technique with extensive applicability.

Peer Reviewed
Postprint (published version)
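
For readers unfamiliar with the interpolation-based oversampling that the abstract contrasts with Ad-RuLer, the following is a minimal sketch of the general idea behind SMOTE-style methods. It is not the Ad-RuLer algorithm, which this record does not describe in detail. The function name `interpolate_oversample`, its parameters, and the toy data are assumptions made purely for illustration.

```python
import math
import random


def interpolate_oversample(minority, n_new, k=3, seed=0):
    """Hypothetical SMOTE-style sketch (not Ad-RuLer).

    Creates n_new synthetic points, each lying on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Random position along the segment from base toward neighbor.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority class with two features (illustrative values only).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = interpolate_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the new data stays inside the region already covered by the minority class. This is the neighbor-based behavior that the abstract describes as placing less emphasis on the patterns and relationships within the data.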

Details

Database :
OAIster
Notes :
22 p., application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1417305011
Document Type :
Electronic Resource