Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination?

Authors :
Hussin Adam Khatir, Ahmed Almustfa
Bee, Marco
Source :
Risks; Sep 2022, Vol. 10, Issue 9, 22 p.
Publication Year :
2022

Abstract

Forecasting the creditworthiness of customers is a central issue of banking activity. This task requires the analysis of large datasets with many variables, for which machine learning algorithms and feature selection techniques are a crucial tool. Moreover, the percentages of "good" and "bad" customers are typically imbalanced such that over- and undersampling techniques should be employed. In the literature, most investigations tackle these three issues individually. Since there is little evidence about their joint performance, in this paper, we try to fill this gap. We use five machine learning classifiers, and each of them is combined with different feature selection techniques and various data-balancing approaches. According to the empirical analysis of a retail credit bank dataset, we find that the best combination is given by random forests, random forest recursive feature elimination and random oversampling. [ABSTRACT FROM AUTHOR]
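
As an illustration of the data-balancing step named in the abstract, the following is a minimal sketch of random oversampling in plain Python. It is not the authors' code: the function name `random_oversample`, the toy data, and the choice to balance every class up to the size of the largest one are assumptions made here for illustration. In practice the resampling would normally be applied to the training split only, before fitting a classifier such as a random forest.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is topped up to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement from the class's own rows.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy, made-up data: 8 "good" customers and 2 "bad" customers.
rows = [[i] for i in range(10)]
labels = ["good"] * 8 + ["bad"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

On the toy data both classes end with 8 rows, and every added "bad" row is a copy of one of the two original "bad" rows. Random undersampling is the mirror image: rows of the larger class are dropped at random instead.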

Details

Language :
English
ISSN :
2227-9091
Volume :
10
Issue :
9
Database :
Complementary Index
Journal :
Risks
Publication Type :
Academic Journal
Accession number :
159335225
Full Text :
https://doi.org/10.3390/risks10090169