Using network features for credit scoring in microfinance
- Source :
- International Journal of Data Science and Analytics. 12:121-134
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
Abstract
- The use of non-traditional data in credit scoring for microfinance institutions is very useful for addressing a problem common in emerging markets: the lack of a verifiable customer credit history. In this context, this paper uses data acquired from smartphones to tackle a loan classification problem. We conduct a set of experiments on feature selection, strategies for dealing with imbalanced datasets, and algorithm choice in order to define a baseline model. This model is then compared with models that add network features to the original ones. For that comparison, we generate a network that links each user to those phone book contacts who are also users of a given mobile application, taking into account the ethical and privacy concerns involved, and apply feature extraction techniques, such as centrality measures and node embeddings, to capture certain aspects of the network’s topology. Several node embedding algorithms are tested, but only Node2Vec proves significantly better than the baseline model when Friedman’s post hoc tests are applied. This node embedding algorithm outperforms all the others, with relative improvements over the baseline model of 5.74% in mean accuracy, 7.13% in the area under the Receiver Operating Characteristic curve, and 30.83% in the Kolmogorov–Smirnov statistic. This method therefore proves very promising for discriminating between “good” and “bad” customers in credit scoring classification problems.
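The pipeline the abstract describes (linking users to phone book contacts who also use the app, turning that network into extra features, and judging models partly by the Kolmogorov–Smirnov statistic) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format (a dict of contact lists keyed by hypothetical user IDs), the helper names, the label convention (0 = "good", 1 = "bad"), and the use of plain degree centrality are all hypothetical. The paper also uses other feature extraction techniques, including node embeddings such as Node2Vec, which are not reproduced here.

```python
"""Sketch: contact-network features and the KS statistic (hypothetical helpers)."""


def build_contact_graph(phone_books, app_users):
    """Link each app user to the phone book contacts who are also app users.

    phone_books: dict mapping a user ID to an iterable of contact IDs (assumed format).
    app_users: set of IDs of people who use the mobile application.
    Returns an undirected graph as a dict of neighbour sets.
    """
    graph = {user: set() for user in app_users}
    for user, contacts in phone_books.items():
        if user not in app_users:
            continue
        for contact in contacts:
            # Contacts who do not use the app never become nodes in the network.
            if contact in app_users and contact != user:
                graph[user].add(contact)
                graph[contact].add(user)
    return graph


def degree_centrality(graph):
    """Degree divided by (n - 1), the maximum possible degree in a simple graph."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def augment_features(base_features, centrality):
    """Append each user's centrality score to their baseline feature vector."""
    return {user: list(feats) + [centrality.get(user, 0.0)]
            for user, feats in base_features.items()}


def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of good (0) and bad (1) customers."""
    good = [s for s, y in zip(scores, labels) if y == 0]
    bad = [s for s, y in zip(scores, labels) if y == 1]
    if not good or not bad:
        raise ValueError("need at least one good and one bad customer")
    best = 0.0
    # Both empirical CDFs only change at observed scores, so checking those suffices.
    for t in sorted(set(scores)):
        cdf_good = sum(s <= t for s in good) / len(good)
        cdf_bad = sum(s <= t for s in bad) / len(bad)
        best = max(best, abs(cdf_good - cdf_bad))
    return best


if __name__ == "__main__":
    # Toy data: "x9" is in u1's phone book but does not use the app.
    phone_books = {"u1": ["u2", "u3", "x9"], "u2": ["u1"], "u3": [], "u4": ["u1"]}
    graph = build_contact_graph(phone_books, {"u1", "u2", "u3", "u4"})
    centrality = degree_centrality(graph)
    print(augment_features({"u1": [0.2, 1.0], "u2": [0.7, 0.0]}, centrality))
    print(ks_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

In the toy run, u1 is linked to all three other app users and gets centrality 1.0, while u2 gets 1/3; the contact x9 is dropped because it is not an app user. The KS statistic on the toy scores is 0.5, and it would reach 1.0 only if every "bad" customer scored above every "good" one (or vice versa), which is why a higher KS indicates better discrimination.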
- Subjects :
- Computer science; Applied Mathematics; Feature extraction; Feature selection; Machine learning; Computer Science Applications; Computational Theory and Mathematics; Credit history; Modeling and Simulation; Artificial intelligence; Centrality; Statistic; Information Systems
Details
- ISSN :
- 2364-4168 and 2364-415X
- Volume :
- 12
- Database :
- OpenAIRE
- Journal :
- International Journal of Data Science and Analytics
- Accession number :
- edsair.doi...........7e4944d9ff8ca7df1d563d1a02094c5f