Performance analysis of cost-sensitive learning methods with application to imbalanced medical data

Authors :
Ibomoiye Domor Mienye
Yanxia Sun
Source :
Informatics in Medicine Unlocked, Vol 25, Pp 100690 (2021)
Publication Year :
2021
Publisher :
Elsevier, 2021.

Abstract

Many real-world machine learning applications require building models from highly imbalanced datasets. In medical datasets, healthy patients or samples usually dominate and form the majority class, while sick patients are few and form the minority class. Researchers have proposed numerous machine learning methods to predict medical diagnoses, but class imbalance makes it difficult for classifiers to adequately learn and distinguish between the minority and majority classes. Cost-sensitive learning and resampling techniques are used to deal with the class imbalance problem. This research focuses on developing robust cost-sensitive classifiers by modifying the objective functions of some well-known algorithms, such as logistic regression, decision tree, extreme gradient boosting, and random forest, which are then used to predict medical diagnoses. First, we implement the standard versions of these algorithms to provide a baseline for performance comparison. Second, we develop their corresponding cost-sensitive versions. Unlike resampling techniques, the proposed approach does not alter the original data distribution, because the modified algorithms account for the imbalanced class distribution during training, thereby resulting in more reliable performance than when the data is resampled. Four popular medical datasets, namely the Pima Indians Diabetes, Haberman Breast Cancer, Cervical Cancer Risk Factors, and Chronic Kidney Disease datasets, are used in the experiments to validate the proposed approach. The experimental results show that the cost-sensitive methods yield superior performance compared to the standard algorithms.
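
The abstract describes making a classifier cost-sensitive by modifying its objective function instead of resampling the data. The sketch below illustrates that general idea for logistic regression only, using a class-weighted log loss. It is not the authors' formulation, which this record does not give: the cost values, the toy one-feature dataset, and all function names (`weighted_log_loss`, `fit_logistic`, `recall`) are assumptions made for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def weighted_log_loss(y_true, p_pred, cost_pos=1.0, cost_neg=1.0, eps=1e-12):
    """Mean log loss with each class's error term scaled by an assumed cost."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= cost_pos * y * math.log(p) + cost_neg * (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def fit_logistic(X, y, cost_pos=1.0, cost_neg=1.0, lr=0.1, epochs=500):
    """Batch gradient descent on the class-weighted log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Derivative of the weighted loss with respect to the logit.
            g = cost_pos * yi * (p - 1.0) + cost_neg * (1 - yi) * p
            for j in range(d):
                grad_w[j] += g * xi[j]
            grad_b += g
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def recall(X, y, w, b):
    """Fraction of minority (positive) samples predicted as positive."""
    hits = sum(
        1
        for xi, yi in zip(X, y)
        if yi == 1 and sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5
    )
    return hits / sum(y)


# Hypothetical toy data: 90 majority samples spread over [-2, 1] and
# 10 minority samples spread over [0, 3], so the classes overlap on [0, 1].
X = [[-2.0 + 3.0 * i / 89] for i in range(90)] + [[3.0 * i / 9] for i in range(10)]
y = [0] * 90 + [1] * 10

# Baseline: both classes cost the same.
w_plain, b_plain = fit_logistic(X, y)
# Cost-sensitive: minority errors cost 9x more (assumed: the 90:10 class ratio).
w_cost, b_cost = fit_logistic(X, y, cost_pos=9.0, cost_neg=1.0)

recall_plain = recall(X, y, w_plain, b_plain)
recall_weighted = recall(X, y, w_cost, b_cost)
print("minority recall, standard:      ", recall_plain)
print("minority recall, cost-sensitive:", recall_weighted)
```

The training data are identical in both runs; only the loss changes, which is the contrast with resampling that the abstract draws. Libraries expose comparable knobs, for example the `class_weight` parameter on scikit-learn's `LogisticRegression`, `DecisionTreeClassifier`, and `RandomForestClassifier`, and `scale_pos_weight` in XGBoost, though this record does not say which mechanism the authors used.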

Details

Language :
English
ISSN :
2352-9148
Volume :
25
Database :
OpenAIRE
Journal :
Informatics in Medicine Unlocked
Accession number :
edsair.doi.dedup.....7b73b64a33a5753582a687ab7302af93