1. FIUS: Fixed partitioning undersampling method
- Author
- M. I. M. Wahab, Azam Dekamin, Karim Keshavjee, and Aziz Guergachi
- Subjects
Canada, Computer science, Clinical Biochemistry, 02 engineering and technology, Logistic regression, computer.software_genre, Biochemistry, Field (computer science), Regular grid, 03 medical and health sciences, 0302 clinical medicine, Classifier (linguistics), 0202 electrical engineering, electronic engineering, information engineering, Humans, 030212 general & internal medicine, Biochemistry (medical), General Medicine, Grid, Class (biology), 3. Good health, Logistic Models, Diabetes Mellitus, Type 2, Research Design, Undersampling, Pre diabetes, 020201 artificial intelligence & image processing, Data mining, computer, Algorithms
- Abstract
Background and Objective: In the medical field, data-driven techniques for predicting prevalent diseases and finding patterns in them are of increasing interest. Classification is one of the methods used to predict the future onset of type 2 diabetes in people at high risk of progressing from pre-diabetes to diabetes. When classification techniques are applied to real-world datasets, imbalanced class distribution is one of the most significant limitations, because it leads to misclassification of patients. In this paper, we propose a novel balancing method to improve the prediction of type 2 diabetes mellitus from imbalanced electronic medical records (EMR).
Methods: A novel undersampling method is proposed that uses a fixed partitioning distribution scheme in a regular grid. The proposed approach retains valuable information when the dataset is balanced.
Results: Among the classifiers compared, logistic regression (LR) achieved the best AUC, 80%, on the EMR data when the proposed undersampling method was used to balance the data. The new method improved the performance of the LR classifier compared with existing undersampling methods used in the balancing stage.
Conclusion: The results demonstrate the effectiveness of the proposed method for predicting diabetes in an imbalanced Canadian dataset. The methodology can be used in other areas to overcome the limitations of imbalanced class distributions.
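The abstract does not spell out the FIUS procedure, so the following is only a rough sketch of the general idea of grid-based undersampling, not the authors' algorithm. It assumes numeric features, splits each feature range of the majority class into equal-width bins to form a regular grid, and keeps a share of majority samples from each occupied cell in proportion to the cell's size until the majority count matches the minority count. The function name `grid_undersample` and the parameters `bins` and `seed` are hypothetical.

```python
import numpy as np

def grid_undersample(X, y, majority_label, bins=4, seed=0):
    """Illustrative grid-based undersampling (NOT the published FIUS method).

    Returns sorted row indices of a balanced subset of (X, y).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    target = len(min_idx)  # number of majority samples to keep
    if len(maj_idx) <= target:
        return np.sort(np.concatenate([maj_idx, min_idx]))

    # Map each majority sample to a cell of a regular grid (assumed scheme).
    Xm = X[maj_idx]
    low = Xm.min(axis=0)
    span = Xm.max(axis=0) - low
    span[span == 0] = 1.0  # avoid division by zero for constant features
    cells = np.minimum(((Xm - low) / span * bins).astype(int), bins - 1)
    _, cell_id = np.unique(cells, axis=0, return_inverse=True)
    cell_id = cell_id.ravel()

    # Per-cell quota proportional to cell size (largest-remainder rounding).
    counts = np.bincount(cell_id)
    exact = counts * target / counts.sum()
    quota = np.floor(exact).astype(int)
    leftover = target - quota.sum()
    order = np.argsort(-(exact - quota), kind="stable")
    quota[order[:leftover]] += 1

    # Randomly keep `quota[c]` majority samples from each occupied cell.
    keep = [np.empty(0, dtype=int)]
    for c, q in enumerate(quota):
        if q > 0:
            keep.append(rng.choice(maj_idx[cell_id == c], size=q, replace=False))
    return np.sort(np.concatenate(keep + [min_idx]))

# Usage on synthetic data: 450 majority rows, 50 minority rows.
data_rng = np.random.default_rng(1)
X_demo = data_rng.normal(size=(500, 2))
y_demo = np.array([0] * 450 + [1] * 50)
idx_demo = grid_undersample(X_demo, y_demo, majority_label=0)
X_bal, y_bal = X_demo[idx_demo], y_demo[idx_demo]
```

Sampling per cell rather than over the whole majority class keeps the retained samples spread across the occupied regions of feature space, which is one plausible reading of "retains valuable information"; the published method may allocate samples differently.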
- Published
- 2021