1. A gradient boosting-based mortality prediction model for COVID-19 patients.
- Author
- Keser, Sinem Bozkurt and Keskin, Kemal
- Subjects
- *PREDICTION models, *PUBLIC health, *COVID-19 pandemic, *ARTIFICIAL intelligence, *MORTALITY, *COVID-19
- Abstract
The COVID-19 pandemic has been a global public health concern since March 11, 2020. Healthcare systems struggled to meet patients' growing needs for diagnosis, treatment, and care, and as they strained under this demand, advanced intelligence and computing technologies became essential. Artificial intelligence techniques in particular have become essential for identifying and triaging patients, predicting disease severity, and detecting outcomes. The aim of the paper is to propose a gradient boosting-based model that predicts the mortality of COVID-19 patients and to improve prediction accuracy by incorporating resampling strategies. A real COVID-19 dataset that includes patients' travel, health, geographical, and demographic information is obtained from a public repository. The dataset is class-imbalanced, and several resampling strategies are applied to address this: synthetic minority oversampling technique (SMOTE), random under-sampling, and clustering-based under-sampling. Then, three gradient boosting machines (GBMs), namely extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost), are analyzed in terms of accuracy and computational time. Random search is used to tune the hyper-parameters of the algorithms. A stacking-based hybrid model that combines XGBoost, LightGBM, and CatBoost is used for comparison in the experiments. The experiments also investigate the factors that can influence the mortality of COVID-19 patients: the age of the patient, whether the patient was from Wuhan, and the number of days between when the patient first noticed symptoms and when they visited the hospital are found to affect mortality. Over- and under-sampling mitigate the class imbalance. XGBoost, LightGBM, and CatBoost are compared on various performance metrics to determine the most suitable GBM for the proposed system. The experimental results show that the stacking-based hybrid model performs well on the dataset balanced by SMOTE, while CatBoost produces superior results on the datasets balanced by random under-sampling and clustering-based under-sampling. The study also emphasizes the importance of addressing the imbalanced class distribution when predicting mortality. These promising results support the effectiveness of the proposed system in predicting COVID-19 mortality. [ABSTRACT FROM AUTHOR]
- Published
- 2023
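
The abstract names two families of resampling, SMOTE-style oversampling and random under-sampling. The sketch below shows the basic idea of each in plain Python. It is not the authors' implementation: the toy data, the function names `smote` and `random_undersample`, and the parameter values are all assumptions made for illustration. Only the two feature names (age, days from symptom onset to hospital visit) come from the abstract.

```python
import math
import random


def random_undersample(majority, n_keep, rng):
    """Keep a random subset of the majority class, drawn without replacement."""
    return rng.sample(majority, n_keep)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority points by interpolating toward neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


rng = random.Random(0)

# Hypothetical toy data: (age, days from symptom onset to hospital visit).
survived = [(rng.uniform(20, 60), rng.uniform(0, 5)) for _ in range(40)]
died = [(rng.uniform(60, 90), rng.uniform(3, 12)) for _ in range(5)]

# Oversampling: grow the minority class until it matches the majority class.
oversampled_died = died + smote(died, n_new=len(survived) - len(died), k=3, rng=rng)

# Under-sampling: shrink the majority class until it matches the minority class.
undersampled_survived = random_undersample(survived, n_keep=len(died), rng=rng)

print(len(survived), len(oversampled_died))    # class sizes after oversampling
print(len(undersampled_survived), len(died))   # class sizes after under-sampling
```

In practice one would more likely reach for a maintained library; for example, imbalanced-learn provides `SMOTE`, `RandomUnderSampler`, and `ClusterCentroids`. Whether the authors used that library is not stated in the abstract.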