Predicting stroke occurrences: a stacked machine learning approach with feature selection and data preprocessing

Authors :: Pritam Chakraborty
Anjan Bandyopadhyay
Preeti Padma Sahu
Aniket Burman
Saurav Mallik
Najah Alsubaie
Mohamed Abbas
Mohammed S. Alqahtani
Ben Othman Soufiene
Source :: BMC Bioinformatics, Vol 25, Iss 1, Pp 1-23 (2024)
Publication Year :: 2024
Publisher :: BMC, 2024.
Abstract: Stroke prediction remains a critical area of research in healthcare, aiming to enhance early intervention and patient care strategies. This study investigates the efficacy of machine learning techniques, particularly principal component analysis (PCA) and a stacking ensemble method, for predicting stroke occurrences based on demographic, clinical, and lifestyle factors. We systematically varied the number of PCA components and implemented a stacking model comprising random forest, decision tree, and K-nearest neighbors (KNN) classifiers. Our findings demonstrate that setting the number of PCA components to 16 gave the best predictive performance, achieving 98.6% accuracy in stroke prediction. Evaluation metrics underscored the robustness of our approach in handling class imbalance and improving model performance. Comparative analyses against traditional machine learning algorithms such as SVM, logistic regression, and Naive Bayes highlighted the superiority of our proposed method.
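
The pipeline described in the abstract (PCA reduced to 16 components, feeding a stacking ensemble of random forest, decision tree, and KNN) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic data from `make_classification`, the standardization step, the logistic regression meta-learner, and all hyperparameters are assumptions, since the abstract specifies none of them. The helper name `build_model` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_model(n_components=16, seed=0):
    """Scale features, project onto PCA components, then stack RF + DT + KNN."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ]
    # Assumption: logistic regression as the meta-learner (not stated in the abstract).
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    # Assumption: standardize before PCA so no single feature dominates the variance.
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), stack)


# Synthetic, imbalanced stand-in data (roughly 10% positives); NOT the study's dataset.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_model(n_components=16)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

Varying `n_components` in a loop and comparing held-out scores would mirror the component sweep the abstract mentions. On imbalanced data, accuracy alone can be misleading, so recall or F1 on the minority class is worth inspecting alongside it.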

Subjects :: Stroke prediction
Machine learning
Principal component analysis (PCA)
Stacking ensemble
Healthcare analytics
Predictive modeling
Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5
