Feature Selection and Hybrid Sampling with Machine Learning Methods for Health Data Classification.

Authors :: Hairani, Hairani
Widiyaningtyas, Triyanna
Prasetya, Didik Dwi
Source :: Revue d'Intelligence Artificielle; Aug2024, Vol. 38 Issue 4, p1255-1261, 7p
Publication Year :: 2024
Abstract: This study aims to improve the performance of classification algorithms in dealing with unbalanced and high-dimensional health data in stroke prediction by integrating correlation-based feature selection and hybrid sampling techniques. Several previous studies that used machine learning methods to predict stroke still achieved less-than-optimal accuracy. This is because stroke data has several problems, including missing values, many attributes, and data imbalance, which can degrade the performance of the classification method. Therefore, this research uses an approach that integrates feature selection and hybrid sampling. The objective of the feature selection technique is to identify important attributes within stroke data. After that, the SMOTE-ENN hybrid sampling approach is utilized to address data imbalance. The research findings indicate that employing correlation-based feature selection along with SMOTE-ENN and the Random Forest algorithm leads to improved performance compared to no sampling with the SVM and XGBoost methods, with an increase in accuracy of 3%, recall of 91.3%, and AUC of 45.2%. Thus, the proposed method performed better than recent stroke classification studies. [ABSTRACT FROM AUTHOR]
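
The two preprocessing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say which correlation measure or parameters were used, so Pearson correlation against the label, the `threshold` value, the neighbour counts `k`, the binary-label assumption, and all function names (`select_features`, `smote`, `enn`) are assumptions made for illustration. The Random Forest step is omitted.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(X, y, threshold=0.1):
    """Keep column indices whose |correlation with the label| >= threshold.

    The threshold is a hypothetical choice, not taken from the paper.
    """
    return [j for j, col in enumerate(zip(*X)) if abs(pearson(col, y)) >= threshold]


def smote(X, y, minority, k=5, rng=None):
    """Simplified SMOTE for binary labels: add synthetic minority samples,
    interpolated between a minority point and one of its k nearest minority
    neighbours, until both classes have the same size."""
    rng = rng or random.Random(0)
    mins = [x for x, label in zip(X, y) if label == minority]
    X_new, y_new = list(X), list(y)
    if len(mins) < 2:
        return X_new, y_new  # nothing to interpolate between
    needed = (len(X) - len(mins)) - len(mins)
    for _ in range(max(needed, 0)):
        base = rng.choice(mins)
        ranked = sorted(mins, key=lambda m: math.dist(base, m))
        other = rng.choice(ranked[1:k + 1])  # ranked[0] is base (distance 0)
        gap = rng.random()
        X_new.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Simplified Edited Nearest Neighbours: drop every sample whose label
    disagrees with the majority label of its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = (j for j in range(len(X)) if j != i)
        ranked = sorted(others, key=lambda j: math.dist(x, X[j]))[:k]
        vote = Counter(y[j] for j in ranked).most_common(1)[0][0]
        if vote == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


# Toy data (made up): 40 majority rows, 8 minority rows; column 1 is constant.
rng = random.Random(42)
X = [[rng.gauss(0, 1), 1.0] for _ in range(40)]
X += [[rng.gauss(4, 1), 1.0] for _ in range(8)]
y = [0] * 40 + [1] * 8

kept = select_features(X, y)                        # feature selection first
X_sel = [[row[j] for j in kept] for row in X]
X_sm, y_sm = smote(X_sel, y, minority=1, rng=rng)   # oversample the minority
X_res, y_res = enn(X_sm, y_sm)                      # then clean noisy samples
print(kept, Counter(y_sm), Counter(y_res))
```

In practice a maintained library would normally replace the hand-written resampling; for example, imbalanced-learn provides a `SMOTEENN` class in `imblearn.combine`, and the resampled data would then be passed to a classifier such as scikit-learn's `RandomForestClassifier`.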

Subjects :: FEATURE selection
RANDOM forest algorithms
MACHINE learning
SAMPLING (Process)
CLASSIFICATION
