1. Blending Shapley values for feature ranking in machine learning: an analysis on educational data.
- Author
- Guleria, Pratiyush
- Subjects
- DATA mining; MACHINE learning; FEATURE selection; SUPPORT vector machines; K-nearest neighbor classification; MACHINE theory
- Abstract
In educational institutions, delivering high-quality academic instruction is now more important than ever, and educational data mining is essential for resolving problems that arise from challenging unstructured data in this field. Using machine learning (ML) approaches, student performance and academic attributes, crucial indicators of higher education, are examined. In the proposed study, the educational dataset is subjected to feature ranking algorithms, including MRMR, ReliefF, Chi-Square, ANOVA, and Kruskal–Wallis, followed by selection of the most important features using Shapley values. The dataset has 16 attributes of integer and categorical type; after feature ranking, the features carrying the most important information are chosen and ML techniques are applied to them. The work proceeds in two phases. In the first phase, all features are used for ML training and the results are recorded. In the second phase, ML training uses only the selective features derived from the ranking approaches. ML models trained on the selective attributes are then compared with the models trained on all features to determine which is more accurate. In this comparison, the ML models with selective attributes outperformed the models with all attributes. Overall, the ensemble methods, i.e., bagged trees and AdaBoost, outperformed the other ML techniques presented in the proposed study, such as decision trees, neural networks, naive Bayes, K-nearest neighbor, and support vector machines. Bagged trees achieved an accuracy of 81.0 percent, while AdaBoost achieved 74.2 percent. [ABSTRACT FROM AUTHOR]
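The two-phase protocol described in the abstract can be sketched with scikit-learn. This is a minimal illustration, not the paper's implementation: the educational dataset is replaced by a synthetic 16-attribute stand-in, a single ANOVA F-test stands in for the blended ranking (the paper also uses MRMR, ReliefF, Chi-Square, Kruskal–Wallis, and Shapley values), and the choice of 8 retained features is an assumption.

```python
# Hedged sketch of the two-phase setup: train on all features (phase 1),
# then on a ranked subset (phase 2), and compare test accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 16-attribute educational dataset (assumption:
# 500 samples, only a few informative features).
X, y = make_classification(n_samples=500, n_features=16,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Phase 1: bagged-tree model on all 16 attributes.
clf_all = BaggingClassifier(random_state=0).fit(X_tr, y_tr)
acc_all = clf_all.score(X_te, y_te)

# Phase 2: rank features (ANOVA F-test here), keep the top 8, retrain.
selector = SelectKBest(f_classif, k=8).fit(X_tr, y_tr)
clf_sel = BaggingClassifier(random_state=0).fit(
    selector.transform(X_tr), y_tr)
acc_sel = clf_sel.score(selector.transform(X_te), y_te)

print(f"all features: {acc_all:.3f}, selected features: {acc_sel:.3f}")
```

On the real data the paper reports that the selective-attribute models win; on this synthetic toy the two accuracies simply illustrate the comparison being made.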
- Published
- 2024