1. Super learner model for classifying leukemia through gene expression monitoring
- Author
Sharanya Selvaraj, Alhuseen Omar Alsayed, Nor Azman Ismail, Balasubramanian Prabhu Kavin, Edeh Michael Onyema, Gan Hong Seng, and Arinze Queen Uchechi
- Subjects
Gene expressions, DNA microarray, Super learner, Random forest, Machine learning, Leukemia, Neoplasms. Tumors. Oncology. Including cancer and carcinogens, RC254-282
- Abstract
Leukemia is a form of cancer that affects the bone marrow and lymphatic system, and it requires complex treatment strategies that vary by subtype. Because the morphological differences among subtypes are subtle, monitoring gene expression is crucial for accurate classification. Manual or pathological testing can be time-consuming and expensive, so data-driven methods and machine learning algorithms offer an efficient alternative for leukemia classification. This study introduces a novel super learning model that combines heterogeneous machine learning models to analyze gene expression data and classify leukemia cells. The approach incorporates an entropy-based feature importance technique to identify the gene profiles most relevant to the labeling process. The key component of the model is its final super learner, a Random Forest, which classifies samples using the cross-validated predictions of the candidate learners. Validation on a gene expression monitoring dataset shows that the model outperforms other state-of-the-art models in predictive accuracy. The study adds to knowledge on using advanced machine learning techniques to improve the accuracy and reliability of leukemia classification from gene expression data, addressing the limitations of traditional methods that rely on clinical features and morphological examination.
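The abstract does not state exactly how its entropy-based feature importance is computed. As a minimal sketch, assuming information gain over a median split of each gene's expression values (one common entropy-based criterion, not necessarily the authors' formulation), genes can be ranked as below. The gene names, expression values, and ALL/AML labels are hypothetical toy data.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting one gene's values at their median.

    The median split is an assumed discretization for this sketch.
    """
    threshold = median(values)
    low = [y for v, y in zip(values, labels) if v <= threshold]
    high = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - conditional


def rank_genes(X, y, gene_names):
    """Return (gene, information gain) pairs, most informative first."""
    scores = [
        (name, information_gain([row[j] for row in X], y))
        for j, name in enumerate(gene_names)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: rows are samples, columns are genes.
genes = ["gene_A", "gene_B", "gene_C"]
X = [
    [8.1, 2.0, 5.0],
    [7.9, 3.1, 4.0],
    [8.4, 2.5, 6.0],
    [1.2, 2.2, 5.5],
    [0.9, 3.0, 4.5],
    [1.5, 2.7, 5.2],
]
y = ["AML", "AML", "AML", "ALL", "ALL", "ALL"]

for gene, gain in rank_genes(X, y, genes):
    print(f"{gene}: {gain:.3f}")
```

Here gene_A separates the two classes perfectly and scores 1.0 bit, while gene_B and gene_C each score about 0.08. How many top-ranked genes are then passed on to the candidate learners is not specified in the abstract.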
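What distinguishes a super learner from simply averaging models is that the final learner is trained on out-of-fold predictions: each candidate learner predicts every sample using a model that never saw that sample during training. The sketch below builds that level-one data. The two candidate learners (nearest centroid and k-nearest neighbours), the unshuffled index-based fold assignment, and the toy data are all hypothetical stand-ins; the abstract only says the candidate learners are heterogeneous.

```python
import math
from collections import Counter, defaultdict


def fit_nearest_centroid(train_X, train_y):
    """Hypothetical candidate learner: assign each sample to the closest class mean."""
    groups = defaultdict(list)
    for row, label in zip(train_X, train_y):
        groups[label].append(row)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }

    def predict(row):
        return min(centroids, key=lambda label: math.dist(row, centroids[label]))

    return predict


def fit_knn(train_X, train_y, k=3):
    """Hypothetical candidate learner: majority vote of the k nearest training samples."""
    def predict(row):
        nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(row, pair[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    return predict


def out_of_fold_predictions(X, y, learners, n_folds=3):
    """Build level-one data Z: Z[i][j] is learner j's prediction for sample i,
    made by a model fit without sample i."""
    n = len(X)
    Z = [[None] * len(learners) for _ in range(n)]
    for fold in range(n_folds):
        # Simplification: folds assigned by index, no shuffling or stratification.
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for j, fit in enumerate(learners):
            predict = fit(train_X, train_y)
            for i in test_idx:
                Z[i][j] = predict(X[i])
    return Z


# Hypothetical toy data: two genes per sample.
X = [[8.1, 2.0], [7.9, 3.1], [8.4, 2.5], [1.2, 2.2], [0.9, 3.0], [1.5, 2.7]]
y = ["AML", "AML", "AML", "ALL", "ALL", "ALL"]

Z = out_of_fold_predictions(X, y, [fit_nearest_centroid, fit_knn])
for row, label in zip(Z, y):
    print(row, "true:", label)
```

Each column of Z becomes one input feature for the final learner. In the model described above that learner is a Random Forest fit on (Z, y), with label-valued columns one-hot encoded or replaced by class probabilities; at prediction time the candidate learners are refit on all training data and their predictions on new samples form the Random Forest's input. With scikit-learn available, `StackingClassifier(estimators=..., final_estimator=RandomForestClassifier(), cv=5)` packages this procedure and, by default, stacks class probabilities when the base estimators provide them.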
- Published
- 2024