1. Explainable Machine Learning Models Using Robust Cancer Biomarkers Identification from Paired Differential Gene Expression.
- Author
- Díaz de la Guardia-Bolívar, Elisa; Martínez Manjón, Juan Emilio; Pérez-Filgueiras, David; Zwir, Igor; del Val, Coral
- Abstract
In oncology, there is a critical need for robust biomarkers that can be easily translated into the clinic. We introduce a novel approach using paired differential gene expression analysis for biological feature selection in machine learning models, enhancing robustness and interpretability while accounting for patient variability. This method compares primary tumor tissue with the same patient's healthy tissue, improving gene selection by eliminating individual-specific artifacts. A focus on carcinoma was selected due to its prevalence and the availability of the data; we aim to identify biomarkers involved in general carcinoma progression, including less-researched types. Our findings identified 27 pivotal genes that can distinguish between healthy and carcinoma tissue, even in unseen carcinoma types. Additionally, the panel could precisely identify the tissue-of-origin in the eight carcinoma types used in the discovery phase. Notably, in a proof of concept, the model accurately identified the primary tissue origin in metastatic samples despite limited sample availability. Functional annotation reveals these genes' involvement in cancer hallmarks, detecting subtle variations across carcinoma types. We propose paired differential gene expression analysis as a reference method for the discovery of robust biomarkers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
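The paired comparison described in the abstract (each patient's tumor tissue against that same patient's healthy tissue) can be illustrated with a minimal sketch. This is not the authors' pipeline: the gene names, the toy log2 expression values, and the helper names `paired_t` and `rank_genes` are all hypothetical, and a plain paired t statistic stands in for whatever differential expression method the paper actually uses.

```python
import math
from statistics import mean, stdev


def paired_t(tumor, normal):
    """Paired t statistic on per-patient differences (tumor - normal).

    Both lists hold log2 expression values for one gene, ordered so that
    index i refers to the same patient in each list.
    """
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need matched samples from at least two patients")
    diffs = [t - n for t, n in zip(tumor, normal)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def rank_genes(expr):
    """Rank genes by absolute paired t statistic, largest first.

    expr maps gene name -> (tumor values, normal values).
    Returns a list of (gene, mean paired difference, t statistic).
    """
    rows = []
    for gene, (tumor, normal) in expr.items():
        diffs = [t - n for t, n in zip(tumor, normal)]
        rows.append((gene, mean(diffs), paired_t(tumor, normal)))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


# Hypothetical toy data: four patients, two genes (log2 expression).
# GENE_A varies a lot between patients but is consistently higher in tumor;
# GENE_B varies just as much between patients but not within a patient.
toy = {
    "GENE_A": ([8.1, 6.0, 9.2, 7.1], [6.0, 4.1, 7.0, 5.2]),
    "GENE_B": ([5.0, 7.9, 6.1, 9.0], [5.2, 8.0, 5.9, 9.1]),
}

for gene, mean_diff, t in rank_genes(toy):
    print(f"{gene}: mean paired log2 difference = {mean_diff:.3f}, t = {t:.2f}")
```

Because the statistic is computed on within-patient differences, the large patient-to-patient spread in baseline expression cancels out, which is the sense in which a paired design removes individual-specific artifacts.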