Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival
- Source :
- Scientific Reports, Vol 11, Iss 1, Pp 1-13 (2021), Scientific reports, 11(1):6968. Nature Publishing Group, Scientific Reports
- Publication Year :
- 2021
- Publisher :
- Nature Portfolio, 2021.
Abstract
- Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as those of classical methods, they are often disregarded because of their lack of transparency and little to no explainability, both of which are key to their adoption in clinical settings. In this paper, we used data on 36,658 non-metastatic breast cancer patients from the Netherlands Cancer Registry to compare the performance of CPH with that of ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival, using the $c$-index. We demonstrated that, in our dataset, ML-based models can perform at least as well as classical CPH regression ($c$-index $\sim 0.63$), and in the case of XGB even better ($c$-index $\sim 0.73$). Furthermore, we used Shapley Additive Explanations (SHAP) values to explain the models' predictions. We concluded that the difference in performance can be attributed to XGB's ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models' predictions, as well as the insights these provide. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial for increasing trust in, and adoption of, innovative ML techniques in oncology and in healthcare overall.
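The $c$-index (concordance index) reported above measures how well a model ranks patients by risk. Among all comparable patient pairs, it is the fraction in which the patient who experienced the event earlier was assigned the higher predicted risk, so 0.5 corresponds to random ordering and 1.0 to perfect ordering. The Python sketch below is a minimal, illustrative implementation of Harrell's estimator for right-censored data. It is not taken from the paper, the function name `harrell_c_index` is hypothetical, and it assumes that higher scores mean higher risk, that pairs with tied follow-up times are skipped, and that tied predictions count as half-concordant. The exact variant and tie handling used by the authors may differ.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Illustrative Harrell's concordance index for right-censored data.

    times:  observed follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risk:   predicted risk score; assumed higher = shorter expected survival
    """
    if not (len(times) == len(events) == len(risk)):
        raise ValueError("inputs must have equal length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a comparable pair
        for j in range(n):
            # Assumption: a pair is comparable only if patient i had the event strictly
            # earlier than patient j's follow-up time; tied times are skipped.
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # correctly ranked: earlier event, higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half-concordant

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, not from the study: four patients, the third one censored.
    print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.2, 0.5, 0.1]))  # prints 0.8
```

The double loop is O(n²), which is fine for illustration but slow for a cohort the size of the one described here (36,658 patients); for real data, a library routine such as scikit-survival's `concordance_index_censored` is a better fit.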
- Subjects :
- 0301 basic medicine
Support Vector Machine
Science
Breast Neoplasms
Machine learning
computer.software_genre
Risk Assessment
Article
Machine Learning
03 medical and health sciences
0302 clinical medicine
Breast cancer
medicine
Humans
Registries
Extreme gradient boosting
Survival analysis
Mathematics
Netherlands
Multidisciplinary
business.industry
Proportional hazards model
Random survival forests
Scientific data
medicine.disease
Prognosis
Computer science
Regression
Support vector machine
Survival Rate
030104 developmental biology
030220 oncology & carcinogenesis
Medicine
Female
Artificial intelligence
Explicit knowledge
business
computer
Details
- Language :
- English
- ISSN :
- 2045-2322
- Volume :
- 11
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Scientific Reports
- Accession number :
- edsair.doi.dedup.....469b1f782d545c00abbf901d391cfb31