1. The relation between prediction model performance measures and patient selection outcomes for proton therapy in head and neck cancer
- Author
Artuur M. Leeuwenberg, Johannes B. Reitsma, Lisa G.L.J. Van den Bosch, Jeroen Hoogland, Arjen van der Schaaf, Frank J.P. Hoebers, Oda B. Wijers, Johannes A. Langendijk, Karel G.M. Moons, Ewoud Schuit, Epidemiology and Data Science, Radiation Oncology, RS: GROW - R3 - Innovative Cancer Diagnostics & Therapy, Radiotherapie, Guided Treatment in Optimal Selected Cancer Patients (GUTS), Damage and Repair in Cancer Development and Cancer Treatment (DARE), and Basic and Translational Research and Imaging Methodology Development in Groningen (BRIDGE)
- Subjects
Oncology, Radiology, Nuclear Medicine and imaging, Individualized treatment decisions, Hematology, Normal tissue complication probability models, Head and neck cancer, Prediction performance measures
- Abstract
BACKGROUND: Normal-tissue complication probability (NTCP) models predict the risk of complications in patients receiving radiotherapy, taking into account the radiation dose to healthy tissues. They are used to select patients for proton therapy based on the expected reduction in risk with proton therapy versus photon radiotherapy (ΔNTCP). Recommended model evaluation measures include the area under the receiver operating characteristic curve (AUC), overall calibration (calibration-in-the-large, CITL), and calibration slope (CS), but their precise relation to patient selection is still unclear. We investigated how each measure relates to patient selection outcomes.

METHODS: The model validation and subsequent patient selection process was simulated within empirical head and neck cancer patient data. By manipulating the performance measures independently via model perturbations, the relation between model performance and patient selection was studied.

RESULTS: Small reductions in AUC (-0.02) yielded mean changes in ΔNTCP of 0.9% to 3.2%, and single-model patient selection differences of 2% to 19%. Deviations (-0.2 or +0.2) in CITL or CS yielded mean changes in ΔNTCP of 0.3% to 1.4%, and single-model patient selection differences of 1% to 10%.

CONCLUSIONS: Each measure independently impacts ΔNTCP and patient selection and should thus be assessed in a representative, sufficiently large external sample. Our suggested practical model selection approach is to consider the model with the highest AUC and to recalibrate it if needed.
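The three performance measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use their data. It assumes the common definitions: AUC as a rank statistic, CITL as the intercept of a logistic recalibration model with the slope fixed at 1, and CS as the slope of a logistic recalibration model on the logit of the predictions. The simulated outcomes, the sample size, and the perturbation sizes (a +0.2 logit shift and a 1.25 logit scaling) are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def auc(y, p):
    """Chance that a random event has a higher prediction than a random non-event."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def calibration_in_the_large(y, p, iters=25):
    """Intercept a in logit(P(y=1)) = a + logit(p), slope fixed at 1 (Newton steps)."""
    lp = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [sigmoid(a + l) for l in lp]
        grad = sum(yi - mi for yi, mi in zip(y, mu))
        hess = sum(mi * (1.0 - mi) for mi in mu)
        a += grad / hess
    return a

def calibration_slope(y, p, iters=25):
    """Slope b in logit(P(y=1)) = a + b * logit(p), fitted by Newton-Raphson."""
    lp = [logit(pi) for pi in p]
    a, b = 0.0, 0.0
    for _ in range(iters):
        mu = [sigmoid(a + b * l) for l in lp]
        w = [mi * (1.0 - mi) for mi in mu]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * l for yi, mi, l in zip(y, mu, lp))
        h00 = sum(w)
        h01 = sum(wi * l for wi, l in zip(w, lp))
        h11 = sum(wi * l * l for wi, l in zip(w, lp))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

# Simulated data (an assumption, NOT the study's patient data).
rng = random.Random(0)
true_lp = [rng.gauss(0.0, 1.0) for _ in range(400)]
y = [1 if rng.random() < sigmoid(z) else 0 for z in true_lp]

p_ref = [sigmoid(z) for z in true_lp]           # calibrated by construction
p_shift = [sigmoid(z + 0.2) for z in true_lp]   # risks systematically too high
p_steep = [sigmoid(1.25 * z) for z in true_lp]  # risks too extreme

for name, p in [("reference", p_ref), ("shifted", p_shift), ("steep", p_steep)]:
    print(name, round(auc(y, p), 3),
          round(calibration_in_the_large(y, p), 3),
          round(calibration_slope(y, p), 3))
```

In this sketch both perturbations preserve the ranking of patients, so AUC is unchanged while CITL or CS moves. This shows why calibration has to be assessed separately from discrimination. Linking the measures to ΔNTCP-based selection would additionally require photon and proton dose inputs and a selection threshold, none of which are specified in this record.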
- Published
2023