1. Enhancing breast cancer screening with urinary biomarkers and Random Forest supervised classification: A comprehensive investigation.
- Author
- Alladio E, Trapani F, Castellino L, Massano M, Di Corcia D, Salomone A, Berrino E, Ponzone R, Marchiò C, Sapino A, and Vincenti M
- Subjects
- Humans, Female, Middle Aged, Aged, Chromatography, High Pressure Liquid methods, Tandem Mass Spectrometry methods, Supervised Machine Learning, Gonadal Steroid Hormones urine, Algorithms, Discriminant Analysis, Machine Learning, Postmenopause urine, Least-Squares Analysis, Italy, Random Forest, Breast Neoplasms urine, Breast Neoplasms diagnosis, Biomarkers, Tumor urine, Early Detection of Cancer methods
- Abstract
Objectives: Urinary sex hormones were investigated as potential biomarkers for the early detection of breast cancer, in combination with supervised machine-learning data analysis, to evaluate their relevance and applicability toward the ultimate goal of large-scale screening.
Methods: Sex hormones were determined in urine samples collected from 250 post-menopausal women (65 healthy and 185 with breast cancer) recruited among the clinical patients of the Candiolo Cancer Institute FPO-IRCCS (Torino, Italy). Two analytical procedures based on UHPLC-MS/HRMS were developed and comprehensively validated to quantify 20 free and conjugated sex hormones in urine. The quantitative data were processed by seven machine learning algorithms, and the efficiency of the resulting models was compared.
Results: Among the tested models relating urinary estrogen and androgen levels to the occurrence of breast cancer, Random Forest (RF) outperformed all the other supervised classification approaches, including Partial Least Squares - Discriminant Analysis (PLS-DA), in terms of effectiveness and robustness. The final optimized model, built on only five biomarkers (testosterone-sulphate, alpha-estradiol, 4-methoxyestradiol, DHEA-sulphate, and epitestosterone-sulphate), achieved approximately 98% diagnostic accuracy on replicated validation sets. To balance the under-represented population of healthy women, the Synthetic Minority Oversampling TEchnique (SMOTE) was applied to oversample the data.
Conclusions: Through hyperparameter optimization, the RF algorithm showed great potential for early breast cancer detection, as it provides a clear ranking of the biomarkers and of their relative efficiency. This allows the final diagnostic model to rest on a restricted selection of only five steroid biomarkers, as is desirable for noninvasive tests intended for wide screening.
Competing Interests: Declaration of Competing Interest. The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Marco Vincenti reports financial support from the CRT Foundation and from the Italian Ministry of Education, Universities, and Research. The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.)
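The SMOTE balancing step mentioned in the Results can be illustrated with a minimal, standard-library-only sketch of the generic technique: each synthetic sample is placed at a random point on the segment joining a minority-class sample to one of its k nearest minority-class neighbours. The abstract does not report the SMOTE settings or software used, so the function name `smote`, the neighbour count `k=5`, and the random placeholder data below are hypothetical assumptions, not the authors' implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical SMOTE sketch: return n_synthetic interpolated samples.

    minority: list of feature vectors (lists of floats) from the
    under-represented class, e.g. healthy subjects (assumption).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority-class neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))      # random minority sample
        j = rng.choice(neighbours[i])         # one of its k neighbours
        gap = rng.random()                    # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Placeholder values only: 65 random 5-feature vectors standing in for
    # the healthy group's five biomarker concentrations (not real data).
    rng = random.Random(42)
    healthy = [[rng.random() for _ in range(5)] for _ in range(65)]
    n_cancer = 185
    extra = smote(healthy, n_synthetic=n_cancer - len(healthy))
    print(len(healthy) + len(extra))  # 185
```

With 65 healthy and 185 cancer subjects, generating 120 synthetic healthy vectors yields two classes of equal size. In a typical pipeline, oversampling is applied only to the training portion of each split so that synthetic points do not leak into the validation sets.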
- Published
- 2024