1. Diagnostic performance of augmented intelligence with 2D and 3D total body photography and convolutional neural networks in a high-risk population for melanoma under real-world conditions: A new era of skin cancer screening?
- Author
- Cerminara, Sara E., Cheng, Phil, Kostner, Lisa, Huber, Stephanie, Kunz, Michael, Maul, Julia-Tatjana, Böhm, Jette S., Dettwiler, Chiara F., Geser, Anna, Jakopović, Cécile, Stoffel, Livia M., Peter, Jelissa K., Levesque, Mitchell, Navarini, Alexander A., and Maul, Lara Valeska
- Subjects
- *MELANOMA diagnosis, *CONFIDENCE intervals, *EARLY detection of cancer, *PHOTOGRAPHY, *DESCRIPTIVE statistics, *ARTIFICIAL neural networks, *SENSITIVITY & specificity (Statistics), *RECEIVER operating characteristic curves
- Abstract
Convolutional neural networks (CNNs) have outperformed dermatologists in classifying pigmented skin lesions under artificial conditions. We investigated, for the first time, the performance of three-dimensional (3D) and two-dimensional (2D) CNNs and dermatologists in the early detection of melanoma in a real-world setting. In this prospective study, 1690 melanocytic lesions in 143 patients with high-risk criteria for melanoma were evaluated by dermatologists, 2D-FotoFinder-ATBM and 3D-Vectra WB360 total body photography (TBP). Excision was based on the dermatologists' dichotomous decision, an elevated CNN risk score (study-specific malignancy cut-off: FotoFinder >0.5, Vectra >5.0) and/or the second dermatologist's assessment with CNN support. The diagnostic accuracy of the 2D and 3D CNN classification was compared with that of the dermatologists and the augmented intelligence based on histopathology and dermatologists' assessment. Secondary end-points included reproducibility of risk scores and naevus counts per patient by medical staff (gold standard) compared to automated 3D and 2D TBP CNN counts. The sensitivity, specificity, and receiver operating characteristics area under the curve (ROC-AUC) for risk-score-assessments compared to histopathology of 3D-CNN with 95% confidence intervals (CI) were 90.0%, 64.6% and 0.92 (CI 0.85–1.00), respectively. While dermatologists and augmented intelligence achieved the same sensitivity (90%) and comparable classification ROC-AUC (0.91 [CI 0.80–1.00], 0.88 [CI 0.77–1.00]) with 3D-CNN, their specificity was superior (92.3% and 86.2%, respectively). The 2D-CNN (sensitivity: 70%, specificity: 40%, ROC-AUC: 0.68 [CI 0.46–0.90]) was outperformed by 3D CNN and dermatologists. The 3D-CNN showed a higher correlation coefficient for repeated measurements of 246 lesions (R = 0.89) than the 2D-CNN (R = 0.79). The mean naevus count per patient varied significantly (gold standard: 210 lesions; 3D-CNN: 469; 2D-CNN: 1324; p < 0.0001). 
Our study emphasises the importance of validating the classification of CNNs in real life. The novel 3D-CNN device outperformed the 2D-CNN and achieved comparable sensitivity with dermatologists. The low specificity of CNNs and the lack of automated counting of TBP naevi currently limit the use of augmented intelligence in clinical practice.
• Real-world comparison of dermatologists and AI in melanoma detection.
• Sensitivity of 3D CNN versus dermatologists comparable in classification.
• 2D CNN was outperformed in classifying melanocytic lesions by 3D CNN/dermatologists.
• Low specificity by augmented intelligence in real-life setting.
• CNN naevi counting still lacking utility for clinical practice due to low correlation.
[ABSTRACT FROM AUTHOR]
- Published
- 2023