Fully Automatic Deep Learning in Bi-institutional Prostate Magnetic Resonance Imaging: Effects of Cohort Size and Heterogeneity

Authors :
Thomas Hielscher
Regula Gnirs
Xianfeng Wang
Patrick Schelb
Heinz Peter Schlemmer
Constantin Schwab
Xiaoyan Qin
Cedric Weißer
Albrecht Stenzinger
Nils Netzer
Klaus H. Maier-Hein
Markus Hohenfellner
Tristan Anselm Kuder
Magdalena Görtz
David Bonekamp
Jan Philipp Radtke
Viktoria Schütz
Source :
Investigative radiology. 56(12)
Publication Year :
2021

Abstract

BACKGROUND The potential of deep learning to support radiologist prostate magnetic resonance imaging (MRI) interpretation has been demonstrated.

PURPOSE The aim of this study was to evaluate the effects of increased and diversified training data (TD) on deep learning performance for detection and segmentation of clinically significant prostate cancer-suspicious lesions.

MATERIALS AND METHODS In this retrospective study, biparametric (T2-weighted and diffusion-weighted) prostate MRI acquired with multiple 1.5-T and 3.0-T MRI scanners in consecutive men was used for training and testing of prostate segmentation and lesion detection networks. Ground truth was the combination of targeted and extended systematic MRI-transrectal ultrasound fusion biopsies, with significant prostate cancer defined as International Society of Urological Pathology grade group greater than or equal to 2. U-Nets were internally validated on full, reduced, and PROSTATEx-enhanced training sets and subsequently externally validated on the institutional test set and the PROSTATEx test set. U-Net segmentation was calibrated to clinically desired levels in cross-validation, and test performance was subsequently compared using sensitivities, specificities, predictive values, and Dice coefficient.

RESULTS One thousand four hundred eighty-eight institutional examinations (median age, 64 years; interquartile range, 58-70 years) were temporally split into training (2014-2017, 806 examinations, supplemented by 204 PROSTATEx examinations) and test (2018-2020, 682 examinations) sets. In the test set, Prostate Imaging-Reporting and Data System (PI-RADS) cutoffs greater than or equal to 3 and greater than or equal to 4 on a per-patient basis had sensitivity of 97% (241/249) and 90% (223/249) at specificity of 19% (82/433) and 56% (242/433), respectively. The full U-Net had corresponding sensitivity of 97% (241/249) and 88% (219/249) with specificity of 20% (86/433) and 59% (254/433), not statistically different from PI-RADS (P > 0.3 for all comparisons). U-Net trained using a reduced set of 171 consecutive examinations achieved inferior performance (P < 0.001). PROSTATEx training enhancement did not improve performance. Dice coefficients were 0.90 for prostate and 0.42/0.53 for MRI lesion segmentation at PI-RADS category 3/4 equivalents.

CONCLUSIONS In a large institutional test set, U-Net confirms similar performance to clinical PI-RADS assessment and benefits from more TD, with neither institutional nor PROSTATEx performance improved by adding multiscanner or bi-institutional TD.
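The per-patient percentages in the Results can be recomputed from the counts given in parentheses. The Python sketch below is illustrative only and is not the authors' code: every name in it is hypothetical, the 249/433 split of the 682 test examinations is inferred from the reported denominators, and the Dice function uses the standard definition 2|A ∩ B| / (|A| + |B|) on sets of voxel indices, with the value for two empty masks chosen as an assumption.

```python
# Illustrative check only: recompute the per-patient percentages quoted in the
# abstract from the counts it reports. All names here are hypothetical.

N_POSITIVE = 249  # sensitivity denominator (patients with significant cancer)
N_NEGATIVE = 433  # specificity denominator (patients without significant cancer)

# (true positives, true negatives) at each operating point, as reported.
REPORTED_COUNTS = {
    "PI-RADS >= 3": (241, 82),
    "PI-RADS >= 4": (223, 242),
    "U-Net, PI-RADS >= 3 equivalent": (241, 86),
    "U-Net, PI-RADS >= 4 equivalent": (219, 254),
}


def percent(numerator, denominator):
    """Return the ratio as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


def sensitivity_specificity(true_pos, true_neg):
    """Sensitivity = TP / positives; specificity = TN / negatives (in percent)."""
    return percent(true_pos, N_POSITIVE), percent(true_neg, N_NEGATIVE)


def dice(mask_a, mask_b):
    """Dice coefficient 2|A & B| / (|A| + |B|) for two sets of voxel indices."""
    if not mask_a and not mask_b:
        return 1.0  # assumed convention when both masks are empty
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


for name, (tp, tn) in REPORTED_COUNTS.items():
    sens, spec = sensitivity_specificity(tp, tn)
    print(f"{name}: sensitivity {sens}%, specificity {spec}%")
```

The low specificity at the cutoff of 3 (about one in five men without significant cancer correctly ruled out) is the price of the 97% sensitivity at that cutoff, and it applies to PI-RADS and the U-Net alike.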

Details

ISSN :
15360210
Volume :
56
Issue :
12
Database :
OpenAIRE
Journal :
Investigative radiology
Accession number :
edsair.doi.dedup.....f7165be25201a4d3062917eaaff03701