1. Automated Breast Density Assessment in MRI Using Deep Learning and Radiomics: Strategies for Reducing Inter‐Observer Variability.
- Author
- Jing, Xueping, Wielema, Mirjam, Monroy‐Gonzalez, Andrea G., Stams, Thom R.G., Mahesh, Shekar V.K., Oudkerk, Matthijs, Sijens, Paul E., Dorrius, Monique D., and van Ooijen, Peter M.A.
- Subjects
- DEEP learning, RECEIVER operating characteristic curves, ARTIFICIAL intelligence, RADIOMICS, MAGNETIC resonance imaging
- Abstract
Background: Accurate breast density evaluation allows for more precise risk estimation but suffers from high inter‐observer variability.
Purpose: To evaluate the feasibility of reducing inter‐observer variability of breast density assessment through artificial intelligence (AI) assisted interpretation.
Study Type: Retrospective.
Population: Six hundred and twenty‐one patients without breast prostheses or reconstructions were randomly divided into training (N = 377), validation (N = 98), and independent test (N = 146) datasets.
Field Strength/Sequence: 1.5 T and 3.0 T; T1‐weighted spectral attenuated inversion recovery.
Assessment: Five radiologists independently assessed each scan in the independent test set to establish the inter‐observer variability baseline and to reach a reference standard. Deep learning and three radiomics models were developed for three classification tasks: (i) four Breast Imaging‐Reporting and Data System (BI‐RADS) breast composition categories (A–D), (ii) dense (categories C, D) vs. non‐dense (categories A, B), and (iii) extremely dense (category D) vs. moderately dense (categories A–C). The models were tested against the reference standard on the independent test set. AI‐assisted interpretation was performed by majority voting between the models and each radiologist's assessment.
Statistical Tests: Inter‐observer variability was assessed using linear‐weighted kappa (κ) statistics. Kappa statistics, accuracy, and area under the receiver operating characteristic curve (AUC) were used to assess the models against the reference standard.
Results: In the independent test set, the five readers showed overall substantial agreement on tasks (i) and (ii), but moderate agreement on task (iii). The best‐performing model showed substantial agreement with the reference standard for tasks (i) and (ii), but moderate agreement for task (iii). With the assistance of the AI models, almost perfect inter‐observer agreement was obtained for tasks (i) (mean κ = 0.86), (ii) (mean κ = 0.94), and (iii) (mean κ = 0.94).
Data Conclusion: Deep learning and radiomics models have the potential to help reduce inter‐observer variability of breast density assessment.
Level of Evidence: 3
Technical Efficacy: Stage 1
[ABSTRACT FROM AUTHOR]
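The two techniques named in the abstract, linear‐weighted kappa and majority voting between the models and a radiologist, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the tie‐breaking rule (a tie is resolved in favor of the radiologist's reading, otherwise the lowest tied category), and the treatment of a zero expected disagreement are assumptions made only for illustration; the abstract does not specify them.

```python
from collections import Counter

# BI-RADS breast composition categories, ordered from least to most dense.
CATEGORIES = ["A", "B", "C", "D"]


def linear_weighted_kappa(ratings_1, ratings_2, categories=CATEGORIES):
    """Cohen's kappa with linear weights for two raters on ordinal categories."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_1)

    # Observed joint proportions of (rater 1, rater 2) category pairs.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_1, ratings_2):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weight: |i - j| / (k - 1).
    def weight(i, j):
        return abs(i - j) / (k - 1)

    observed_disagreement = sum(
        weight(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_disagreement = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_disagreement == 0:
        # Assumption: both raters used one identical category throughout.
        return 1.0
    return 1.0 - observed_disagreement / expected_disagreement


def majority_vote(radiologist, model_votes):
    """Combine one radiologist's category with the models' categories."""
    counts = Counter([radiologist, *model_votes])
    top = max(counts.values())
    winners = [c for c, v in counts.items() if v == top]
    # Hypothetical tie-break: keep the radiologist's reading if it is tied
    # for first place, otherwise take the lowest tied category.
    return radiologist if radiologist in winners else min(winners)
```

In this sketch, an AI‐assisted reading for one scan would be `majority_vote(reader_category, model_categories)`, and pairwise agreement between two readers' assisted readings would be `linear_weighted_kappa(assisted_1, assisted_2)`.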
- Published
- 2024