102 results for "Mazurowski, MA"
Search Results
2. Breast tumor segmentation in DCE-MRI using fully convolutional networks with an application in radiogenomics
- Author
-
Petrick, N, Mori, K, Zhang, Jun, Saha, A, Zhu, Z, and Mazurowski, MA
- Published
- 2018
3. Deep learning-based features of breast MRI for prediction of occult invasive disease following a diagnosis of ductal carcinoma in situ: Preliminary data
- Author
-
Petrick, N, Mori, K, Zhu, Z, Harowicz, M, Zhang, Jun, Saha, A, Grimm, LJ, Hwang, S, and Mazurowski, MA
- Published
- 2018
4. Convolutional encoder-decoder for breast mass segmentation in digital breast tomosynthesis
- Author
-
Zhang, Jun, Ghate, SV, Grimm, LJ, Saha, A, Cain, EH, Zhu, Z, and Mazurowski, MA
- Abstract
Digital breast tomosynthesis (DBT) is a relatively new modality for breast imaging that can provide detailed assessment of dense tissue within the breast. In the domains of cancer diagnosis, radiogenomics, and resident education, it is important to accurately segment breast masses. However, breast mass segmentation is a very challenging task, since mass regions have low contrast with their neighboring tissues. Notably, the task might become more difficult in cases assigned to the BI-RADS 0 category, since this category includes many lesions that are of low conspicuity and locations that were deemed to be overlapping normal tissue upon further imaging and were not sent to biopsy. Segmentation of such lesions is of particular importance in the domain of reader performance analysis and education. In this paper, we propose a novel deep learning-based method for segmentation of BI-RADS 0 lesions in DBT. The key components of our framework are an encoding path for local-to-global feature extraction and a decoding path to expand the images. To address the issue of limited training data, in the training stage, we propose to sample patches not only in mass regions but also in non-mass regions. We utilize a Dice-like loss function in the proposed network to alleviate the class-imbalance problem. The preliminary results on 40 subjects show the promise of our method. In addition to quantitative evaluation of the method, we present a visualization of the results that demonstrates both the performance of the algorithm and the difficulty of the task at hand.
- Published
- 2018
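The Dice-like loss mentioned in the abstract of entry 4 can be sketched as below. This is a minimal illustrative implementation, not the authors' code; the `smooth` constant is an assumed stabilizer.

```python
import numpy as np

def dice_loss(pred, target, smooth=1.0):
    """Soft Dice-style loss for binary segmentation masks.

    pred: predicted foreground probabilities (any shape).
    target: binary ground-truth mask of the same shape.
    Because the loss is driven by overlap rather than per-pixel counts,
    the large background class does not dominate it, which mitigates
    the class-imbalance problem the abstract notes. `smooth` is an
    assumed stabilizer that avoids division by zero on empty masks.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(pred * target)
    dice = (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
    return 1.0 - dice

mask = np.array([[0, 1], [1, 0]], dtype=float)
perfect = dice_loss(mask, mask)                      # exact overlap -> loss 0
all_background = dice_loss(np.zeros((2, 2)), mask)   # misses the mass -> higher loss
```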
5. Breast mass detection in mammography and tomosynthesis via fully convolutional network-based heatmap regression
- Author
-
Petrick, N, Mori, K, Zhang, Jun, Cain, EH, Saha, A, Zhu, Z, and Mazurowski, MA
- Published
- 2018
6. Breast cancer molecular subtype classification using deep features: Preliminary results
- Author
-
Petrick, N, Mori, K, Zhu, Z, Albadawy, E, Saha, A, Zhang, Jun, Harowicz, MR, and Mazurowski, MA
- Published
- 2018
7. Identifying error-making patterns in assessment of mammographic BI-RADS descriptors among radiology residents using statistical pattern recognition.
- Author
-
Mazurowski, Maciej A, Barnhart, Huiman X, Baker, Jay A, and Tourassi, Georgia D
- Abstract
Rationale and Objective: The objective of this study is to test the hypothesis that there are patterns in erroneous assessment of BI-RADS features among radiology trainees when interpreting mammographic masses and that these patterns can be captured in individualized statistical user models. Identifying these patterns could be useful in personalizing and adapting educational material to complement the individual weaknesses of each trainee during his or her mammography education. Materials and Methods: Reading data of 33 mammographic cases containing masses was used. The cases were individually described by 10 radiology residents using four BI-RADS features: mass shape, mass margin, mass density, and parenchyma density. For each resident, an individual model was automatically constructed that predicts the likelihood (HIGH or LOW) of the resident erroneously assigning each BI-RADS descriptor. Error was defined as deviation of the resident's assessment from the expert assessments. We evaluated the predictive performance of the models using leave-one-out cross-validation. Results: The user models were able to predict which assessments have a higher likelihood of error. The proportion of actual errors to the number of situations in which these errors could potentially occur was significantly higher (P < .05) when the user model assigned HIGH likelihood of error than when LOW likelihood of error was assigned for three of the four BI-RADS features. Overall, the difference between the HIGH and LOW likelihood of error groups was statistically significant (P < .0001) combining all four features. Conclusion: Error making in BI-RADS descriptor assessment appears to follow patterns that can be captured with statistical pattern recognition-based user models. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
8. Development and Evaluation of Automated Artificial Intelligence-Based Brain Tumor Response Assessment in Patients with Glioblastoma.
- Author
-
Zhang J, LaBella D, Zhang D, Houk JL, Rudie JD, Zou H, Warman P, Mazurowski MA, and Calabrese E
- Abstract
Background and Purpose: To develop and evaluate an automated, AI-based, volumetric brain tumor MRI response assessment algorithm on a large cohort of patients treated at a high-volume brain tumor center., Materials and Methods: We retrospectively analyzed data from 634 patients treated for glioblastoma at a single brain tumor center over a 5-year period (2017-2021). The mean age was 56 ± 13 years. 372/634 (59%) patients were male, and 262/634 (41%) patients were female. Study data consisted of 3,403 brain MRI exams and corresponding standardized, radiologist-based brain tumor response assessments (BT-RADS). An artificial intelligence (AI)-based brain tumor response assessment algorithm was developed using automated, volumetric tumor segmentation. AI-based response assessments were evaluated for agreement with radiologist-based response assessments and ability to stratify patients by overall survival. Metrics were computed to assess the agreement using BT-RADS as the ground truth, fixed-time-point survival analysis was conducted to evaluate the survival stratification, and associated P-values were calculated., Results: For all BT-RADS categories, AI-based response assessments showed moderate agreement with radiologists' response assessments (F1 = 0.587-0.755). Kaplan-Meier survival analysis revealed statistically worse overall fixed-time-point survival for patients assessed as image worsening equivalent to RANO progression by humans alone than by AI alone (log-rank P = .007).
Cox proportional hazard model analysis showed a disadvantage to AI-based assessments for overall survival prediction (P=0.012)., Conclusions: AI-based volumetric glioblastoma MRI response assessment following BT-RADS criteria yielded moderate agreement for replicating human response assessments and slightly worse stratification by overall survival., Abbreviations: GBM= Glioblastoma; RANO= Response Assessment in Neuro-Oncology; BTRADS= Brain Tumor Reporting and Data System; NLP = Natural Language Processing., Competing Interests: We declare no conflict of interest., (© 2024 by American Journal of Neuroradiology.)
- Published
- 2024
- Full Text
- View/download PDF
9. Simplifying risk stratification for thyroid nodules on ultrasound: validation and performance of an artificial intelligence thyroid imaging reporting and data system.
- Author
-
Wildman-Tobriner B, Yang J, Allen BC, Ho LM, Miller CM, and Mazurowski MA
- Subjects
- Humans, Retrospective Studies, Female, Male, Middle Aged, Risk Assessment, Adult, Biopsy, Fine-Needle, Aged, Radiology Information Systems, Data Systems, Aged, 80 and over, Thyroid Nodule diagnostic imaging, Thyroid Nodule pathology, Ultrasonography methods, Artificial Intelligence, Sensitivity and Specificity
- Abstract
Purpose: To validate the performance of a recently created risk stratification system (RSS) for thyroid nodules on ultrasound, the Artificial Intelligence Thyroid Imaging Reporting and Data System (AI TI-RADS)., Materials and Methods: 378 thyroid nodules from 320 patients were included in this retrospective evaluation. All nodules had ultrasound images and had undergone fine needle aspiration (FNA). 147 nodules were Bethesda V or VI (suspicious or diagnostic for malignancy), and 231 were Bethesda II (benign). Three radiologists assigned features according to the AI TI-RADS lexicon (same categories and features as the American College of Radiology TI-RADS) to each nodule based on ultrasound images. FNA recommendations using AI TI-RADS and ACR TI-RADS were then compared, and sensitivity and specificity for each RSS were calculated., Results: Across three readers, the mean sensitivity of AI TI-RADS was lower than that of ACR TI-RADS (0.69 vs 0.72, p < 0.02), while mean specificity was higher (0.40 vs 0.37, p < 0.02). The overall total number of points assigned by all three readers decreased slightly when using AI TI-RADS (5,998 for AI TI-RADS vs 6,015 for ACR TI-RADS), including the assignment of 0 points to more features., Conclusion: AI TI-RADS performed similarly to ACR TI-RADS while eliminating point assignments for many features, allowing for simplification of future TI-RADS versions., Competing Interests: Declaration of competing interest None of the authors have any conflicts of interest to disclose., (Copyright © 2024 Elsevier Inc. All rights reserved.)
- Published
- 2024
- Full Text
- View/download PDF
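The sensitivity and specificity reported in entry 9 follow the standard definitions. The sketch below uses hypothetical per-reader counts, chosen only so the 147 malignant and 231 benign nodules roughly reproduce the reported means (0.69 and 0.40); the actual counts are not in the abstract.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical reader counts for the 147 malignant (Bethesda V/VI)
# and 231 benign (Bethesda II) nodules, for illustration only.
sens, spec = sensitivity_specificity(tp=101, fn=46, tn=93, fp=138)
```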
10. Automated selection of abdominal MRI series using a DICOM metadata classifier and selective use of a pixel-based classifier.
- Author
-
Miller CM, Zhu Z, Mazurowski MA, Bashir MR, and Wiggins WF
- Subjects
- Humans, Image Interpretation, Computer-Assisted methods, Magnetic Resonance Imaging methods, Abdomen diagnostic imaging, Metadata, Radiology Information Systems
- Abstract
Accurate, automated MRI series identification is important for many applications, including display ("hanging") protocols, machine learning, and radiomics. The use of the series description or a pixel-based classifier each has limitations. We demonstrate a combined approach utilizing a DICOM metadata-based classifier and selective use of a pixel-based classifier to identify abdominal MRI series. The metadata classifier was assessed alone as Group metadata and combined with selective use of the pixel-based classifier for predictions with less than 70% certainty (Group combined). The overall accuracy (mean and 95% confidence intervals) for Groups metadata and combined on the test dataset were 0.870 CI (0.824,0.912) and 0.930 CI (0.893,0.963), respectively. With this combined metadata and pixel-based approach, we demonstrate accurate classification of 95% or greater for all pre-contrast MRI series and improved performance for some post-contrast series., (© 2024. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.)
- Published
- 2024
- Full Text
- View/download PDF
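The selective two-stage design in entry 10 (fall back to the pixel-based classifier only when the metadata classifier is under 70% certain) can be sketched as below. The series labels and the classifier interfaces are hypothetical stand-ins, not the paper's actual API.

```python
def classify_series(metadata_probs, pixel_classifier, threshold=0.70):
    """Return the metadata classifier's label when its confidence meets
    the threshold; otherwise fall back to the (slower) pixel-based
    classifier, mirroring the selective design described above.

    metadata_probs: dict mapping series label -> predicted probability.
    pixel_classifier: zero-argument callable returning a label.
    """
    label, prob = max(metadata_probs.items(), key=lambda kv: kv[1])
    if prob >= threshold:
        return label          # metadata alone is certain enough
    return pixel_classifier()  # defer to the pixel-based model

# Hypothetical series labels for illustration.
confident = classify_series({"T1 pre": 0.92, "T2": 0.08}, lambda: "T2")
uncertain = classify_series({"T1 pre": 0.55, "T2": 0.45}, lambda: "T2")
```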
11. Pilot study of machine learning for detection of placenta accreta spectrum.
- Author
-
Zhang Y, Ellestad SC, Gilner JB, Pyne A, Boyd BK, Mazurowski MA, and Gatta LA
- Subjects
- Humans, Female, Pregnancy, Pilot Projects, Ultrasonography, Prenatal, Adult, Placenta Accreta diagnostic imaging, Machine Learning
- Published
- 2024
- Full Text
- View/download PDF
12. A publicly available deep learning model and dataset for segmentation of breast, fibroglandular tissue, and vessels in breast MRI.
- Author
-
Lew CO, Harouni M, Kirksey ER, Kang EJ, Dong H, Gu H, Grimm LJ, Walsh R, Lowell DA, and Mazurowski MA
- Subjects
- Humans, Female, Magnetic Resonance Imaging, Radiography, Breast Density, Deep Learning, Breast Neoplasms diagnostic imaging
- Abstract
Breast density, or the amount of fibroglandular tissue (FGT) relative to the overall breast volume, increases the risk of developing breast cancer. Although previous studies have utilized deep learning to assess breast density, the limited public availability of data and quantitative tools hinders the development of better assessment tools. Our objective was to (1) create and share a large dataset of pixel-wise annotations according to well-defined criteria, and (2) develop, evaluate, and share an automated segmentation method for breast, FGT, and blood vessels using convolutional neural networks. We used the Duke Breast Cancer MRI dataset to randomly select 100 MRI studies and manually annotated the breast, FGT, and blood vessels for each study. Model performance was evaluated using the Dice similarity coefficient (DSC). The model achieved DSC values of 0.92 for breast, 0.86 for FGT, and 0.65 for blood vessels on the test set. The correlation between our model's predicted breast density and the manually generated masks was 0.95. The correlation between the predicted breast density and qualitative radiologist assessment was 0.75. Our automated models can accurately segment breast, FGT, and blood vessels using pre-contrast breast MRI data. The data and the models were made publicly available., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
13. Computed Tomography Volumetrics for Size Matching in Lung Transplantation for Restrictive Disease.
- Author
-
Prabhu NK, Wong MK, Klapper JA, Haney JC, Mazurowski MA, Mammarappallil JG, and Hartwig MG
- Subjects
- Adult, Humans, Lung surgery, Retrospective Studies, Organ Size, Tissue Donors, Tomography, X-Ray Computed, Primary Graft Dysfunction etiology, Lung Transplantation methods, Lung Diseases surgery
- Abstract
Background: There is no consensus on the optimal allograft sizing strategy for lung transplantation in restrictive lung disease. Current methods that are based on predicted total lung capacity (pTLC) ratios do not account for the diminutive recipient chest size. The study investigators hypothesized that a new sizing ratio incorporating preoperative recipient computed tomographic lung volumes (CTVol) would be associated with postoperative outcomes., Methods: A retrospective single-institution study was conducted of adults undergoing primary bilateral lung transplantation between January 2016 and July 2020 for restrictive lung disease. CTVol was computed for recipients by using advanced segmentation software. Two sizing ratios were calculated: the pTLC ratio (pTLCdonor/pTLCrecipient) and a new volumetric ratio (pTLCdonor/CTVolrecipient). Patients were divided into reference, oversized, and undersized groups on the basis of ratio quintiles, and multivariable models were used to assess the effect of the ratios on primary graft dysfunction and survival., Results: CTVol was successfully acquired in 218 of 220 (99.1%) patients. In adjusted analysis, undersizing on the basis of the volumetric ratio was independently associated with decreased primary graft dysfunction grade 2 or 3 within 72 hours (odds ratio, 0.42; 95% CI, 0.20-0.87; P = .02). The pTLC ratio was not significantly associated with primary graft dysfunction. Oversizing on the basis of the volumetric ratio was independently associated with an increased risk of death (hazard ratio, 2.27; 95% CI, 1.04-4.99; P = .04), whereas the pTLC ratio did not have a significant survival association., Conclusions: Using computed tomography-acquired lung volumes for donor-recipient size matching in lung transplantation is feasible with advanced segmentation software. This method may be more predictive of outcome compared with current sizing methods, which use gender and height only., (Copyright © 2024 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.)
- Published
- 2024
- Full Text
- View/download PDF
14. SWSSL: Sliding Window-Based Self-Supervised Learning for Anomaly Detection in High-Resolution Images.
- Author
-
Dong H, Zhang Y, Gu H, Konz N, Zhang Y, and Mazurowski MA
- Subjects
- Supervised Machine Learning, Algorithms, Neural Networks, Computer
- Abstract
Anomaly detection (AD) aims to determine if an instance has properties different from those seen in normal cases. The success of this technique depends on how well a neural network learns from normal instances. We observe that the learning difficulty scales exponentially with the input resolution, making it infeasible to apply AD to high-resolution images. Resizing them to a lower resolution is a compromising solution and does not align with clinical practice where the diagnosis could depend on image details. In this work, we propose to train the network and perform inference at the patch level, through the sliding window algorithm. This simple operation allows the network to receive high-resolution images but introduces additional training difficulties, including inconsistent image structure and higher variance. We address these concerns by setting the network's objective to learn augmentation-invariant features. We further study the augmentation function in the context of medical imaging. In particular, we observe that the resizing operation, a key augmentation in general computer vision literature, is detrimental to detection accuracy, and the inverting operation can be beneficial. We also propose a new module that encourages the network to learn from adjacent patches to boost detection performance. Extensive experiments are conducted on breast tomosynthesis and chest X-ray datasets and our method improves 8.03% and 5.66% AUC on image-level classification respectively over the current leading techniques. The experimental results demonstrate the effectiveness of our approach.
- Published
- 2023
- Full Text
- View/download PDF
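The patch-level training and inference in entry 14 rests on a plain sliding-window tiling of the high-resolution input. A minimal sketch (patch size and stride are arbitrary illustration values, not the paper's settings):

```python
import numpy as np

def extract_patches(img, size, stride):
    """Tile a 2D image into square patches with a sliding window, as
    when a network is trained and run at the patch level instead of
    on a downsized full image."""
    h, w = img.shape
    patches = [
        img[y:y + size, x:x + size]
        for y in range(0, h - size + 1, stride)
        for x in range(0, w - size + 1, stride)
    ]
    return np.stack(patches)

# An 8x8 toy "image" tiled into overlapping 4x4 patches.
patches = extract_patches(np.arange(64, dtype=float).reshape(8, 8),
                          size=4, stride=2)
```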
15. Improving Image Classification of Knee Radiographs: An Automated Image Labeling Approach.
- Author
-
Zhang J, Santos C, Park C, Mazurowski MA, and Colglazier R
- Subjects
- Humans, Radiography, Arthroplasty, Knee Joint diagnostic imaging, Radiology
- Abstract
Large numbers of radiographic images are available in musculoskeletal radiology practices which could be used for training of deep learning models for diagnosis of knee abnormalities. However, those images do not typically contain readily available labels due to limitations of human annotations. The purpose of our study was to develop an automated labeling approach that improves the image classification model to distinguish normal knee images from those with abnormalities or prior arthroplasty. The automated labeler was trained on a small set of labeled data to automatically label a much larger set of unlabeled data, further improving the image classification performance for knee radiographic diagnosis. We used BioBERT and EfficientNet as the feature extraction backbone of the labeler and imaging model, respectively. We developed our approach using 7382 patients and validated it on a separate set of 637 patients. The final image classification model, trained using both manually labeled and pseudo-labeled data, had the higher weighted average AUC (WA-AUC 0.903) value and higher AUC values among all classes (normal AUC 0.894; abnormal AUC 0.896, arthroplasty AUC 0.990) compared to the baseline model (WA-AUC = 0.857; normal AUC 0.842; abnormal AUC 0.848, arthroplasty AUC 0.987), trained using only manually labeled data. Statistical tests show that the improvement is significant on normal (p value < 0.002), abnormal (p value < 0.001), and WA-AUC (p value = 0.001). Our findings demonstrated that the proposed automated labeling approach significantly improves the performance of image classification for radiographic knee diagnosis, allowing for facilitating patient care and curation of large knee datasets., (© 2023. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.)
- Published
- 2023
- Full Text
- View/download PDF
16. MRI-based Deep Learning Assessment of Amyloid, Tau, and Neurodegeneration Biomarker Status across the Alzheimer Disease Spectrum.
- Author
-
Lew CO, Zhou L, Mazurowski MA, Doraiswamy PM, and Petrella JR
- Subjects
- Aged, Humans, Male, Amyloid, Amyloid beta-Peptides, Apolipoproteins E, Biomarkers, Magnetic Resonance Imaging methods, Positron-Emission Tomography methods, Retrospective Studies, tau Proteins, Female, Alzheimer Disease diagnostic imaging, Cognitive Dysfunction, Deep Learning
- Abstract
Background PET can be used for amyloid-tau-neurodegeneration (ATN) classification in Alzheimer disease, but incurs considerable cost and exposure to ionizing radiation. MRI currently has limited use in characterizing ATN status. Deep learning techniques can detect complex patterns in MRI data and have potential for noninvasive characterization of ATN status. Purpose To use deep learning to predict PET-determined ATN biomarker status using MRI and readily available diagnostic data. Materials and Methods MRI and PET data were retrospectively collected from the Alzheimer's Disease Imaging Initiative. PET scans were paired with MRI scans acquired within 30 days, from August 2005 to September 2020. Pairs were randomly split into subsets as follows: 70% for training, 10% for validation, and 20% for final testing. A bimodal Gaussian mixture model was used to threshold PET scans into positive and negative labels. MRI data were fed into a convolutional neural network to generate imaging features. These features were combined in a logistic regression model with patient demographics, APOE gene status, cognitive scores, hippocampal volumes, and clinical diagnoses to classify each ATN biomarker component as positive or negative. Area under the receiver operating characteristic curve (AUC) analysis was used for model evaluation. Feature importance was derived from model coefficients and gradients. Results There were 2099 amyloid (mean patient age, 75 years ± 10 [SD]; 1110 male), 557 tau (mean patient age, 75 years ± 7; 280 male), and 2768 FDG PET (mean patient age, 75 years ± 7; 1645 male) and MRI pairs. Model AUCs for the test set were as follows: amyloid, 0.79 (95% CI: 0.74, 0.83); tau, 0.73 (95% CI: 0.58, 0.86); and neurodegeneration, 0.86 (95% CI: 0.83, 0.89). Within the networks, high gradients were present in key temporal, parietal, frontal, and occipital cortical regions. Model coefficients for cognitive scores, hippocampal volumes, and APOE status were highest. 
Conclusion A deep learning algorithm predicted each component of PET-determined ATN status with acceptable to excellent efficacy using MRI and other available diagnostic data. © RSNA, 2023 Supplemental material is available for this article.
- Published
- 2023
- Full Text
- View/download PDF
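Entry 16 thresholds PET scans into positive/negative labels with a bimodal Gaussian mixture. The sketch below fits a two-component 1D mixture with a compact EM loop and uses the midpoint of the fitted means as the cutoff; this is a simplification (the posterior crossing point is another common choice), and the data are synthetic.

```python
import numpy as np

def bimodal_threshold(x, iters=100):
    """Fit a two-component 1D Gaussian mixture by EM and return the
    midpoint of the fitted means as a positive/negative cutoff.
    Simplified sketch: posterior-crossing thresholds are also used."""
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])        # spread-out initialization
    sigma = np.full(2, x.std()) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities (the 1/sqrt(2*pi) constant cancels).
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        n = resp.sum(axis=0)
        pi = n / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / n
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / n) + 1e-6
    return mu.mean()

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.0, 0.1, 200),   # synthetic "negative" scans
                         rng.normal(2.0, 0.1, 200)])  # synthetic "positive" scans
cutoff = bimodal_threshold(scores)
```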
17. Segment anything model for medical image analysis: An experimental study.
- Author
-
Mazurowski MA, Dong H, Gu H, Yang J, Konz N, and Zhang Y
- Subjects
- Humans, S-Adenosylmethionine, Tomography, X-Ray Computed, Brain Neoplasms
- Abstract
Training segmentation models for medical images continues to be challenging due to the limited availability of data annotations. Segment Anything Model (SAM) is a foundation model trained on over 1 billion annotations, predominantly for natural images, that is intended to segment user-defined objects of interest in an interactive manner. While the model performance on natural images is impressive, medical image domains pose their own set of challenges. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 19 medical imaging datasets from various modalities and anatomies. In our experiments, we generated point and box prompts for SAM using a standard method that simulates interactive segmentation. We report the following findings: (1) SAM's performance based on single prompts highly varies depending on the dataset and the task, from IoU=0.1135 for spine MRI to IoU=0.8650 for hip X-ray. (2) Segmentation performance appears to be better for well-circumscribed objects with prompts with less ambiguity such as the segmentation of organs in computed tomography and poorer in various other scenarios such as the segmentation of brain tumors. (3) SAM performs notably better with box prompts than with point prompts. (4) SAM outperforms similar methods RITM, SimpleClick, and FocalClick in almost all single-point prompt settings. (5) When multiple-point prompts are provided iteratively, SAM's performance generally improves only slightly while other methods' performance improves to the level that surpasses SAM's point-based performance. We also provide several illustrations for SAM's performance on all tested datasets, iterative segmentation, and SAM's behavior given prompt ambiguity. We conclude that SAM shows impressive zero-shot segmentation performance for certain medical imaging datasets, but moderate to poor performance for others. 
SAM has the potential to make a significant impact in automated medical image segmentation, but appropriate care needs to be applied when using it. Code for evaluating SAM is made publicly available at https://github.com/mazurowski-lab/segment-anything-medical-evaluation., Competing Interests: Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2023 Elsevier B.V. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
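Entry 17 reports segmentation quality as IoU (intersection over union). The standard computation on binary masks, as a self-contained sketch:

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over union of two binary segmentation masks, the
    metric used to score prompted segmentations in the study above."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(a, b).sum() / union

pred = np.array([0, 1, 1, 0])
truth = np.array([1, 1, 0, 0])
score = iou(pred, truth)  # 1 overlapping pixel out of 3 in the union
```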
18. Feasibility of predicting a screening digital breast tomosynthesis recall using features extracted from the electronic medical record.
- Author
-
Zhang J, Mazurowski MA, and Grimm LJ
- Subjects
- Female, Humans, Feasibility Studies, Mammography, Breast Density, Breast, Early Detection of Cancer, Mass Screening, Retrospective Studies, Electronic Health Records, Breast Neoplasms diagnostic imaging
- Abstract
Purpose: Tools to predict a screening mammogram recall at the time of scheduling could improve patient care. We extracted patient demographic and breast care history information within the electronic medical record (EMR) for women undergoing digital breast tomosynthesis (DBT) to identify which factors were associated with a screening recall recommendation., Method: In 2018, 21,543 women aged 40 years or greater who underwent screening DBT at our institution were identified. Demographic information and breast care factors were extracted automatically from the EMR. The primary outcome was a screening recall recommendation of BI-RADS 0. A multivariable logistic regression model was built and included age, race, ethnicity groups, family breast cancer history, personal breast cancer history, surgical breast cancer history, recall history, and days since last available screening mammogram., Results: Multiple factors were associated with a recall on the multivariable model: history of breast cancer surgery (OR: 2.298, 95% CI: 1.854, 2.836); prior recall within the last five years (vs no prior, OR: 0.768, 95% CI: 0.687, 0.858); prior screening mammogram within 0-18 months (vs no prior, OR: 0.601, 95% CI: 0.520, 0.691), prior screening mammogram within 18-30 months (vs no prior, OR: 0.676, 95% CI: 0.520, 0.691); and age (normalized OR: 0.723, 95% CI: 0.690, 0.758)., Conclusions: It is feasible to predict a DBT screening recall recommendation using patient demographics and breast care factors that can be extracted automatically from the EMR., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2023 Elsevier B.V. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
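The odds ratios in entry 18 come from exponentiating logistic-regression coefficients. A minimal sketch; the standard error below is hypothetical (the abstract reports only the ORs and their CIs):

```python
import math

def odds_ratio(beta, se, z=1.96):
    """Convert a logistic-regression coefficient and its standard
    error into an odds ratio with a ~95% Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# beta = ln(2.298) recovers the breast-surgery-history OR from the
# abstract; the standard error 0.107 is hypothetical, for illustration.
or_, lo, hi = odds_ratio(math.log(2.298), 0.107)
```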
19. Duke Liver Dataset: A Publicly Available Liver MRI Dataset with Liver Segmentation Masks and Series Labels.
- Author
-
Macdonald JA, Zhu Z, Konkel B, Mazurowski MA, Wiggins WF, and Bashir MR
- Abstract
The Duke Liver Dataset contains 2146 abdominal MRI series from 105 patients, including a majority with cirrhotic features, and 310 image series with corresponding manually segmented liver masks., Competing Interests: Disclosures of conflicts of interest: J.A.M. No relevant relationships. Z.Z. No relevant relationships. B.K. No relevant relationships. M.A.M. No relevant relationships. W.F.W. Research funding to institution from The Marcus Foundation and the National Institutes of Health (grant no. R01-NS123275); consulting fees from Qure.ai to author; honorarium paid to author from University of Wisconsin–GE CT Protocols Partnership Medical Advisory Board. M.R.B. Grants or contracts from Siemens Healthineers, Madrigal Pharmaceuticals, NGM Biopharmaceuticals, Metacrine, Corcept, and Carmot Therapeutics (author was primary investigator on research grants to institution); associate editor for Radiology., (© 2023 by the Radiological Society of North America, Inc.)
- Published
- 2023
- Full Text
- View/download PDF
20. Deep learning for classification of thyroid nodules on ultrasound: validation on an independent dataset.
- Author
-
Weng J, Wildman-Tobriner B, Buda M, Yang J, Ho LM, Allen BC, Ehieli WL, Miller CM, Zhang J, and Mazurowski MA
- Subjects
- Humans, Retrospective Studies, Ultrasonography methods, Neural Networks, Computer, Thyroid Nodule diagnostic imaging, Thyroid Nodule pathology, Deep Learning
- Abstract
Objectives: The purpose is to apply a previously validated deep learning algorithm to a new thyroid nodule ultrasound image dataset and compare its performance with that of radiologists., Methods: A prior study presented an algorithm that is able to detect thyroid nodules and then make malignancy classifications from two ultrasound images. A multi-task deep convolutional neural network was trained on 1278 nodules and originally tested with 99 separate nodules. The results were comparable with those of radiologists. The algorithm was further tested with 378 nodules imaged with ultrasound machines from manufacturers and product types different from those in the training cases. Four experienced radiologists were requested to evaluate the nodules for comparison with deep learning., Results: The area under the curve (AUC) of the deep learning algorithm and the four radiologists was calculated with parametric, binormal estimation. For the deep learning algorithm, the AUC was 0.69 (95% CI: 0.64-0.75). The AUCs of the radiologists were 0.63 (95% CI: 0.59-0.67), 0.66 (95% CI: 0.61-0.71), 0.65 (95% CI: 0.60-0.70), and 0.63 (95% CI: 0.58-0.67)., Conclusion: On the new testing dataset, the deep learning algorithm achieved performance similar to that of all four radiologists. The relative performance difference between the algorithm and the radiologists was not significantly affected by differences in ultrasound scanners., Competing Interests: Declaration of competing interest The authors have no conflicts of interest to declare., (Copyright © 2023 Elsevier Inc. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
21. Multistep Automated Data Labelling Procedure (MADLaP) for thyroid nodules on ultrasound: An artificial intelligence approach for automating image annotation.
- Author
-
Zhang J, Mazurowski MA, Allen BC, and Wildman-Tobriner B
- Subjects
- Humans, Artificial Intelligence, Data Curation, Ultrasonography methods, Neural Networks, Computer, Thyroid Nodule diagnostic imaging, Thyroid Nodule pathology
- Abstract
Machine learning (ML) for diagnosis of thyroid nodules on ultrasound is an active area of research. However, ML tools require large, well-labeled datasets, the curation of which is time-consuming and labor-intensive. The purpose of our study was to develop and test a deep-learning-based tool to facilitate and automate the data annotation process for thyroid nodules; we named our tool Multistep Automated Data Labelling Procedure (MADLaP). MADLaP was designed to take multiple inputs including pathology reports, ultrasound images, and radiology reports. Using multiple step-wise 'modules' including rule-based natural language processing, deep-learning-based imaging segmentation, and optical character recognition, MADLaP automatically identified images of a specific thyroid nodule and correctly assigned a pathology label. The model was developed using a training set of 378 patients across our health system and tested on a separate set of 93 patients. Ground truths for both sets were selected by an experienced radiologist. Performance metrics including yield (how many labeled images the model produced) and accuracy (percentage correct) were measured using the test set. MADLaP achieved a yield of 63% and an accuracy of 83%. The yield progressively increased as the input data moved through each module, while accuracy peaked part way through. Error analysis showed that inputs from certain examination sites had lower accuracy (40%) than the other sites (90%, 100%). MADLaP successfully created curated datasets of labeled ultrasound images of thyroid nodules. While accurate, the relatively suboptimal yield of MADLaP exposed some challenges when trying to automatically label radiology images from heterogeneous sources. The complex task of image curation and annotation could be automated, allowing for enrichment of larger datasets for use in machine learning development., Competing Interests: Declaration of competing interest None., (Copyright © 2023 Elsevier B.V. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
22. Unsupervised anomaly localization in high-resolution breast scans using deep pluralistic image completion.
- Author
-
Konz N, Dong H, and Mazurowski MA
- Subjects
- Humans, Female, Machine Learning, Mammography methods, Breast Neoplasms diagnostic imaging
- Abstract
Automated tumor detection in Digital Breast Tomosynthesis (DBT) is a difficult task due to natural tumor rarity, breast tissue variability, and high resolution. Given the scarcity of abnormal images and the abundance of normal images for this problem, an anomaly detection/localization approach could be well-suited. However, most anomaly localization research in machine learning focuses on non-medical datasets, and we find that these methods fall short when adapted to medical imaging datasets. The problem is alleviated when we solve the task from the image completion perspective, in which the presence of anomalies can be indicated by a discrepancy between the original appearance and its auto-completion conditioned on the surroundings. However, there are often many valid normal completions given the same surroundings, especially in the DBT dataset, making this evaluation criterion less precise. To address such an issue, we consider pluralistic image completion by exploring the distribution of possible completions instead of generating fixed predictions. This is achieved through our novel application of spatial dropout on the completion network during inference time only, which requires no additional training cost and is effective at generating diverse completions. We further propose minimum completion distance (MCD), a new metric for detecting anomalies, thanks to these stochastic completions. We provide theoretical as well as empirical support for the superiority of the proposed method over existing approaches to anomaly localization. On the DBT dataset, our model outperforms other state-of-the-art methods by at least 10% AUROC for pixel-level detection., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2023 Elsevier B.V. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
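The minimum completion distance (MCD) criterion described in the abstract above can be sketched in a few lines. This is a toy illustration, not the authors' code: the pluralistic completion network is omitted, the stochastic completions are simulated with random perturbations, and `minimum_completion_distance` is a hypothetical helper name.

```python
import numpy as np

def minimum_completion_distance(original, completions):
    """Anomaly score for an image patch: the distance from the original
    patch to its *nearest* plausible completion. If even the closest of
    many stochastic completions differs greatly, the patch is likely
    anomalous."""
    return min(np.mean((original - c) ** 2) for c in completions)

# Toy data: stochastic completions scattered around a "normal" appearance.
rng = np.random.default_rng(0)
normal_patch = np.zeros((8, 8))
anomalous_patch = np.ones((8, 8))
completions = [rng.normal(0.0, 0.05, (8, 8)) for _ in range(10)]

# A normal patch lies close to at least one completion; an anomaly does not.
assert minimum_completion_distance(normal_patch, completions) < \
       minimum_completion_distance(anomalous_patch, completions)
```

Taking the minimum rather than the average distance is what tolerates the "many valid normal completions" problem the abstract raises: a patch only needs to match one plausible completion to be considered normal.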
23. Deep Learning for Breast MRI Style Transfer with Limited Training Data.
- Author
-
Cao S, Konz N, Duncan J, and Mazurowski MA
- Subjects
- Humans, Magnetic Resonance Imaging, Radiography, Image Processing, Computer-Assisted methods, Deep Learning
- Abstract
In this work we introduce a novel medical image style transfer method, StyleMapper, that can transfer medical scans to an unseen style with access to limited training data. This is made possible by training our model on unlimited possibilities of simulated random medical imaging styles on the training set, making our work more computationally efficient when compared with other style transfer methods. Moreover, our method enables arbitrary style transfer: transferring images to styles unseen in training. This is useful for medical imaging, where images are acquired using different protocols and different scanner models, resulting in a variety of styles that data may need to be transferred between. Our model disentangles image content from style and can modify an image's style by simply replacing the style encoding with one extracted from a single image of the target style, with no additional optimization required. This also allows the model to distinguish between different styles of images, including among those that were unseen in training. We provide a formal description of the proposed model. Experimental results on breast magnetic resonance images indicate the effectiveness of our method for style transfer. Our style transfer method allows for the alignment of medical images taken with different scanners into a single unified style dataset, allowing for the training of downstream models on such a dataset for tasks such as classification, object detection, and others., (© 2022. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.)
- Published
- 2023
- Full Text
- View/download PDF
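The style-swap mechanic in the abstract above (replace an image's style encoding with one extracted from a single target-style image, no optimization at transfer time) can be illustrated with a deliberately simplified stand-in: here global intensity statistics play the role of the learned style code, in the spirit of AdaIN-style methods. The function names and scanner values are illustrative, not StyleMapper's actual encoder.

```python
import numpy as np

def style_code(img):
    # Stand-in "style" encoding: global intensity statistics.
    return img.mean(), img.std()

def apply_style(content_img, style):
    # Strip the source style, then impose the target style.
    mu_t, sigma_t = style
    normalized = (content_img - content_img.mean()) / (content_img.std() + 1e-8)
    return normalized * sigma_t + mu_t

rng = np.random.default_rng(0)
scanner_a = rng.normal(100.0, 15.0, (64, 64))  # source-style scan
scanner_b = rng.normal(400.0, 60.0, (64, 64))  # single image of the target style

transferred = apply_style(scanner_a, style_code(scanner_b))
# The transferred image now carries scanner_b's intensity "style".
assert abs(transferred.mean() - scanner_b.mean()) < 1e-6
assert abs(transferred.std() - scanner_b.std()) < 1e-3
```

The real model learns a far richer style representation than two scalars, but the swap itself works the same way: extract a code from one target image and substitute it in.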
24. Thyroid Nodules on Ultrasound in Children and Young Adults: Comparison of Diagnostic Performance of Radiologists' Impressions, ACR TI-RADS, and a Deep Learning Algorithm.
- Author
-
Yang J, Page LC, Wagner L, Wildman-Tobriner B, Bisset L, Frush D, and Mazurowski MA
- Subjects
- Humans, Male, Child, Female, Young Adult, Adolescent, Adult, Retrospective Studies, Ultrasonography methods, Radiologists, Thyroid Nodule pathology, Deep Learning
- Abstract
BACKGROUND. In current clinical practice, thyroid nodules in children are generally evaluated on the basis of radiologists' overall impressions of ultrasound images. OBJECTIVE. The purpose of this article is to compare the diagnostic performance of radiologists' overall impression, the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS), and a deep learning algorithm in differentiating benign and malignant thyroid nodules on ultrasound in children and young adults. METHODS. This retrospective study included 139 patients (median age 17.5 years; 119 female patients, 20 male patients) evaluated from January 1, 2004, to September 18, 2020, who were 21 years old and younger with a thyroid nodule on ultrasound with definitive pathologic results from fine-needle aspiration and/or surgical excision to serve as the reference standard. A single nodule per patient was selected, and one transverse and one longitudinal image each of the nodules were extracted for further evaluation. Three radiologists independently characterized nodules on the basis of their overall impression (benign vs malignant) and ACR TI-RADS. A previously developed deep learning algorithm determined for each nodule a likelihood of malignancy, which was used to derive a risk level. Sensitivities and specificities for malignancy were calculated. Agreement was assessed using Cohen kappa coefficients. RESULTS. For radiologists' overall impression, sensitivity ranged from 32.1% to 75.0% (mean, 58.3%; 95% CI, 49.2-67.3%), and specificity ranged from 63.8% to 93.9% (mean, 79.9%; 95% CI, 73.8-85.7%). For ACR TI-RADS, sensitivity ranged from 82.1% to 87.5% (mean, 85.1%; 95% CI, 77.3-92.1%), and specificity ranged from 47.0% to 54.2% (mean, 50.6%; 95% CI, 41.4-59.8%). The deep learning algorithm had a sensitivity of 87.5% (95% CI, 78.3-95.5%) and specificity of 36.1% (95% CI, 25.6-46.8%). 
Interobserver agreement among pairwise combinations of readers, expressed as kappa, for overall impression was 0.227-0.472 and for ACR TI-RADS was 0.597-0.643. CONCLUSION. Both ACR TI-RADS and the deep learning algorithm had higher sensitivity albeit lower specificity compared with overall impressions. The deep learning algorithm had similar sensitivity but lower specificity than ACR TI-RADS. Interobserver agreement was higher for ACR TI-RADS than for overall impressions. CLINICAL IMPACT. ACR TI-RADS and the deep learning algorithm may serve as potential alternative strategies for guiding decisions to perform fine-needle aspiration of thyroid nodules in children.
- Published
- 2023
- Full Text
- View/download PDF
25. A Competition, Benchmark, Code, and Data for Using Artificial Intelligence to Detect Lesions in Digital Breast Tomosynthesis.
- Author
-
Konz N, Buda M, Gu H, Saha A, Yang J, Chledowski J, Park J, Witowski J, Geras KJ, Shoshan Y, Gilboa-Solomon F, Khapun D, Ratner V, Barkan E, Ozery-Flato M, Martí R, Omigbodun A, Marasinou C, Nakhaei N, Hsu W, Sahu P, Hossain MB, Lee J, Santos C, Przelaskowski A, Kalpathy-Cramer J, Bearce B, Cha K, Farahani K, Petrick N, Hadjiiski L, Drukker K, Armato SG 3rd, and Mazurowski MA
- Subjects
- Humans, Female, Benchmarking, Mammography methods, Algorithms, Radiographic Image Interpretation, Computer-Assisted methods, Artificial Intelligence, Breast Neoplasms diagnostic imaging
- Abstract
Importance: An accurate and robust artificial intelligence (AI) algorithm for detecting cancer in digital breast tomosynthesis (DBT) could significantly improve detection accuracy and reduce health care costs worldwide., Objectives: To make training and evaluation data for the development of AI algorithms for DBT analysis available, to develop well-defined benchmarks, and to create publicly available code for existing methods., Design, Setting, and Participants: This diagnostic study is based on a multi-institutional international grand challenge in which research teams developed algorithms to detect lesions in DBT. A data set of 22 032 reconstructed DBT volumes was made available to research teams. Phase 1, in which teams were provided 700 scans from the training set, 120 from the validation set, and 180 from the test set, took place from December 2020 to January 2021, and phase 2, in which teams were given the full data set, took place from May to July 2021., Main Outcomes and Measures: The overall performance was evaluated by mean sensitivity for biopsied lesions using only DBT volumes with biopsied lesions; ties were broken by including all DBT volumes., Results: A total of 8 teams participated in the challenge. The team with the highest mean sensitivity for biopsied lesions was the NYU B-Team, with 0.957 (95% CI, 0.924-0.984), and the second-place team, ZeDuS, had a mean sensitivity of 0.926 (95% CI, 0.881-0.964). When the results were aggregated, the mean sensitivity for all submitted algorithms was 0.879; for only those who participated in phase 2, it was 0.926., Conclusions and Relevance: In this diagnostic study, an international competition produced algorithms with high sensitivity for using AI to detect lesions on DBT images. 
A standardized performance benchmark for the detection task using publicly available clinical imaging data was released, with detailed descriptions and analyses of submitted algorithms accompanied by a public release of their predictions and code for selected methods. These resources will serve as a foundation for future research on computer-assisted diagnosis methods for DBT, significantly lowering the barrier of entry for new researchers.
- Published
- 2023
- Full Text
- View/download PDF
26. The Need for Targeted Labeling of Machine Learning-Based Software as a Medical Device.
- Author
-
Goldstein BA, Mazurowski MA, and Li C
- Subjects
- Humans, Software, Machine Learning
- Published
- 2022
- Full Text
- View/download PDF
27. Artificial Intelligence (AI) Tools for Thyroid Nodules on Ultrasound, From the AJR Special Series on AI Applications.
- Author
-
Wildman-Tobriner B, Taghi-Zadeh E, and Mazurowski MA
- Subjects
- Artificial Intelligence, Humans, Ultrasonography methods, Thyroid Neoplasms, Thyroid Nodule diagnostic imaging
- Abstract
Artificial intelligence (AI) methods for evaluating thyroid nodules on ultrasound have been widely described in the literature, with reported performance of AI tools matching or in some instances surpassing radiologists' performance. As these data have accumulated, products for classification and risk stratification of thyroid nodules on ultrasound have become commercially available. This article reviews FDA-approved products currently on the market, with a focus on product features, reported performance, and considerations for implementation. The products perform risk stratification primarily using a Thyroid Imaging Reporting and Data System (TIRADS), though some also provide additional prediction tools independent of TIRADS. Key issues in implementation include integration with radiologist interpretation, impact on workflow and efficiency, and performance monitoring. AI applications beyond nodule classification, including report construction and incidental findings follow-up, are also described. Anticipated future directions of research and development in AI tools for thyroid nodules are highlighted.
- Published
- 2022
- Full Text
- View/download PDF
28. Anomaly Detection of Calcifications in Mammography Based on 11,000 Negative Cases.
- Author
-
Hou R, Peng Y, Grimm LJ, Ren Y, Mazurowski MA, Marks JR, King LM, Maley CC, Hwang ES, and Lo JY
- Subjects
- Diagnosis, Computer-Assisted, Female, Humans, Machine Learning, Mammography methods, Breast Neoplasms diagnostic imaging, Calcinosis diagnostic imaging
- Abstract
In mammography, calcifications are one of the most common signs of breast cancer. Detection of such lesions is an active area of research for computer-aided diagnosis and machine learning algorithms. Due to limited numbers of positive cases, many supervised detection models suffer from overfitting and fail to generalize. We present a one-class, semi-supervised framework using a deep convolutional autoencoder trained with over 50,000 images from 11,000 negative-only cases. Since the model learned from only normal breast parenchymal features, calcifications produced large signals when comparing the residuals between input and reconstruction output images. As a key advancement, a structural dissimilarity index was used to suppress non-structural noise. Our selected model achieved pixel-based AUROC of 0.959 and AUPRC of 0.676 during validation, where calcification masks were defined in a semi-automated process. Although not trained directly on any cancers, detection performance of calcification lesions on 1,883 testing images (645 malignant and 1,238 negative) achieved 75% sensitivity at 2.5 false positives per image. Performance plateaued early when trained with only a fraction of the cases, and greater model complexity or a larger dataset did not improve performance. This study demonstrates the potential of this anomaly detection approach to detect mammographic calcifications in a semi-supervised manner with efficient use of a small number of labeled images, and may facilitate new clinical applications such as computer-aided triage and quality improvement.
- Published
- 2022
- Full Text
- View/download PDF
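The role of the structural dissimilarity index in the abstract above can be demonstrated with a minimal global-SSIM sketch (the published method uses a trained autoencoder and a windowed index; the constants and patch values here are illustrative): a pure intensity shift between input and reconstruction scores low, while a structural change such as a bright calcification scores high.

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    # Global structural similarity between two patches.
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (x.var() + y.var() + c2))

def dssim(x, y):
    # Structural *dis*similarity: near 0 for matching structure.
    return (1.0 - ssim(x, y)) / 2.0

rng = np.random.default_rng(0)
recon = rng.uniform(0.2, 0.4, (16, 16))  # "reconstruction" of normal tissue
shifted = recon + 0.05                   # non-structural intensity offset
lesion = recon.copy()
lesion[6:10, 6:10] += 0.5                # structural change (bright spot)

# Plain residuals would flag both cases; DSSIM suppresses the offset.
assert dssim(recon, shifted) < dssim(recon, lesion)
```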
29. Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning.
- Author
-
D'Anniballe VM, Tushar FI, Faryna K, Han S, Mazurowski MA, Rubin GD, and Lo JY
- Subjects
- Abdomen, Humans, Neural Networks, Computer, Pelvis diagnostic imaging, Tomography, X-Ray Computed, Deep Learning
- Abstract
Background: There is progress to be made in building artificially intelligent systems to detect abnormalities that are not only accurate but can handle the true breadth of findings that radiologists encounter in body (chest, abdomen, and pelvis) computed tomography (CT). Currently, the major bottleneck for developing multi-disease classifiers is a lack of manually annotated data. The purpose of this work was to develop high throughput multi-label annotators for body CT reports that can be applied across a variety of abnormalities, organs, and disease states, thereby mitigating the need for human annotation., Methods: We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained using the RBA-extracted labels to classify reports as being positive for one or more diseases or normal for each organ system. Effects on disease classification performance were evaluated for random initialization versus pre-trained embeddings, as well as for different training dataset sizes. The RBA was tested on a subset of 2158 manually labeled reports and performance was reported as accuracy and F-score. The RNN was tested against a test set of 48,758 reports labeled by RBA and performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method., Results: Manual validation of the RBA confirmed 91-99% accuracy across the 15 different labels. Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases.
As the training dataset size was reduced, performance was robust except for a few diseases with a relatively small number of cases. Pre-trained classification AUCs reached > 0.95 for all four disease outcomes and normality across all three organ systems., Conclusions: Our label-extracting pipeline was able to encompass a variety of cases and diseases in body CT reports by generalizing beyond strict rules with exceptional accuracy. The method described can be easily adapted to enable automated labeling of hospital-scale medical data sets for training image-based disease classifiers., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
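The dictionary-style rule-based algorithm (RBA) for report labeling described above can be sketched as keyword matching with negation handling. The keyword lists, negation cues, and sentence-level negation scope below are hypothetical simplifications; the actual RBA is far more elaborate.

```python
import re

# Hypothetical keyword dictionaries; the real rules are much richer.
DISEASE_KEYWORDS = {
    "emphysema": [r"\bemphysema\b"],
    "effusion": [r"\beffusion\b"],
    "nodule": [r"\bnodule\b", r"\bnodular\b"],
}
NEGATION_CUES = [r"\bno\b", r"\bwithout\b", r"\bnegative for\b"]

def extract_labels(sentence):
    """Assign disease labels to one report sentence, skipping negated mentions."""
    if any(re.search(cue, sentence, re.I) for cue in NEGATION_CUES):
        return set()
    return {
        disease
        for disease, patterns in DISEASE_KEYWORDS.items()
        if any(re.search(p, sentence, re.I) for p in patterns)
    }

assert extract_labels("Moderate emphysema and a small pleural effusion.") == {"emphysema", "effusion"}
assert extract_labels("No pleural effusion or pneumothorax.") == set()
```

Labels produced this way then serve as (noisy) training targets for the RNN classifiers described in the abstract, which generalize beyond the fixed keyword lists.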
30. Prediction of Upstaging in Ductal Carcinoma in Situ Based on Mammographic Radiomic Features.
- Author
-
Hou R, Grimm LJ, Mazurowski MA, Marks JR, King LM, Maley CC, Lynch T, van Oirsouw M, Rogers K, Stone N, Wallis M, Teuwen J, Wesseling J, Hwang ES, and Lo JY
- Subjects
- Adult, Aged, Aged, 80 and over, Female, Humans, Male, Mammography, Middle Aged, Retrospective Studies, Breast Neoplasms diagnostic imaging, Calcinosis, Carcinoma in Situ, Carcinoma, Ductal, Breast pathology, Carcinoma, Intraductal, Noninfiltrating diagnostic imaging, Carcinoma, Intraductal, Noninfiltrating pathology
- Abstract
Background Improving diagnosis of ductal carcinoma in situ (DCIS) before surgery is important in choosing optimal patient management strategies. However, patients may harbor occult invasive disease not detected until definitive surgery. Purpose To assess the performance and clinical utility of mammographic radiomic features in the prediction of occult invasive cancer among women diagnosed with DCIS on the basis of core biopsy findings. Materials and Methods In this Health Insurance Portability and Accountability Act-compliant retrospective study, digital magnification mammographic images were collected from women who underwent breast core-needle biopsy for calcifications that was performed at a single institution between September 2008 and April 2017 and yielded a diagnosis of DCIS. The database query was directed at asymptomatic women with calcifications without a mass, architectural distortion, asymmetric density, or palpable disease. Logistic regression with regularization was used. Differences across training and internal test set by upstaging rate, age, lesion size, and estrogen and progesterone receptor status were assessed by using the Kruskal-Wallis or χ2 test. Results The study consisted of 700 women with DCIS (age range, 40-89 years; mean age, 59 years ± 10 [standard deviation]), including 114 with lesions (16.3%) upstaged to invasive cancer at subsequent surgery. The sample was split randomly into 400 women for the training set and 300 for the testing set (mean ages: training set, 59 years ± 10; test set, 59 years ± 10; P = .85). A total of 109 radiomic and four clinical features were extracted. The best model on the test set by using all radiomic and clinical features helped predict upstaging with an area under the receiver operating characteristic curve of 0.71 (95% CI: 0.62, 0.79). For a fixed high sensitivity (90%), the model yielded a specificity of 22%, a negative predictive value of 92%, and an odds ratio of 2.4 (95% CI: 1.8, 3.2). High specificity (90%) corresponded to a sensitivity of 37%, positive predictive value of 41%, and odds ratio of 5.0 (95% CI: 2.8, 9.0). Conclusion Machine learning models that use radiomic features applied to mammographic calcifications may help predict upstaging of ductal carcinoma in situ, which can refine clinical decision making and treatment planning. © RSNA, 2022.
- Published
- 2022
- Full Text
- View/download PDF
31. 3D Pyramid Pooling Network for Abdominal MRI Series Classification.
- Author
-
Zhu Z, Mittendorf A, Shropshire E, Allen B, Miller C, Bashir MR, and Mazurowski MA
- Subjects
- Liver, Neural Networks, Computer, Algorithms, Magnetic Resonance Imaging methods
- Abstract
Recognizing and organizing different series in an MRI examination is important both for clinical review and research, but it is poorly addressed by the current generation of picture archiving and communication systems (PACSs) and post-processing workstations. In this paper, we study the problem of using deep convolutional neural networks for automatic classification of abdominal MRI series to one of many series types. Our contributions are three-fold. First, we created a large abdominal MRI dataset containing 3717 MRI series including 188,665 individual images, derived from liver examinations. 30 different series types are represented in this dataset. The dataset was annotated by consensus readings from two radiologists. Both the MRIs and the annotations were made publicly available. Second, we proposed a 3D pyramid pooling network, which can elegantly handle abdominal MRI series with varied sizes of each dimension, and achieved state-of-the-art classification performance. Third, we performed the first ever comparison between the algorithm and the radiologists on an additional dataset and had several meaningful findings.
- Published
- 2022
- Full Text
- View/download PDF
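The key property of the pyramid pooling step described above (a fixed-length descriptor from MRI series whose three dimensions all vary) can be sketched with adaptive average pooling. The pooling levels and the use of raw voxels rather than convolutional feature maps are simplifications of the published architecture.

```python
import numpy as np

def adaptive_avg_pool3d(volume, out_shape):
    """Average-pool a 3D volume of arbitrary size down to a fixed shape."""
    pooled = np.empty(out_shape)
    for idx in np.ndindex(*out_shape):
        # Bin boundaries: floor for the start, ceil for the end (never empty).
        sl = tuple(
            slice(i * s // o, -((-(i + 1) * s) // o))
            for i, s, o in zip(idx, volume.shape, out_shape)
        )
        pooled[idx] = volume[sl].mean()
    return pooled

def pyramid_pool(volume, levels=((1, 1, 1), (2, 2, 2), (4, 4, 4))):
    # Concatenate several pooling resolutions into one fixed-length vector.
    return np.concatenate([adaptive_avg_pool3d(volume, lv).ravel() for lv in levels])

# Two series with entirely different shapes yield same-length descriptors.
a = pyramid_pool(np.random.default_rng(0).random((30, 64, 64)))
b = pyramid_pool(np.random.default_rng(1).random((17, 128, 96)))
assert a.shape == b.shape == (73,)  # 1 + 8 + 64 features, regardless of input size
```

This is what lets a single classifier head accept abdominal series with varied slice counts and in-plane resolutions without resampling every volume to one size.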
32. Classification of Multiple Diseases on Body CT Scans Using Weakly Supervised Deep Learning.
- Author
-
Tushar FI, D'Anniballe VM, Hou R, Mazurowski MA, Fu W, Samei E, Rubin GD, and Lo JY
- Abstract
Purpose: To design multidisease classifiers for body CT scans for three different organ systems using automatically extracted labels from radiology text reports., Materials and Methods: This retrospective study included a total of 12 092 patients (mean age, 57 years ± 18 [standard deviation]; 6172 women) for model development and testing. Rule-based algorithms were used to extract 19 225 disease labels from 13 667 body CT scans performed between 2012 and 2017. Using a three-dimensional DenseVNet, three organ systems were segmented: lungs and pleura, liver and gallbladder, and kidneys and ureters. For each organ system, a three-dimensional convolutional neural network classified each as no apparent disease or for the presence of four common diseases, for a total of 15 different labels across all three models. Testing was performed on a subset of 2158 CT volumes relative to 2875 manually derived reference labels from 2133 patients (mean age, 58 years ± 18; 1079 women). Performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method., Results: Manual validation of the extracted labels confirmed 91%-99% accuracy across the 15 different labels. AUCs for lungs and pleura labels were as follows: atelectasis, 0.77 (95% CI: 0.74, 0.81); nodule, 0.65 (95% CI: 0.61, 0.69); emphysema, 0.89 (95% CI: 0.86, 0.92); effusion, 0.97 (95% CI: 0.96, 0.98); and no apparent disease, 0.89 (95% CI: 0.87, 0.91). AUCs for liver and gallbladder were as follows: hepatobiliary calcification, 0.62 (95% CI: 0.56, 0.67); lesion, 0.73 (95% CI: 0.69, 0.77); dilation, 0.87 (95% CI: 0.84, 0.90); fatty, 0.89 (95% CI: 0.86, 0.92); and no apparent disease, 0.82 (95% CI: 0.78, 0.85). 
AUCs for kidneys and ureters were as follows: stone, 0.83 (95% CI: 0.79, 0.87); atrophy, 0.92 (95% CI: 0.89, 0.94); lesion, 0.68 (95% CI: 0.64, 0.72); cyst, 0.70 (95% CI: 0.66, 0.73); and no apparent disease, 0.79 (95% CI: 0.75, 0.83)., Conclusion: Weakly supervised deep learning models were able to classify diverse diseases in multiple organ systems from CT scans. Keywords: CT, Diagnosis/Classification/Application Domain, Semisupervised Learning, Whole-Body Imaging. © RSNA, 2022., Competing Interests: Disclosures of Conflicts of Interest: F.I.T. No relevant relationships. V.M.D. No relevant relationships. R.H. No relevant relationships. M.A.M. No relevant relationships. W.F. No relevant relationships. E.S. No relevant relationships. G.D.R. No relevant relationships. J.Y.L. MAIA Erasmus and University of Girona fellowship covered part of F.I.T.'s graduate stipend while he was a visiting scholar; NVIDIA GPU card given to the laboratory., (2022 by the Radiological Society of North America, Inc.)
- Published
- 2021
- Full Text
- View/download PDF
33. Normalization of breast MRIs using cycle-consistent generative adversarial networks.
- Author
-
Modanwal G, Vellal A, and Mazurowski MA
- Subjects
- Algorithms, Humans, Mammography, X-Rays, Image Processing, Computer-Assisted, Magnetic Resonance Imaging
- Abstract
Objectives: Dynamic Contrast Enhanced-Magnetic Resonance Imaging (DCE-MRI) is widely used to complement ultrasound examinations and x-ray mammography for early detection and diagnosis of breast cancer. However, images generated by various MRI scanners (e.g., GE Healthcare and Siemens) differ both in intensity and noise distribution, preventing algorithms trained on MRIs from one scanner from generalizing to data from other scanners. In this work, we propose a method to solve this problem by normalizing images between various scanners., Methods: MRI normalization is challenging because it requires normalizing intensity values and mapping noise distributions between scanners. We utilize a cycle-consistent generative adversarial network to learn a bidirectional mapping and perform normalization between MRIs produced by GE Healthcare and Siemens scanners in an unpaired setting. Initial experiments demonstrate that the traditional CycleGAN architecture struggles to preserve the anatomical structures of the breast during normalization. Thus, we propose two technical innovations in order to preserve both the shape of the breast as well as the tissue structures within the breast. First, we incorporate mutual information loss during training in order to ensure anatomical consistency. Second, we propose a modified discriminator architecture that utilizes a smaller field-of-view to ensure the preservation of finer details in the breast tissue., Results: Quantitative and qualitative evaluations show that the second innovation consistently preserves the breast shape and tissue structures while also performing the proper intensity normalization and noise distribution mapping., Conclusion: Our results demonstrate that the proposed model can successfully learn a bidirectional mapping and perform normalization between MRIs produced by different vendors, potentially enabling improved diagnosis and detection of breast cancer.
All the data used in this study are publicly available at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70226903., (Copyright © 2021. Published by Elsevier B.V.)
- Published
- 2021
- Full Text
- View/download PDF
34. A Data Set and Deep Learning Algorithm for the Detection of Masses and Architectural Distortions in Digital Breast Tomosynthesis Images.
- Author
-
Buda M, Saha A, Walsh R, Ghate S, Li N, Swiecicki A, Lo JY, and Mazurowski MA
- Subjects
- Aged, Breast diagnostic imaging, False Positive Reactions, Female, Humans, Middle Aged, ROC Curve, Reproducibility of Results, Breast Neoplasms diagnosis, Datasets as Topic, Deep Learning, Early Detection of Cancer methods, Mammography
- Abstract
Importance: Breast cancer screening is among the most common radiological tasks, with more than 39 million examinations performed each year. While it has been among the most studied medical imaging applications of artificial intelligence, the development and evaluation of algorithms are hindered by the lack of well-annotated, large-scale publicly available data sets., Objectives: To curate, annotate, and make publicly available a large-scale data set of digital breast tomosynthesis (DBT) images to facilitate the development and evaluation of artificial intelligence algorithms for breast cancer screening; to develop a baseline deep learning model for breast cancer detection; and to test this model using the data set to serve as a baseline for future research., Design, Setting, and Participants: In this diagnostic study, 16 802 DBT examinations with at least 1 reconstruction view available, performed between August 26, 2014, and January 29, 2018, were obtained from Duke Health System and analyzed. From the initial cohort, examinations were divided into 4 groups and split into training and test sets for the development and evaluation of a deep learning model. Images with foreign objects or spot compression views were excluded. Data analysis was conducted from January 2018 to October 2020., Exposures: Screening DBT., Main Outcomes and Measures: The detection algorithm was evaluated with breast-based free-response receiver operating characteristic curve and sensitivity at 2 false positives per volume., Results: The curated data set contained 22 032 reconstructed DBT volumes that belonged to 5610 studies from 5060 patients with a mean (SD) age of 55 (11) years and 5059 (100.0%) women. This included 4 groups of studies: (1) 5129 (91.4%) normal studies; (2) 280 (5.0%) actionable studies, for which additional imaging was needed but no biopsy was performed; (3) 112 (2.0%) benign biopsied studies; and (4) 89 studies (1.6%) with cancer.
Our data set included masses and architectural distortions that were annotated by 2 experienced radiologists. Our deep learning model reached breast-based sensitivity of 65% (39 of 60; 95% CI, 56%-74%) at 2 false positives per DBT volume on a test set of 460 examinations from 418 patients., Conclusions and Relevance: The large, diverse, and curated data set presented in this study could facilitate the development and evaluation of artificial intelligence algorithms for breast cancer screening by providing data for training as well as a common set of cases for model validation. The performance of the model developed in this study showed that the task remains challenging; its performance could serve as a baseline for future model development.
- Published
- 2021
- Full Text
- View/download PDF
35. Deep learning-based algorithm for assessment of knee osteoarthritis severity in radiographs matches performance of radiologists.
- Author
-
Swiecicki A, Li N, O'Donnell J, Said N, Yang J, Mather RC, Jiranek WA, and Mazurowski MA
- Subjects
- Algorithms, Humans, Radiologists, Reproducibility of Results, Deep Learning, Osteoarthritis, Knee diagnostic imaging
- Abstract
A fully-automated deep learning algorithm matched performance of radiologists in assessment of knee osteoarthritis severity in radiographs using the Kellgren-Lawrence grading system., Purpose: To develop an automated deep learning-based algorithm that jointly uses Posterior-Anterior (PA) and Lateral (LAT) views of knee radiographs to assess knee osteoarthritis severity according to the Kellgren-Lawrence grading system., Materials and Methods: We used a dataset of 9739 exams from 2802 patients from Multicenter Osteoarthritis Study (MOST). The dataset was divided into a training set of 2040 patients, a validation set of 259 patients and a test set of 503 patients. A novel deep learning-based method was utilized for assessment of knee OA in two steps: (1) localization of knee joints in the images, (2) classification according to the KL grading system. Our method used both PA and LAT views as the input to the model. The scores generated by the algorithm were compared to the grades provided in the MOST dataset for the entire test set as well as grades provided by 5 radiologists at our institution for a subset of the test set., Results: The model obtained a multi-class accuracy of 71.90% on the entire test set when compared to the ratings provided in the MOST dataset. The quadratic weighted Kappa coefficient for this set was 0.9066. The average quadratic weighted Kappa between all pairs of radiologists from our institution who took part in the study was 0.748. The average quadratic-weighted Kappa between the algorithm and the radiologists at our institution was 0.769., Conclusion: The proposed model demonstrated equivalent KL classification to MSK radiologists, with clearly superior reproducibility. Our model also agreed with radiologists at our institution to the same extent as the radiologists with each other. The algorithm could be used to provide reproducible assessment of knee osteoarthritis severity., (Copyright © 2021 Elsevier Ltd. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
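The agreement statistic reported in this abstract (quadratic weighted Kappa) is straightforward to reproduce; below is a minimal sketch using scikit-learn's `cohen_kappa_score`, with hypothetical KL grade vectors standing in for real ratings:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical KL grades (0-4) from two raters on the same six knees
rater_a = [0, 1, 2, 2, 3, 4]
rater_b = [0, 1, 1, 2, 3, 3]

# weights="quadratic" penalizes a disagreement by the squared distance
# between grades, so confusing KL 0 with KL 4 costs far more than
# confusing KL 2 with KL 3
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(round(kappa, 3))  # → 0.889
```

The quadratic weighting is what makes this statistic appropriate for ordinal scales such as KL grading, where near-misses should count less than gross errors.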
36. A generative adversarial network-based abnormality detection using only normal images for model training with application to digital breast tomosynthesis.
- Author
-
Swiecicki A, Konz N, Buda M, and Mazurowski MA
- Subjects
- Female, Humans, Middle Aged, Radiographic Image Enhancement methods, Radiographic Image Interpretation, Computer-Assisted methods, Breast diagnostic imaging, Breast Neoplasms diagnostic imaging, Computer Simulation, Mammography methods, Neural Networks, Computer
- Abstract
Deep learning has shown tremendous potential in the task of object detection in images. However, a common challenge with this task is when only a limited number of images containing the object of interest are available. This is a particular issue in cancer screening, such as digital breast tomosynthesis (DBT), where less than 1% of cases contain cancer. In this study, we propose a method to train an inpainting generative adversarial network for cancer detection using only images that do not contain cancer. During inference, we removed a part of the image and used the network to complete the removed part. A significant error in completing an image part was considered an indication that such a location is unexpected and thus abnormal. A large dataset of DBT images used in this study was collected at Duke University. It consisted of 19,230 reconstructed volumes from 4348 patients. Cancerous masses and architectural distortions were marked with bounding boxes by radiologists. Our experiments showed that the locations containing cancer were associated with a notably higher completion error than the non-cancer locations (mean error ratio of 2.77). All data used in this study have been made publicly available by the authors.
- Published
- 2021
- Full Text
- View/download PDF
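The detection signal described above, completion error in a masked region, can be sketched in a few lines. The `mean_fill` completer below is only a stand-in for the trained inpainting GAN, which is not reproduced here; everything except the masking-and-scoring logic is an illustrative assumption:

```python
import numpy as np

def completion_error(image, inpaint_fn, y, x, size=16):
    """Mask a square patch, ask a model to complete it, and score the
    reconstruction error inside the masked region (the paper's
    abnormality signal: high error = unexpected content)."""
    masked = image.copy()
    masked[y:y + size, x:x + size] = 0.0
    completed = inpaint_fn(masked)
    diff = image[y:y + size, x:x + size] - completed[y:y + size, x:x + size]
    return float(np.mean(diff ** 2))

# Stand-in "inpainter" for illustration only: fills the hole with the
# mean of the unmasked pixels. In the study this would be the trained
# generative network.
def mean_fill(masked):
    out = masked.copy()
    out[out == 0.0] = masked[masked != 0.0].mean()
    return out

img = np.random.default_rng(0).random((64, 64))
err = completion_error(img, mean_fill, y=10, x=10)
```

In the paper's setup, the error at candidate locations would be compared against typical errors at normal locations (hence the reported mean error ratio of 2.77).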
37. Do We Expect More from Radiology AI than from Radiologists?
- Author
-
Mazurowski MA
- Abstract
The expectations of radiology artificial intelligence do not match expectations of radiologists in terms of performance and explainability., Competing Interests: Disclosures of Conflicts of Interest: M.A.M. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: disclosed no relevant relationships. Other relationships: institution has U.S. patent application 15/209,212 (systems and methods for extracting prognostic image features)., (2021 by the Radiological Society of North America, Inc.)
- Published
- 2021
- Full Text
- View/download PDF
38. Machine-learning-based multiple abnormality prediction with large-scale chest computed tomography volumes.
- Author
-
Draelos RL, Dov D, Mazurowski MA, Lo JY, Henao R, Rubin GD, and Carin L
- Subjects
- Humans, Neural Networks, Computer, Radiography, Tomography, X-Ray Computed, Lung Diseases, Machine Learning
- Abstract
Machine learning models for radiology benefit from large-scale data sets with high quality labels for abnormalities. We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients. This is the largest multiply-annotated volumetric medical imaging data set reported. To annotate this data set, we developed a rule-based method for automatically extracting abnormality labels from free-text radiology reports with an average F-score of 0.976 (min 0.941, max 1.0). We also developed a model for multi-organ, multi-disease classification of chest CT volumes that uses a deep convolutional neural network (CNN). This model reached a classification performance of AUROC >0.90 for 18 abnormalities, with an average AUROC of 0.773 for all 83 abnormalities, demonstrating the feasibility of learning from unfiltered whole volume CT data. We show that training on more labels improves performance significantly: for a subset of 9 labels - nodule, opacity, atelectasis, pleural effusion, consolidation, mass, pericardial effusion, cardiomegaly, and pneumothorax - the model's average AUROC increased by 10% when the number of training labels was increased from 9 to all 83. All code for volume preprocessing, automated label extraction, and the volume abnormality prediction model is publicly available. The 36,316 CT volumes and labels will also be made publicly available pending institutional approval., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2020 Elsevier B.V. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
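The abstract's rule-based extraction of abnormality labels from free-text reports can be illustrated with a toy sketch. The keyword patterns and negation rule below are hypothetical stand-ins; the study's actual system covers 83 labels with a far richer vocabulary:

```python
import re

# Hypothetical keyword rules; the study's actual vocabulary covers 83 labels.
RULES = {
    "nodule": r"\bnodules?\b",
    "pleural_effusion": r"\bpleural effusions?\b",
    "cardiomegaly": r"\bcardiomegaly\b|\benlarged heart\b",
}
# Crude negation check: sentence starts with a negating phrase.
NEGATION = r"\b(no|without|negative for)\b"

def extract_labels(report):
    labels = set()
    for sentence in report.lower().split("."):
        for label, pattern in RULES.items():
            if re.search(pattern, sentence) and not re.match(NEGATION, sentence.strip()):
                labels.add(label)
    return labels

print(extract_labels("No pleural effusion. A 6 mm nodule is present."))  # → {'nodule'}
```

Even this crude sentence-level negation handling shows why such systems can reach high F-scores: radiology reports use a constrained, repetitive vocabulary.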
39. Performance of preoperative breast MRI based on breast cancer molecular subtype.
- Author
-
Devalapalli A, Thomas S, Mazurowski MA, Saha A, and Grimm LJ
- Subjects
- Adult, Aged, Biopsy, Breast pathology, Breast Neoplasms pathology, Female, Humans, Mastectomy, Mastectomy, Segmental, Middle Aged, Receptor, ErbB-2, Receptors, Estrogen, Receptors, Progesterone, Breast Neoplasms diagnostic imaging, Magnetic Resonance Imaging
- Abstract
Purpose: To assess the performance of preoperative breast MRI biopsy recommendations based on breast cancer molecular subtype., Methods: All preoperative breast MRIs at a single academic medical center from May 2010 to March 2014 were identified. Reports were reviewed for biopsy recommendations. All pathology reports were reviewed to determine biopsy recommendation outcomes. Molecular subtypes were defined as Luminal A (ER/PR+ and HER2-), Luminal B (ER/PR+ and HER2+), HER2 (ER-, PR- and HER2+), and Basal (ER-, PR-, and HER2-). Logistic regression assessed the probability of true positive versus false positive biopsy and mastectomy versus lumpectomy., Results: There were 383 patients included with a molecular subtype distribution of 253 Luminal A, 44 Luminal B, 20 HER2, and 66 Basal. Two hundred and thirteen (56%) patients and 319 sites were recommended for biopsy. Molecular subtype did not influence the recommendation for biopsy (p = 0.69) or the number of biopsy site recommendations (p = 0.30). The positive predictive value for a biopsy recommendation was 42% overall and 46% for Luminal A, 43% for Luminal B, 36% for HER2, and 29% for Basal subtype cancers. The multivariate logistic regression model showed no difference in true positive biopsy rate based on molecular subtype (p = 0.78). Fifty-one percent of patients underwent mastectomy and the multivariate model demonstrated that only a true positive biopsy (odds ratio: 5.3) was associated with higher mastectomy rates., Conclusion: Breast cancer molecular subtype did not influence biopsy recommendations, positive predictive values, or surgical approaches. Only true positive biopsies increased the mastectomy rate., (Copyright © 2020. Published by Elsevier Inc.)
- Published
- 2020
- Full Text
- View/download PDF
40. Using the American College of Radiology Thyroid Imaging Reporting and Data System at the Point of Care: Sonographer Performance and Interobserver Variability.
- Author
-
Wildman-Tobriner B, Ahmed S, Erkanli A, Mazurowski MA, and Hoang JK
- Subjects
- Adult, Aged, Aged, 80 and over, Humans, Middle Aged, Observer Variation, Retrospective Studies, Societies, Medical standards, Thyroid Neoplasms diagnosis, Thyroid Neoplasms diagnostic imaging, Thyroid Nodule diagnosis, Young Adult, Thyroid Gland diagnostic imaging, Thyroid Nodule diagnostic imaging, Ultrasonography standards
- Abstract
The purpose of this study was to assess inter-observer variability and performance when sonographers assign features to thyroid nodules on ultrasound using the American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TI-RADS). Fifteen sonographers retrospectively evaluated 100 thyroid nodules and assigned features to each nodule according to ACR TI-RADS lexicon. Ratings were compared with one another and to a gold standard using Fleiss' and Cohen's kappa statistics, respectively. Sonographers were also asked subjective questions regarding their comfort level assessing each feature, and opinions were compared with performance using a mixed effects model. Sonographers demonstrated only slight agreement for margin (κ = 0.18, 95% confidence interval [CI]: 0.16-0.20) and large comet tail artifact (κ = 0.08, 95% CI: 0.06-0.10) but better performance for macrocalcification (κ = 0.41, 95% CI: 0.39-0.43) and no echogenic foci (κ = 0.52, 95% CI: 0.50-0.54). Sonographer comfort level with different feature assignments did not statistically correlate with performance for a given feature. In conclusion, sonographers using ACR TI-RADS to assign thyroid nodule features on ultrasound demonstrate a range of agreement across features, with margin and large comet tail artifact showing the most variability. These results highlight potential areas of focus for sonographer education efforts as ACR TI-RADS continues to be implemented in radiology departments., (Copyright © 2020 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.)
- Published
- 2020
- Full Text
- View/download PDF
41. Prediction of Upstaged Ductal Carcinoma In Situ Using Forced Labeling and Domain Adaptation.
- Author
-
Hou R, Mazurowski MA, Grimm LJ, Marks JR, King LM, Maley CC, Hwang ES, and Lo JY
- Subjects
- Breast diagnostic imaging, Female, Humans, Mammography, ROC Curve, Retrospective Studies, Breast Neoplasms diagnostic imaging, Carcinoma, Intraductal, Noninfiltrating diagnostic imaging
- Abstract
Objective: The goal of this study is to use adjunctive classes to improve a predictive model whose performance is limited by the common problems of small numbers of primary cases, high feature dimensionality, and poor class separability. Specifically, our clinical task is to use mammographic features to predict whether ductal carcinoma in situ (DCIS) identified at needle core biopsy will be later upstaged or shown to contain invasive breast cancer., Methods: To improve the prediction of pure DCIS (negative) versus upstaged DCIS (positive) cases, this study considers the adjunctive roles of two related classes: atypical ductal hyperplasia (ADH), a non-cancer type of breast abnormality, and invasive ductal carcinoma (IDC), with 113 computer vision-based mammographic features extracted from each case. To improve the baseline Model A's classification of pure vs. upstaged DCIS, we designed three different strategies (Models B, C, D) with different ways of embedding features or inputs., Results: Based on ROC analysis, the baseline Model A performed with AUC of 0.614 (95% CI, 0.496-0.733). All three new models performed better than the baseline, with domain adaptation (Model D) performing the best with an AUC of 0.697 (95% CI, 0.595-0.797)., Conclusion: We improved the prediction performance of DCIS upstaging by embedding two related pathology classes in different training phases., Significance: The three new strategies of embedding related class data all outperformed the baseline model, thus demonstrating not only feature similarities among these different classes, but also the potential for improving classification by using other related classes.
- Published
- 2020
- Full Text
- View/download PDF
42. Deep Learning-Based Segmentation of Nodules in Thyroid Ultrasound: Improving Performance by Utilizing Markers Present in the Images.
- Author
-
Buda M, Wildman-Tobriner B, Castor K, Hoang JK, and Mazurowski MA
- Subjects
- Humans, Retrospective Studies, Ultrasonography methods, Deep Learning, Thyroid Nodule diagnostic imaging
- Abstract
Computer-aided segmentation of thyroid nodules in ultrasound imaging could assist in their accurate characterization. In this study, using data for 1278 nodules, we proposed and evaluated two methods for deep learning-based segmentation of thyroid nodules that utilize calipers present in the images. The first method used approximate nodule masks generated from the calipers. The second method combined manual annotations with automatic guidance by the calipers. When only approximate nodule masks were used for training, the achieved Dice similarity coefficient (DSC) was 85.1%. The performance of a network trained using manual annotations was DSC = 90.4%. When the guidance by the calipers was added, the performance increased to DSC = 93.1%. An increase in the number of cases used for training resulted in increased performance for all methods. The proposed method utilizing the guidance by calipers matched the performance of the network that did not use it while requiring a reduced number of manually annotated training cases., (Copyright © 2019 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.)
- Published
- 2020
- Full Text
- View/download PDF
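The Dice similarity coefficient (DSC) used to score segmentations above is a simple overlap measure between predicted and reference masks; a minimal NumPy sketch with toy masks:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    2*|A∩B| / (|A| + |B|). Reported in the abstract as DSC (%).
    eps guards against division by zero for two empty masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.zeros((4, 4), dtype=int); a[1:3, 1:3] = 1  # 4-pixel "annotation"
b = np.zeros((4, 4), dtype=int); b[1:3, 1:4] = 1  # 6-pixel "prediction"
print(round(dice(a, b), 3))  # → 0.8
```

Because Dice weights the intersection twice, it is more forgiving of small boundary disagreements than plain intersection-over-union, which is one reason it is the standard metric for medical image segmentation.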
43. Deep Radiogenomics of Lower-Grade Gliomas: Convolutional Neural Networks Predict Tumor Genomic Subtypes Using MR Images.
- Author
-
Buda M, AlBadawy EA, Saha A, and Mazurowski MA
- Abstract
Purpose: To employ deep learning to predict genomic subtypes of lower-grade glioma (LGG) tumors based on their appearance at MRI., Materials and Methods: Imaging data from The Cancer Imaging Archive and genomic data from The Cancer Genome Atlas from 110 patients from five institutions with lower-grade gliomas (World Health Organization grade II and III) were used in this study. A convolutional neural network was trained to predict tumor genomic subtype based on the MRI of the tumor. Two different deep learning approaches were tested: training from random initialization and transfer learning. Deep learning models were pretrained on glioblastoma MRI, instead of natural images, to determine if performance was improved for the detection of LGGs. The models were evaluated using area under the receiver operating characteristic curve (AUC) with cross-validation. Imaging data and annotations used in this study are publicly available., Results: The best performing model was based on transfer learning from glioblastoma MRI. It achieved AUC of 0.730 (95% confidence interval [CI]: 0.605, 0.844) for discriminating cluster-of-clusters 2 from others. For the same task, a network trained from scratch achieved an AUC of 0.680 (95% CI: 0.538, 0.811), whereas a model pretrained on natural images achieved an AUC of 0.640 (95% CI: 0.521, 0.763)., Conclusion: These findings show the potential of utilizing deep learning to identify relationships between cancer imaging and cancer genomics in LGGs. However, more accurate models are needed to justify clinical use of such tools, which might be obtained using substantially larger training datasets. Supplemental material is available for this article. © RSNA, 2020., Competing Interests: Disclosures of Conflicts of Interest: M.B. disclosed no relevant relationships. E.A.A. disclosed no relevant relationships. A.S. disclosed no relevant relationships. M.A.M. Activities related to the present article: advisory role with Gradient Health. Activities not related to the present article: institution has received grant money from Bracco Diagnostics. Other relationships: disclosed no relevant relationships., (2020 by the Radiological Society of North America, Inc.)
- Published
- 2020
- Full Text
- View/download PDF
44. Artificial Intelligence in Radiology: Some Ethical Considerations for Radiologists and Algorithm Developers.
- Author
-
Mazurowski MA
- Subjects
- Algorithms, Forecasting, Humans, Radiologists, Artificial Intelligence, Radiology
- Abstract
As artificial intelligence (AI) is finding its place in radiology, it is important to consider how to guide the research and clinical implementation in a way that will be most beneficial to patients. Although there are multiple aspects of this issue, I consider a specific one: a potential misalignment of the self-interests of radiologists and AI developers with the best interests of the patients. Radiologists know that supporting research into AI and advocating for its adoption in clinical settings could diminish their employment opportunities and reduce respect for their profession. This provides an incentive to oppose AI in various ways. AI developers have an incentive to hype their discoveries to gain attention. This could provide short-term personal gains; however, it could also create distrust toward the field if it became apparent that the state of the art was far from where it was promised to be. The future research and clinical implementation of AI in radiology will be partially determined by radiologists and AI researchers. Therefore, it is very important that we recognize our own personal motivations and biases and act responsibly to ensure the highest benefit of the AI transformation to patients., (Copyright © 2019. Published by Elsevier Inc.)
- Published
- 2020
- Full Text
- View/download PDF
45. Breast Cancer Radiogenomics: Current Status and Future Directions.
- Author
-
Grimm LJ and Mazurowski MA
- Subjects
- Humans, Magnetic Resonance Imaging, Breast Neoplasms diagnostic imaging, Breast Neoplasms genetics, Genomics
- Abstract
Radiogenomics is an area of research that aims to identify associations between imaging phenotypes ("radio-") and the tumor genome ("-genomics"). Breast cancer radiogenomics in particular has been an especially prolific area of investigation in recent years, as evidenced by the large number and variety of publications and conference presentations. To date, research has primarily focused on dynamic contrast-enhanced preoperative breast MRI and breast cancer molecular subtypes, but investigations have extended to all breast imaging modalities as well as multiple additional genetic markers, including those that are commercially available. Furthermore, both human- and computer-extracted features as well as deep learning techniques have been explored. This review will summarize the specific imaging modalities used in radiogenomics analysis, describe the methods of extracting imaging features, and present the types of genomic, molecular, and related information used for analysis. Finally, the limitations and future directions of breast cancer radiogenomics research will be discussed., (Copyright © 2019 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.)
- Published
- 2020
- Full Text
- View/download PDF
46. Deep learning analysis of breast MRIs for prediction of occult invasive disease in ductal carcinoma in situ.
- Author
-
Zhu Z, Harowicz M, Zhang J, Saha A, Grimm LJ, Hwang ES, and Mazurowski MA
- Subjects
- Adult, Aged, Biopsy, Fine-Needle, Breast Neoplasms pathology, Carcinoma, Ductal, Breast pathology, Female, Humans, Middle Aged, Predictive Value of Tests, Retrospective Studies, Breast Neoplasms diagnostic imaging, Carcinoma, Ductal, Breast diagnostic imaging, Deep Learning, Magnetic Resonance Imaging, Support Vector Machine
- Abstract
Purpose: To determine whether deep learning-based algorithms applied to breast MR images can aid in the prediction of occult invasive disease following the diagnosis of ductal carcinoma in situ (DCIS) by core needle biopsy., Materials and Methods: Our study is retrospective; the data were collected from 2000 to 2014. In this institutional review board-approved study, we analyzed dynamic contrast-enhanced fat-saturated T1-weighted MRI sequences from 131 patients with a core needle biopsy-confirmed diagnosis of DCIS. We explored two different deep learning approaches to predict whether there was an occult invasive component in the analyzed tumors that was ultimately identified at surgical excision. In the first approach, we adopted the transfer learning strategy, specifically using the pre-trained GoogleNet. In the second approach, we used a pre-trained network to extract deep features, and a support vector machine (SVM) that utilizes these features to predict the upstaging of DCIS. We used nested 10-fold cross-validation and the area under the ROC curve (AUC) to estimate the performance of the predictive models., Results: The best classification performance was obtained using the deep features approach, with the GoogleNet model pre-trained on ImageNet as the feature extractor and a polynomial kernel SVM as the classifier (AUC = 0.70, 95% CI: 0.58-0.79). For the transfer learning-based approach, the highest AUC obtained was 0.68 (95% CI: 0.57-0.77)., Conclusions: Convolutional neural networks might be used to identify occult invasive disease in patients diagnosed with DCIS by core needle biopsy., (Copyright © 2019 Elsevier Ltd. All rights reserved.)
- Published
- 2019
- Full Text
- View/download PDF
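The "deep features" approach described above (pre-trained CNN as a fixed feature extractor, polynomial-kernel SVM as the classifier) can be sketched with scikit-learn. The feature matrix here is synthetic noise standing in for GoogleNet activations, so the resulting AUC is meaningless (around 0.5); only the pipeline shape is illustrative, and the study itself used nested 10-fold cross-validation rather than the plain CV shown:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for CNN-extracted deep features: 131 cases (as in the
# study) by 1024 dimensions, with random binary upstaging labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(131, 1024))
y = rng.integers(0, 2, size=131)  # 1 = upstaged to invasive, 0 = pure DCIS

# Polynomial-kernel SVM on standardized deep features; AUC via 10-fold CV.
clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
aucs = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
print(round(aucs.mean(), 2))
```

Freezing the feature extractor and training only a shallow classifier is a common choice when, as here, the number of cases (131) is far too small to fine-tune a deep network reliably.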
47. Management of Thyroid Nodules Seen on US Images: Deep Learning May Match Performance of Radiologists.
- Author
-
Buda M, Wildman-Tobriner B, Hoang JK, Thayer D, Tessler FN, Middleton WD, and Mazurowski MA
- Subjects
- Female, Humans, Male, Middle Aged, Reproducibility of Results, Retrospective Studies, Sensitivity and Specificity, Thyroid Gland diagnostic imaging, Deep Learning, Image Interpretation, Computer-Assisted methods, Thyroid Nodule diagnostic imaging, Ultrasonography methods
- Abstract
Background: Management of thyroid nodules may be inconsistent between different observers and time consuming for radiologists. An artificial intelligence system that uses deep learning may improve radiology workflow for management of thyroid nodules. Purpose: To develop a deep learning algorithm that uses thyroid US images to decide whether a thyroid nodule should undergo a biopsy and to compare the performance of the algorithm with the performance of radiologists who adhere to American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TI-RADS). Materials and Methods: In this retrospective analysis, studies in patients referred for US with subsequent fine-needle aspiration or with surgical histologic analysis used as the standard were evaluated. The study period was from August 2006 to May 2010. A multitask deep convolutional neural network was trained to provide biopsy recommendations for thyroid nodules on the basis of two orthogonal US images as the input. In the training phase, the deep learning algorithm was first evaluated by using 10-fold cross-validation. Internal validation was then performed on an independent set of 99 consecutive nodules. The sensitivity and specificity of the algorithm were compared with a consensus of three ACR TI-RADS committee experts and nine other radiologists, all of whom interpreted thyroid US images in clinical practice. Results: Included were 1377 thyroid nodules in 1230 patients with complete imaging data and conclusive cytologic or histologic diagnoses. For the 99 test nodules, the proposed deep learning algorithm achieved 13 of 15 (87%; 95% confidence interval [CI]: 67%, 100%) sensitivity, the same as expert consensus (P > .99) and higher than five of nine radiologists. The specificity of the deep learning algorithm was 44 of 84 (52%; 95% CI: 42%, 62%), which was similar to expert consensus (43 of 84; 51%; 95% CI: 41%, 62%; P = .91) and higher than seven of nine other radiologists. The mean sensitivity and specificity for the nine radiologists were 83% (95% CI: 64%, 98%) and 48% (95% CI: 37%, 59%), respectively. Conclusion: Sensitivity and specificity of a deep learning algorithm for thyroid nodule biopsy recommendations were similar to those of expert radiologists who used American College of Radiology Thyroid Imaging Reporting and Data System guidelines. © RSNA, 2019 Online supplemental material is available for this article.
- Published
- 2019
- Full Text
- View/download PDF
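The sensitivity and specificity figures quoted above follow directly from the reported counts on the 99-nodule test set; a minimal sketch:

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Counts reported for the algorithm in the abstract:
# 13 of 15 malignant nodules correctly flagged for biopsy,
# 44 of 84 benign nodules correctly spared biopsy.
sens, spec = sens_spec(tp=13, fn=2, tn=44, fp=40)
print(round(sens, 2), round(spec, 2))  # → 0.87 0.52
```

The low specificity on both sides of the comparison reflects the clinical asymmetry: missing a cancer (a false negative) is far costlier than an unnecessary biopsy, so biopsy-recommendation thresholds are tuned for sensitivity.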
48. Artificial Intelligence May Cause a Significant Disruption to the Radiology Workforce.
- Author
-
Mazurowski MA
- Subjects
- Forecasting, Humans, Artificial Intelligence, Radiology trends, Workforce trends
- Abstract
The increasingly realistic prospect of artificial intelligence (AI) playing an important role in radiology has been welcomed with a mixture of enthusiasm and anxiousness. A consensus has arisen that AI will support radiologists in the interpretation of less challenging cases, which will give the radiologists more time to focus on the challenging tasks as well as interactions with patients and other clinicians. The possibility of AI replacing a large number of radiologists is generally dismissed by the radiology community. The common arguments include the following: (1) AI will never be able to match radiologists' performance; (2) radiologists do more than interpret images; (3) even if AI takes over a large portion of the reading tasks, the radiologists' effort will be shifted toward interactions with patients and other physicians; (4) the FDA would never agree to let machines do the work of radiologists; (5) the issues of legal liability would be insurmountable; and (6) patients would never put complete trust in computer algorithms. In this article, I analyze these arguments in detail. I find a certain level of validity to some of them. However, I conclude that none of the arguments provide sufficient support for the claim that AI will not create a significant disruption in the radiology workforce. Such disruption is a real possibility. Although the radiology specialty has shown an astonishing ability to adapt to the changing technology, the future is uncertain, and an honest, in-depth discussion is needed to guide development of the field., (Copyright © 2019. Published by Elsevier Inc.)
- Published
- 2019
- Full Text
- View/download PDF
49. Machine learning-based prediction of future breast cancer using algorithmically measured background parenchymal enhancement on high-risk screening MRI.
- Author
-
Saha A, Grimm LJ, Ghate SV, Kim CE, Soo MS, Yoon SC, and Mazurowski MA
- Subjects
- Adult, Aged, Algorithms, Breast diagnostic imaging, Case-Control Studies, Female, Humans, Middle Aged, Predictive Value of Tests, Breast Neoplasms diagnostic imaging, Image Interpretation, Computer-Assisted methods, Machine Learning, Magnetic Resonance Imaging methods
- Abstract
Background: Preliminary work has demonstrated that background parenchymal enhancement (BPE) assessed by radiologists is predictive of future breast cancer in women undergoing high-risk screening MRI. Algorithmically assessed measures of BPE offer a more precise and reproducible means of measuring BPE than human readers and thus might improve prediction of future cancer development., Purpose: To determine if algorithmically extracted imaging features of BPE on screening breast MRI in high-risk women are associated with subsequent development of cancer., Study Type: Case-control study., Population: In all, 133 women at high risk for developing breast cancer; 46 of these patients developed breast cancer subsequently over a follow-up period of 2 years., Field Strength/Sequence: 1.5 T or 3.0 T; T1-weighted precontrast fat-saturated and nonfat-saturated sequences and postcontrast nonfat-saturated sequences., Assessment: Automatic features of BPE were extracted with a computer algorithm. Subjective BPE scores from five breast radiologists (blinded to clinical outcomes) were also available., Statistical Tests: Leave-one-out cross-validation for a multivariate logistic regression model developed using the automatic features and receiver operating characteristic (ROC) analysis were performed to calculate the area under the curve (AUC). Comparison of automatic features and subjective features was performed using a generalized regression model, and the P-value was obtained. Odds ratios for automatic and subjective features were compared., Results: The multivariate model discriminated patients who developed cancer from the patients who did not, with an AUC of 0.70 (95% confidence interval: 0.60-0.79, P < 0.001). The imaging features remained independently predictive of subsequent development of cancer (P < 0.003) when compared with the subjective BPE assessment of the readers., Data Conclusion: Automatically extracted BPE measurements may potentially be used to further stratify risk in patients undergoing high-risk screening MRI., Level of Evidence: 3. Technical Efficacy: Stage 5. J. Magn. Reson. Imaging 2019;50:456-464., (© 2019 International Society for Magnetic Resonance in Medicine.)
- Published
- 2019
- Full Text
- View/download PDF
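The evaluation above (leave-one-out cross-validation of a multivariate logistic regression, scored by ROC AUC) can be sketched with scikit-learn. The feature matrix below is synthetic noise standing in for the algorithmic BPE features, with the study's case counts (133 women, 46 subsequent cancers), so the resulting AUC carries no meaning; only the validation pattern is illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut

# Synthetic stand-in for algorithmic BPE features: 133 cases, 5 features,
# 46 positives (later cancers) as in the study population.
rng = np.random.default_rng(0)
X = rng.normal(size=(133, 5))
y = np.array([1] * 46 + [0] * 87)

# Leave-one-out CV: each case is held out once, a model is fit on the rest,
# and the held-out case receives one predicted probability. A single ROC
# curve is then built from all 133 held-out predictions.
scores = np.empty(len(y))
for train, test in LeaveOneOut().split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    scores[test] = model.predict_proba(X[test])[:, 1]
auc = roc_auc_score(y, scores)
```

Leave-one-out is a natural choice at this sample size: it uses nearly all 133 cases for every fit while still keeping each evaluated prediction out-of-sample.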
50. Using Artificial Intelligence to Revise ACR TI-RADS Risk Stratification of Thyroid Nodules: Diagnostic Accuracy and Utility.
- Author
-
Wildman-Tobriner B, Buda M, Hoang JK, Middleton WD, Thayer D, Short RG, Tessler FN, and Mazurowski MA
- Subjects
- Adolescent, Adult, Aged, Aged, 80 and over, Female, Humans, Male, Middle Aged, Reproducibility of Results, Retrospective Studies, Risk Assessment, Sensitivity and Specificity, Societies, Medical, Thyroid Gland diagnostic imaging, United States, Young Adult, Artificial Intelligence, Diagnostic Imaging methods, Image Interpretation, Computer-Assisted methods, Radiology Information Systems, Thyroid Nodule diagnostic imaging
- Abstract
Background Risk stratification systems for thyroid nodules are often complicated and affected by low specificity. Continual improvement of these systems is necessary to reduce the number of unnecessary thyroid biopsies. Purpose To use artificial intelligence (AI) to optimize the American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TI-RADS). Materials and Methods A total of 1425 biopsy-proven thyroid nodules from 1264 consecutive patients (1026 women; mean age, 52.9 years [range, 18-93 years]) were evaluated retrospectively. Expert readers assigned points based on five ACR TI-RADS categories (composition, echogenicity, shape, margin, echogenic foci), and a genetic AI algorithm was applied to a training set (1325 nodules). Point and pathologic data were used to create an optimized scoring system (hereafter, AI TI-RADS). Performance of the systems was compared by using a test set of the final 100 nodules with interpretations from the expert reader, eight nonexpert readers, and an expert panel. Initial performance of AI TI-RADS was calculated by using a test for differences between binomial proportions. Additional comparisons across readers were conducted by using bootstrapping; diagnostic performance was assessed by using area under the receiver operating curve. Results AI TI-RADS assigned new point values for eight ACR TI-RADS features. Six features were assigned zero points, which simplified categorization. By using expert reader data, the diagnostic performance of ACR TI-RADS and AI TI-RADS was area under the receiver operating curve of 0.91 and 0.93, respectively. For the same expert, specificity of AI TI-RADS (65%, 55 of 85) was higher (P < .001) than that of ACR TI-RADS (47%, 40 of 85). For the eight nonexpert radiologists, mean specificity for AI TI-RADS (55%) was also higher (P < .001) than that of ACR TI-RADS (48%). An interactive AI TI-RADS calculator can be viewed at http://deckard.duhs.duke.edu/~ai-ti-rads. Conclusion An artificial intelligence-optimized Thyroid Imaging Reporting and Data System (TI-RADS) validates the American College of Radiology TI-RADS while slightly improving specificity and maintaining sensitivity. Additionally, it simplifies feature assignments, which may improve ease of use. © RSNA, 2019 Online supplemental material is available for this article.
- Published
- 2019
- Full Text
- View/download PDF