Author: "Gur D" / Journal: academic radiology - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Gur D"' showing total 67 results


1. Computer-aided detection: the effect of training databases on detection of subtle breast masses.

Author: Zheng B, Wang X, Lederman D, Tan J, and Gur D
Abstract: Rationale and Objectives: Lesion conspicuity is typically highly correlated with visual difficulty for lesion detection, and computer-aided detection (CAD) has been widely used as a "second reader" in mammography. Hence, increasing CAD sensitivity in detecting subtle cancers without increasing false-positive rates is important. The aim of this study was to investigate the effect of training database case selection on CAD performance in detecting low-conspicuity breast masses. Materials and Methods: A full-field digital mammographic image database that included 525 cases depicting malignant masses was randomly partitioned into three subsets. A CAD scheme was applied to detect all initially suspected mass regions and compute region conspicuity. Training samples were iteratively selected from two of the subsets. Four types of training data sets were assembled: (1) one including all available true-positive mass regions in the two subsets ("all"), (2) one including 350 randomly selected mass regions ("diverse"), (3) one including 350 high-conspicuity mass regions ("easy"), and (4) one including 350 low-conspicuity mass regions ("difficult"). In each training data set, the same number of randomly selected false-positive regions as true-positive regions was also included. Two classifiers, an artificial neural network (ANN) and a k-nearest neighbor (KNN) algorithm, were trained using each of the four training data sets and tested on all suspected regions in the remaining data set. Using a threefold cross-validation method, the performance changes of the CAD schemes trained using one of the four training data sets were computed and compared. Results: CAD initially detected 1025 true-positive mass regions depicted on 507 cases (97% case-based sensitivity) and 9569 false-positive regions (3.5 per image) in the entire database. Using the all training data set, CAD achieved the highest overall performance on the entire testing database. However, CAD detected the highest number of low-conspicuity masses when the difficult training data set was used for training. Results were consistent for both ANN-based and KNN-based classifiers in all tests. Compared to the use of the all training data set, the sensitivity of the schemes trained using the difficult data set decreased by 8.6% and 8.4% for the ANN and KNN algorithm on the entire database, respectively, but the detection of low-conspicuity masses increased by 7.1% and 15.1% for the ANN and KNN algorithm at a false-positive rate of 0.3 per image. Conclusions: CAD performance depends on the size, diversity, and difficulty level of the training database. To increase CAD sensitivity in detecting subtle cancer, one should increase the fraction of difficult cases in the training database rather than simply increasing the training data set size. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF

2. Matching breast masses depicted on different views: a comparison of three methods.

Author: Zheng B, Tan J, Ganott MA, Chough DM, and Gur D
Abstract: Rationale and Objectives: Computerized determination of optimal search areas on mammograms for matching breast mass regions depicted on two ipsilateral views remains a challenge for developing multiview-based computer-aided detection (CAD) schemes. The purpose of this study was to compare three methods aimed at matching CAD-cued mass regions depicted on two views and the associated impact on CAD performance. Materials and Methods: The three search methods used (1) an annular (fan-shaped) band, (2) a straight strip perpendicular to the estimated centerline, and (3) a mixed search area bound on the chest wall side by a straight line and an annular arc on the nipple side, respectively. An image database of 200 examinations with positive results depicting the masses on two views and 200 examinations with negative results was used for testing. Two performance assessment experiments were conducted. The first investigated the maximum matching sensitivity as a function of the search area size, and the second assessed the change in CAD performance using these three search methods. Results: To include all 200 paired mass regions within the search areas, maximum widths were 28 and 68 mm for the use of the straight strip and the annular band search methods, respectively. When applying a single-image-based CAD scheme to this image database, 172 masses (86% sensitivity) and 523 false-positive (FP) regions (0.33 per image) were detected and cued. Among the positive findings, 92 were cued by the CAD system on both views, and 80 were cued on only one view. In an attempt to match as many of the 172 CAD-cued masses (true-positive [TP] regions) on two views by incrementally reducing the CAD threshold inside the different search areas, the CAD scheme generated 158 TP-TP paired matches with 14 TP-FP paired matches, 142 TP-TP paired matches with 30 TP-FP paired matches, and 146 TP-TP paired matches with 26 TP-FP paired matches, using the methods involving the straight strip, the annular band, and the mixed search areas, respectively. Using the straight strip search method, the CAD also eliminated 25% of FP regions initially cued by the single-image-based CAD scheme and generated the lowest case-based FP detection rate, namely, 15% less than that generated by the annular band method. Conclusions: This study showed that among these three search methods, the straight strip method required a smaller search area and achieved the highest level of CAD performance. [ABSTRACT FROM AUTHOR]
Published: 2009
Full Text: View/download PDF

3. Retrospective analyses of pivotal prospective studies with population segmentation: statistically based inferences and clinical relevance.

Author: Gur D
Abstract: Retrospective analyses of pivotal prospective studies are important for verifying the inferences made as a result of the original studies and for generating new hypotheses. However, careful attention should be given to the comprehensiveness and completeness of a retrospective analysis and how it is ultimately used. A recent retrospective analysis of the Digital Mammographic Imaging Screening Trial (DMIST) underscores several important points related to inference generation and generalization of the results on the basis of summary performance indexes, as well as the importance of incorporating a clinically relevant perspective when generating inferences primarily on the basis of statistical test results. This article highlights three important points related to (1) the use of performance indexes (namely, area under the receiver-operating characteristic curve), (2) applied statistical methods (namely, Bonferroni corrections for multiple comparison), and (3) practical conclusions (namely, consideration of all possible inferences that could be generated from the data), as well as possible implications and limitations of these retrospective analyses. The discussion in this paper is based on one specific retrospective analysis of a prospective study, but the topics addressed are quite basic, general, and potentially applicable to a number of retrospective analyses of data that are experimentally ascertained during pivotal prospective studies, as well as during observer performance studies. [ABSTRACT FROM AUTHOR]
Published: 2008
Full Text: View/download PDF

4. Performance assessments of diagnostic systems under the FROC paradigm: experimental, analytical, and results interpretation issues.

Author: Gur D and Rockette HE
Published: 2009
Full Text: View/download PDF

5. Contrast Enhanced Digital Mammography (CEDM) Helps to Safely Reduce Benign Breast Biopsies for Low to Moderately Suspicious Soft Tissue Lesions.

Author: Zuley ML, Bandos AI, Abrams GS, Ganott MA, Gizienski TA, Hakim CM, Kelly AE, Nair BE, Sumkin JH, Waheed U, and Gur D
Subjects: Biopsy, Breast diagnostic imaging, Female, Humans, Mammography, Middle Aged, North Carolina, Retrospective Studies, Breast Neoplasms diagnostic imaging
Abstract: Rationale and Objectives: To preliminarily assess if Contrast Enhanced Digital Mammography (CEDM) can accurately reduce biopsy rates for soft tissue BI-RADS 4A or 4B lesions. Materials and Methods: Eight radiologists retrospectively and independently reviewed 60 lesions in 54 consenting patients who underwent CEDM under Health Insurance Portability and Accountability Act compliant institutional review board-approved protocols. Readers provided Breast Imaging Reporting & Data System ratings sequentially for digital mammography/digital breast tomosynthesis (DM/DBT), then with ultrasound, then with CEDM for each lesion. Area under the curve (AUC), true positive rates and false positive rates, positive predictive values and negative predictive values were calculated. Statistical analysis accounting for correlation between lesion-examinations and between-reader variability was performed using OR/DBM (for SAS v.3.0), generalized linear mixed model for binary data (proc glimmix, SAS v.9.4, SAS Institute, Cary, North Carolina), and bootstrap. Results: The cohort included 49 benign, two high-risk and nine cancerous lesions in 54 women aged 34-74 (average 50) years. Reader-averaged AUC for CEDM was significantly higher than DM/DBT alone (0.85 versus 0.66, p < 0.001) or with US (0.85 versus 0.75, p = 0.001). CEDM increased true positive rates from 0.74 under DM/DBT, and 0.89 with US, to 0.90 with CEDM (p = 0.019 DM/DBT versus CEDM, p = 0.78 DM/DBT + US versus CEDM) and decreased false positive rates from 0.47 using DM/DBT and 0.61 with US to 0.39 with CEDM (p = 0.017 DM/DBT versus CEDM, p = 0.001 DM/DBT + US versus CEDM). For an expected cancer rate of 10%, the CEDM positive predictive value was 20.5% (95% CI: 16%-27%) and the negative predictive value was 98.3% (95% CI: 96%-100%). Conclusion: Addition of CEDM for evaluation of low-moderate suspicion soft tissue breast lesions can substantially reduce biopsy of benign lesions without compromising cancer detection. (Copyright © 2019 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.)
Published: 2020
Full Text: View/download PDF
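
Note on entry 5: the predictive values quoted in the abstract depend on the assumed 10% cancer rate. The following is a minimal sketch, assuming a plain application of Bayes' rule to the reader-averaged CEDM rates from the abstract (true positive rate 0.90, false positive rate 0.39). The function name is hypothetical, and this simplification is not the authors' mixed-model and bootstrap analysis, so it only approximates the reported 20.5% and 98.3%.

```python
def predictive_values(true_positive_rate, false_positive_rate, prevalence):
    """Return (PPV, NPV) from test rates and an assumed disease prevalence.

    Illustrative helper, not taken from the cited study.
    """
    # Joint probabilities of the four outcome cells for one lesion.
    tp = true_positive_rate * prevalence
    fn = (1.0 - true_positive_rate) * prevalence
    fp = false_positive_rate * (1.0 - prevalence)
    tn = (1.0 - false_positive_rate) * (1.0 - prevalence)
    ppv = tp / (tp + fp)  # P(cancer | positive reading)
    npv = tn / (tn + fn)  # P(no cancer | negative reading)
    return ppv, npv


# Reader-averaged CEDM rates from the abstract, 10% expected cancer rate.
ppv, npv = predictive_values(0.90, 0.39, 0.10)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # roughly 0.204 and 0.982
```

Changing the `prevalence` argument shows how strongly the positive predictive value depends on the assumed cancer rate, while the negative predictive value stays high across the low-prevalence range.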

6. Estimating the Area Under ROC Curve When the Fitted Binormal Curves Demonstrate Improper Shape.

Author: Bandos AI, Guo B, and Gur D
Subjects: Bias, Humans, Likelihood Functions, Models, Statistical, Reproducibility of Results, Sensitivity and Specificity, Area Under Curve, ROC Curve, Radiology statistics & numerical data
Abstract: Rationale and Objectives: The "binormal" model is the most frequently used tool for parametric receiver operating characteristic (ROC) analysis. The binormal ROC curves can have "improper" (non-concave) shapes that are unrealistic in many practical applications, and several tools (eg, PROPROC) have been developed to address this problem. However, due to the general robustness of binormal ROCs, the improperness of the fitted curves might carry little consequence for inferences about global summary indices, such as the area under the ROC curve (AUC). In this work, we investigate the effect of severe improperness of fitted binormal ROC curves on the reliability of AUC estimates when the data arise from an actually proper curve. Materials and Methods: We designed theoretically proper ROC scenarios that induce severely improper shape of fitted binormal curves in the presence of well-distributed empirical ROC points. The binormal curves were fitted using maximum likelihood approach. Using simulations, we estimated the frequency of severely improper fitted curves, bias of the estimated AUC, and coverage of 95% confidence intervals (CIs). In Appendix S1, we provide additional information on percentiles of the distribution of AUC estimates and bias when estimating partial AUCs. We also compared the results to a reference standard provided by empirical estimates obtained from continuous data. Results: We observed up to 96% of severely improper curves depending on the scenario in question. The bias in the binormal AUC estimates was very small and the coverage of the CIs was close to nominal, whereas the estimates of partial AUC were biased upward in the high specificity range and downward in the low specificity range. Compared to a non-parametric approach, the binormal model led to slightly more variable AUC estimates, but at the same time to CIs with more appropriate coverage. Conclusions: The improper shape of the fitted binormal curve, by itself, ie, in the presence of a sufficient number of well-distributed points, does not imply unreliable AUC-based inferences. (Copyright © 2017 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.)
Published: 2017
Full Text: View/download PDF
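
Note on entry 6: the abstract contrasts binormal AUC estimates with empirical (non-parametric) ones. The sketch below illustrates both quantities under stated assumptions: the binormal AUC uses the standard closed form Phi(a / sqrt(1 + b^2)) for intercept `a` and slope `b`, and the empirical AUC is the Mann-Whitney proportion of correctly ordered case pairs. Function names and example scores are hypothetical, and the maximum likelihood fitting used in the study is not reproduced here.

```python
import math


def binormal_auc(a, b):
    """AUC of a binormal ROC curve with intercept a and slope b (b > 0)."""
    z = a / math.sqrt(1.0 + b * b)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def empirical_auc(negative_scores, positive_scores):
    """Mann-Whitney estimate: fraction of (positive, negative) pairs in which
    the positive case scores higher, counting ties as one half."""
    total = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(positive_scores) * len(negative_scores))


# Hypothetical ratings, for illustration only.
print(empirical_auc([1, 2, 3], [2, 3, 4]))
print(binormal_auc(1.5, 0.6))
```

A binormal curve with slope `b` different from 1 is not concave over the whole range, which is the "improper" shape the abstract refers to; the AUC formula itself remains well defined in that case.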

7. Recall Rate Reduction with Tomosynthesis During Baseline Screening Examinations: An Assessment From a Prospective Trial.

Author: Sumkin JH, Ganott MA, Chough DM, Catullo VJ, Zuley ML, Shinde DD, Hakim CM, Bandos AI, and Gur D
Subjects: Adult, Carcinoma, Ductal, Breast diagnostic imaging, Carcinoma, Intraductal, Noninfiltrating diagnostic imaging, Early Detection of Cancer, Female, Humans, Middle Aged, Pennsylvania, Prospective Studies, Tomography, X-Ray Computed methods, Appointments and Schedules, Breast Neoplasms diagnostic imaging, Mammography methods, Radiographic Image Interpretation, Computer-Assisted
Abstract: Rationale and Objectives: Assess results of a prospective, single-site clinical study evaluating digital breast tomosynthesis (DBT) during baseline screening mammography. Materials and Methods: Under an institutional review board-approved Health Insurance Portability and Accountability Act (HIPAA)-compliant protocol, consenting women between ages 34 and 56 years scheduled for their initial and/or baseline screening mammogram underwent both full field digital mammography (FFDM) and DBT. The FFDM and the FFDM plus DBT images were interpreted independently in a reader-by-mode balanced approach by two of 14 participating radiologists. A woman was recalled for a diagnostic work-up if either radiologist recommended a recall. We report overall recall rates and related diagnostic outcome from the 1080 participants. Proportions of recommended recalls (Breast Imaging Reporting and Data System 0) were compared using a generalized linear mixed model (SAS 9.3) with a significance level of P = .0294. Results: The fraction of women without breast cancer recommended for recall using FFDM alone and FFDM plus DBT were 412 of 1074 (38.4%) and 274 of 1074 (25.5%), respectively (P < .001). Large inter-reader variability in terms of recall reduction was observed among the 14 readers; however, 11 of 14 readers recalled fewer women using FFDM plus DBT (5 with P < .015). Six cancers (four ductal carcinomas in situ [DCIS] and two invasive ductal carcinomas [IDC]) were detected. One IDC was detected only on DBT and one DCIS cancer was detected only on FFDM, whereas the remaining cancers were detected on both modalities. Conclusions: The use of FFDM plus DBT resulted in a significant decrease in recall rates during baseline screening mammography with no reduction in sensitivity. (Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2015
Full Text: View/download PDF

8. Impact of the new density reporting laws: radiologist perceptions and actual behavior.

Author: Gur D, Klym AH, King JL, Bandos AI, and Sumkin JH
Subjects: Aged, Breast Density, Female, Humans, Middle Aged, Pennsylvania, Practice Patterns, Physicians' legislation & jurisprudence, Radiographic Image Interpretation, Computer-Assisted, Attitude of Health Personnel, Breast Neoplasms diagnostic imaging, Mammary Glands, Human abnormalities, Mammography statistics & numerical data, Practice Patterns, Physicians' statistics & numerical data, Radiology legislation & jurisprudence
Abstract: Rationale and Objectives: To assess radiologists' perceptions of how the new Breast Density Notification Act (BDNA) of Pennsylvania would affect their breast density reporting and their actual reporting patterns after implementation. Materials and Methods: Under an institutional review board-approved protocol, we surveyed 21 radiologists about how they believed the new law affected their breast density reporting patterns and analyzed actual changes for 16 respondents before and after the law took effect. Three hundred consecutive reports were assessed for each radiologist before and after the effective date. The distributions of reported density Breast Imaging Reporting and Data System (BI-RADS) (1-4) were compared using a type III test in the context of an ordinal mixed model accounting for between-reader variability and adjusting for age (PROC GLIMMIX, SAS, version 9.3) using a two-sided .05 significance level. Results: Seventeen radiologists responded to the survey; however, one retired shortly after responding. Of the 16 respondents, 56% (nine of 16) did not favor the law, 13% (two of 16) were in favor, and 31% (five of 16) were neutral. The fractions of respondents who perceived that, after implementation, they rated breasts as scattered fibroglandular densities (BI-RADS 2) rather than heterogeneously dense (BI-RADS 3) more frequently, equally frequently, or less frequently were 50% (eight of 16), 44% (seven of 16), and 6% (one of 16), respectively. In practice, 44% (seven of 16) performed differently than their survey answers. Fourteen of 16 radiologists increased the frequency of reported BI-RADS 2 scores after BDNA implementation with seven having statistically significant (P < .05) increases after adjusting for age differences. Conclusions: Radiologists' reporting patterns changed, at least for a short duration, after the new density reporting law and for some of the radiologists in an unexpected way. (Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2015
Full Text: View/download PDF

9. CADe for early detection of breast cancer: current status and why we need to continue to explore new approaches.

Author: Nishikawa RM and Gur D
Subjects: Artificial Intelligence, Female, Humans, Image Enhancement methods, Reproducibility of Results, Sensitivity and Specificity, Algorithms, Breast Neoplasms diagnosis, Early Detection of Cancer trends, Image Interpretation, Computer-Assisted methods, Mammography trends, Mass Screening trends, Pattern Recognition, Automated methods
Abstract: The authors describe where we are in terms of using computer-aided detection (CADe) systems during clinical mammographic interpretations, what are the issues that we face, and why they believe that, despite disappointment in terms of verified added value when it comes to detection of soft tissue abnormalities, we need to continue to explore new approaches to improving CADe-alone performance levels and, more important perhaps, new approaches to optimal communication of CADe-generated information. (Copyright © 2014 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2014
Full Text: View/download PDF

10. Impact of and interaction between the availability of prior examinations and DBT on the interpretation of negative and benign mammograms.

Author: Hakim CM, Anello MI, Cohen CS, Ganott MA, Lu AH, Perrin RL, Shah R, Lee Spangler M, Bandos AI, and Gur D
Subjects: Adult, Aged, Combined Modality Therapy methods, False Negative Reactions, Female, Humans, Middle Aged, Observer Variation, Reproducibility of Results, Sensitivity and Specificity, Breast Neoplasms diagnosis, Diagnostic Errors prevention & control, Mammography methods, Radiographic Image Enhancement methods, Tomography, X-Ray Computed methods
Abstract: Rationale and Objectives: To assess the interaction between the availability of prior examinations and digital breast tomosynthesis (DBT) in decisions to recall a woman during interpretation of mammograms. Materials and Methods: Eight radiologists independently interpreted twice 36 mammography examinations, each of which had current and prior full-field digital mammography images (FFDM) and DBT under a Health Insurance Portability and Accountability Act-compliant, institutional review board-approved protocol (written consent waived). During the first reading, three sequential ratings were provided using FFDM only, followed by FFDM + DBT, and then followed by FFDM + DBT + priors. The second reading included FFDM only, then FFDM + priors, and then FFDM + priors + DBT. Twenty-two benign cases clinically recalled, 12 negative/benign examinations (not recalled), and two verified cancer cases were included. Recall recommendations and interaction between the effect of priors and DBT on decisions were assessed (P = .05 significance level) using generalized linear model (PROC GLIMMIX, SAS, version 9.3; SAS Institute, Cary, NC) accounting for case and reader variability. Results: Average recall rates in noncancer cases were significantly reduced (51%; P < .001) with the addition of DBT and with addition of priors (23%; P = .01). In absolute terms, the addition of DBT to FFDM reduced the recall rates from 0.67 to 0.42 and from 0.54 to 0.27 when DBT was available before and after priors, respectively. Recall reductions were from 0.64 to 0.54 and from 0.42 to 0.33 when priors were available before and after DBT, respectively. Regardless of the sequence in presentation, there were no statistically significant interactions between the effect of availability of DBT and priors (P = .80). Conclusions: Availability of both priors and DBT are independent primary factors in reducing recall recommendations during mammographic interpretations. (Copyright © 2014 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2014
Full Text: View/download PDF

11. Breast cancer screening in a multimodality environment--the need for a simple summary measure of marginal value.

Author: Gur D
Subjects: Female, Humans, Prognosis, Reproducibility of Results, Sensitivity and Specificity, Survival Rate, Breast Neoplasms diagnosis, Breast Neoplasms mortality, Early Detection of Cancer methods, Mammography methods, Multimodal Imaging methods, Outcome Assessment, Health Care methods
Abstract: In a rapidly changing clinical environment, assessment of imaging-based technologies and practices for periodic screening for the early detection of breast cancer is constrained by cost, complexity, and professional resources, particularly concerning supplementary imaging of subgroups constituting a large fraction of the screened population. Relatively high survival rates after detection make it extremely difficult to adequately assess marginal values of proposed approaches either before the technology in question is widely accepted and used or before it becomes largely obsolete. The author discusses several issues related to the assessment process and proposes the use of a surrogate summary measure of performance for this purpose, namely the number of recalled cases for the diagnostic workup of suspicious findings during repeat examinations, per one additional screen detected cancer that is invasive, node-negative, and classified grade 2 or above. (Copyright © 2014 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2014
Full Text: View/download PDF

12. Prediction of near-term breast cancer risk based on bilateral mammographic feature asymmetry.

Author: Tan M, Zheng B, Ramalingam P, and Gur D
Subjects: Feasibility Studies, Female, Follow-Up Studies, Humans, Mammography statistics & numerical data, Odds Ratio, ROC Curve, Retrospective Studies, Risk Assessment, Support Vector Machine, Breast Neoplasms diagnostic imaging, Mammography methods, Radiographic Image Interpretation, Computer-Assisted methods
Abstract: Rationale and Objectives: The objective of this study is to investigate the feasibility of predicting near-term risk of breast cancer development in women after a negative mammography screening examination. It is based on a statistical learning model that combines computerized image features related to bilateral mammographic tissue asymmetry and other clinical factors. Materials and Methods: A database of negative digital mammograms acquired from 994 women was retrospectively collected. In the next sequential screening examination (12 to 36 months later), 283 women were diagnosed positive for cancer, 349 were recalled for additional diagnostic workups and later proved to be benign, and 362 remained negative (not recalled). From an initial pool of 183 features, we applied a Sequential Forward Floating Selection feature selection method to search for effective features. Using 10 selected features, we developed and trained a support vector machine classification model to compute a cancer risk or probability score for each case. The area under the receiver operating characteristic curve and odds ratios (ORs) were used as the two performance assessment indices. Results: The area under the receiver operating characteristic curve = 0.725 ± 0.018 was obtained for positive and negative/benign case classification. The ORs showed an increasing risk trend with increasing model-generated risk scores (from 1.00 to 12.34, between positive and negative/benign case groups). Regression analysis of ORs also indicated a significant increase trend in slope (P = .006). Conclusions: This study demonstrates that the risk scores computed by a new support vector machine model involving bilateral mammographic feature asymmetry have potential to assist the prediction of near-term risk of women for developing breast cancer. (Copyright © 2013 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2013
Full Text: View/download PDF

13. Three-dimensional airway tree architecture and pulmonary function.

Author: Pu J, Leader JK, Meng X, Whiting B, Wilson D, Sciurba FC, Reilly JJ, Bigbee WL, Siegfried J, and Gur D
Subjects: Female, Humans, Male, Middle Aged, Pennsylvania epidemiology, Prevalence, Radiographic Image Enhancement methods, Reproducibility of Results, Risk Assessment, Sensitivity and Specificity, Statistics as Topic, Imaging, Three-Dimensional methods, Lung diagnostic imaging, Pulmonary Disease, Chronic Obstructive diagnosis, Pulmonary Disease, Chronic Obstructive epidemiology, Radiographic Image Interpretation, Computer-Assisted methods, Respiratory Function Tests statistics & numerical data, Tomography, X-Ray Computed statistics & numerical data
Abstract: Rationale and Objectives: The airway tree is a primary conductive structure, and airways' morphologic characteristics, or variations thereof, may have an impact on airflow, thereby affecting pulmonary function. The objective of this study was to investigate the correlation between airway tree architecture, as depicted on computed tomography, and pulmonary function. Materials and Methods: A total of 548 chest computed tomographic examinations acquired on different patients at full inspiration were included in this study. The patients were enrolled in a study of chronic obstructive pulmonary disease (Specialized Center for Clinically Oriented Research) and underwent pulmonary function testing in addition to computed tomographic examinations. A fully automated airway tree segmentation algorithm was used to extract the three-dimensional airway tree from each examination. Using a skeletonization algorithm, airway tree volume-normalized architectural measures, including total airway length, branch count, and trachea length, were computed. Correlations of airway tree measurements with pulmonary function testing parameters and with chronic obstructive pulmonary disease severity in terms of the Global Initiative for Obstructive Lung Disease classification were computed using Spearman's rank correlations. Results: Non-normalized total airway volume and trachea length were associated (P < .01) with lung capacity measures (ie, functional residual capacity, total lung capacity, inspiratory capacity, vital capacity, residual volume, and forced expiratory vital capacity). Spearman's correlation coefficients ranged from 0.27 to 0.55 (P < .01). With the exception of trachea length, all normalized architecture-based measures (ie, total airway volume, total airway length, and total branch count) had statistically significant associations with the lung function measures (forced expiratory volume in 1 second and the ratio of forced expiratory volume in 1 second to forced expiratory vital capacity), and adjusted volume was associated with all three respiratory impedance measures (lung reactance at 5 Hz, lung resistance at 5 Hz, and lung resistance at 20 Hz), and adjusted branch count was associated with all respiratory impedance measures but lung resistance at 20 Hz. When normalized for lung volume, all airway architectural measures were statistically significantly associated with chronic obstructive pulmonary disease severity, with Spearman's correlation coefficients ranging from -0.338 to -0.546 (P < .01). Conclusions: Despite the large variability in anatomic characteristics of the airway tree across subjects, architecture-based measures demonstrated statistically significant associations (P < .01) with nearly all pulmonary function testing measures, as well as with disease severity. (Copyright © 2012 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2012
Full Text: View/download PDF
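
Note on entry 13: the associations in the abstract are reported as Spearman's rank correlations. As a minimal sketch of that statistic, assuming tied values receive their average rank, the coefficient can be computed as the Pearson correlation of the two rank vectors. All function names and the sample values are hypothetical and are not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measurements for five subjects, for illustration only.
airway_measure = [1, 2, 3, 4, 5]
function_measure = [2, 1, 4, 3, 5]
print(spearman_rho(airway_measure, function_measure))
```

Because only ranks enter the calculation, the coefficient is unchanged by any monotonic rescaling of either measure, which suits quantities such as airway length and lung capacity that are on very different scales.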

14. Evaluating imaging and computer-aided detection and diagnosis devices at the FDA.

Author: Gallas BD, Chan HP, D'Orsi CJ, Dodd LE, Giger ML, Gur D, Krupinski EA, Metz CE, Myers KJ, Obuchowski NA, Sahiner B, Toledano AY, and Zuley ML
Subjects: United States, United States Food and Drug Administration, Device Approval, Image Interpretation, Computer-Assisted instrumentation, Image Interpretation, Computer-Assisted standards, Technology Assessment, Biomedical standards, Technology Assessment, Biomedical trends
Abstract: This report summarizes the Joint FDA-MIPS Workshop on Methods for the Evaluation of Imaging and Computer-Assist Devices. The purpose of the workshop was to gather information on the current state of the science and facilitate consensus development on statistical methods and study designs for the evaluation of imaging devices to support US Food and Drug Administration submissions. Additionally, participants expected to identify gaps in knowledge and unmet needs that should be addressed in future research. This summary is intended to document the topics that were discussed at the meeting and disseminate the lessons that have been learned through past studies of imaging and computer-aided detection and diagnosis device performance. (Published by Elsevier Inc.)
Published: 2012
Full Text: View/download PDF

15. Dose reduction in digital breast tomosynthesis (DBT) screening using synthetically reconstructed projection images: an observer performance study.

Author: Gur D, Zuley ML, Anello MI, Rathfon GY, Chough DM, Ganott MA, Hakim CM, Wallace L, Lu A, and Bandos AI
Subjects: Adult, Aged, Female, Humans, Imaging, Three-Dimensional, Linear Models, Mammography, Middle Aged, Radiographic Image Enhancement methods, Retrospective Studies, Sensitivity and Specificity, Breast Neoplasms diagnostic imaging, Radiation Dosage, Radiographic Image Interpretation, Computer-Assisted methods, Tomography, X-Ray Computed methods
Abstract: Rationale and Objectives: The aim of this study was to retrospectively compare the interpretive performance of synthetically reconstructed two-dimensional images in combination with digital breast tomosynthesis (DBT) versus full-field digital mammography (FFDM) plus DBT. Materials and Methods: Ten radiologists trained in reading tomosynthesis examinations interpreted retrospectively, under two modes, 114 mammograms. One mode included the directly acquired full-field digital mammograms combined with DBT, and the other included synthetically reconstructed projection images combined with DBT. The reconstructed images do not require additional radiation exposure. The two modes were compared with respect to sensitivity, namely, recommendation to recall a breast with either a pathology-proven cancer (n = 48) or a high-risk lesion (n = 6), and specificity, namely, no recommendation to recall a breast not depicting an abnormality (n = 144) or depicting only benign abnormalities (n = 30). Results: The average sensitivity for FFDM with DBT was 0.826, compared to 0.772 for synthetic FFDM with DBT (difference, 0.054; P = .017 and P = .053 for fixed and random reader effects, respectively). The proportions of breasts with no or benign abnormalities recommended to be recalled were virtually the same: 0.298 and 0.297 for the two modalities, respectively (95% confidence intervals for the difference, -0.028 to 0.036 and -0.070 to 0.066 for fixed and random reader effects, respectively). Sixteen additional clusters of microcalcifications ("positive" breasts) were missed by all readers combined when interpreting the mode with synthesized images versus FFDM. Conclusions: Lower sensitivity with comparable specificity was observed with the tested version of synthetically generated images compared to FFDM, both combined with DBT. Improved synthesized images with experimentally verified acceptable diagnostic quality will be needed to eliminate double exposure during DBT-based screening. (Copyright © 2012 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2012
Full Text: View/download PDF

16. Tomosynthesis-based imaging of the breast.

Author: Gur D
Subjects: Female, Humans, Breast Neoplasms diagnostic imaging, Radiographic Image Enhancement methods
Published: 2011
Full Text: View/download PDF

17. A preliminary evaluation of multi-probe resonance-frequency electrical impedance based measurements of the breast.

Author: Zheng B, Lederman D, Sumkin JH, Zuley ML, Gruss MZ, Lovy LS, and Gur D
Subjects: Adult, Biopsy, Needle, Dielectric Spectroscopy instrumentation, Female, Humans, Mammography, Middle Aged, Breast pathology, Breast Neoplasms diagnosis, Dielectric Spectroscopy methods
Abstract: Rationale and Objectives: The aim of this study was to preliminarily assess the performance of a new, resonance-frequency electrical impedance spectroscopy (REIS) system in identifying young women who were recommended to undergo breast biopsy following imaging., Materials and Methods: A seven-probe REIS system was designed and assembled and is currently being prospectively tested. During examination, contact is made with the nipple and six concentric points on the breast skin. Signal sweeps are performed, and outputs ranging from 200 to 800 kHz at 5-kHz intervals are recorded. An initial set of 140 patients, including 56 who eventually had biopsies, 63 who had negative results on screening mammography, and 21 recalled for additional imaging but later determined to have negative results, was used. An initial set of 35 features, 33 representing impedance signal differences between breasts and two representing participant age and average breast density, was assembled and reduced by a genetic algorithm to 14. The performance of an artificial neural network-based classifier was assessed using a case-based leave-one-out method., Results: The substantially greater asymmetry between signals of mirror-matched regions ascertained from biopsy ("positive") compared to nonbiopsy ("negative") cases resulted in an artificial neural network classifier performance (area under the curve) of 0.830 ± 0.023. At 90% specificity, this classifier, optimized for "recommendation for biopsy" rather than "cancer," detected 30 REIS-positive cases (54%), including six of nine (67%) actual cancer cases and six of nine women (67%) recommended for surgical excision of high-risk lesions., Conclusions: Asymmetry in impedance measurements between bilateral breasts may provide valuable discriminatory information regarding the presence of highly suspicious imaging-based findings., (Copyright © 2011 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2011
Full Text: View/download PDF

18. Is an ROC-type response truly always better than a binary response in observer performance studies?

Author: Gur D, Bandos AI, Rockette HE, Zuley ML, Hakim CM, Chough DM, Ganott MA, and Sumkin JH
Subjects: Female, Humans, Observer Variation, ROC Curve, Reproducibility of Results, Sensitivity and Specificity, Algorithms, Breast Neoplasms diagnostic imaging, Mammography methods, Radiographic Image Enhancement methods, Radiographic Image Interpretation, Computer-Assisted methods
Abstract: Rationale and Objectives: The aim of this study was to assess similarities and differences between methods of performance comparisons under binary (yes or no) and receiver-operating characteristic (ROC)-type pseudocontinuous (0-100) rating data ascertained during an observer performance study of interpretation of full-field digital mammography (FFDM) versus FFDM plus digital breast tomosynthesis., Materials and Methods: Rating data consisted of ROC-type pseudocontinuous and binary ratings generated by eight radiologists evaluating 77 digital mammographic examinations. Overall performance levels were summarized with a conventionally used probability of correct discrimination or, equivalently, the area under the ROC curve (AUC), which under a binary scale is related to Youden's index. Magnitudes of differences in the reader-averaged empirical AUCs between FFDM alone and FFDM plus digital breast tomosynthesis were compared in the context of fixed-reader and random-reader variability of the estimates., Results: The absolute differences between modes using the empirical AUCs were larger on average for the binary scale (0.12 vs 0.07) and for the majority of individual readers (six of eight). Standardized differences were consistent with this finding (2.32 vs 1.63 on average). Reader-averaged differences in AUCs standardized by fixed-reader and random-reader variances were also smaller under the binary rating paradigm. The discrepancy between AUC differences depended on the location of the reader-specific binary operating points., Conclusions: The human observer's operating point should be a primary consideration in designing an observer performance study. Although in general, the ROC-type rating paradigm provides more detailed information on the characteristics of different modes, it does not reflect the actual operating point adopted by human observers. 
There are application-driven scenarios in which analysis based on binary responses may provide statistical advantages., (Copyright 2010 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2010
Full Text: View/download PDF

19. Time to diagnosis and performance levels during repeat interpretations of digital breast tomosynthesis: preliminary observations.

Author: Zuley ML, Bandos AI, Abrams GS, Cohen C, Hakim CM, Sumkin JH, Drescher J, Rockette HE, and Gur D
Subjects: Female, Humans, Male, Observer Variation, Pennsylvania epidemiology, Reproducibility of Results, Sensitivity and Specificity, Time Factors, Breast Neoplasms diagnostic imaging, Breast Neoplasms epidemiology, Professional Competence statistics & numerical data, Tomography, X-Ray Computed methods, Workload statistics & numerical data
Abstract: Rationale and Objectives: To compare time to interpretation and diagnostic performance levels during repeat readings of full-field digital mammography (FFDM) and digital breast tomosynthesis (DBT) in a retrospective study., Materials and Methods: Three experienced radiologists twice interpreted 125 selected examinations, 35 with verified cancers and 90 negative for cancer during a period of 22 months using FFDM alone followed by a combined FFDM + DBT mode. Changes in time to "review and rate" these examinations as well as in diagnostic performance levels were assessed. A fixed-effect analysis accounting for cross-correlation due to the review of the same examinations by the same readers was performed., Results: The total (combined) time to review and rate an examination increased on average by 33% between the first and second readings of the same examinations (P < .001). Radiologists reduced their time to review FFDM before making the DBT available for viewing. However, they spent more time reviewing the combined FFDM + DBT mode. The recall rates for examinations depicting cancer remained largely unchanged. Among the groups of examinations with concordant and discordant recall recommendations during the two readings, only the group of examinations that were "newly recalled" during repeat reading took significantly longer (P < .01)., Conclusion: DBT-based breast imaging may ultimately result in a substantial increase in performance; however, without efficiency improvements DBT may take longer to interpret. Addition of "false-positive recalls" was most strongly associated with increase in interpretation time while elimination of "false-positive recalls" did not require longer interpretation time., (Copyright 2010 AUR. Published by Elsevier Inc. All rights reserved.)
Published: 2010
Full Text: View/download PDF

20. The ductal carcinoma in situ (DCIS) dilemma.

Author: Gur D
Subjects: Breast Neoplasms diagnosis, Carcinoma, Intraductal, Noninfiltrating diagnosis, Clinical Trials as Topic trends, Female, Humans, Risk Assessment, United States, Breast Neoplasms mortality, Breast Neoplasms therapy, Carcinoma, Intraductal, Noninfiltrating mortality, Carcinoma, Intraductal, Noninfiltrating therapy, Women's Health
Published: 2010
Full Text: View/download PDF

21. Imaging technology and practice assessments: what next?

Author: Gur D
Subjects: United States, Biotechnology trends, Diagnostic Imaging trends, Practice Patterns, Physicians' trends, Randomized Controlled Trials as Topic trends, Technology Assessment, Biomedical trends
Published: 2009
Full Text: View/download PDF

22. Agreement of the order of overall performance levels under different reading paradigms.

Author: Gur D, Bandos AI, Klym AH, Cohen CS, Hakim CM, Hardesty LA, Ganott MA, Perrin RL, Poller WR, Shah R, Sumkin JH, Wallace LP, and Rockette HE
Subjects: Female, Humans, ROC Curve, Reproducibility of Results, Sensitivity and Specificity, Breast Neoplasms diagnostic imaging, Data Interpretation, Statistical, Image Interpretation, Computer-Assisted methods, Mammography methods, Observer Variation, Professional Competence, Task Performance and Analysis
Abstract: Rationale and Objectives: To investigate consistency of the orders of performance levels when interpreting mammograms under three different reading paradigms., Materials and Methods: We performed a retrospective observer study in which nine experienced radiologists rated an enriched set of mammography examinations that they personally had read in the clinic ("individualized") mixed with a set that none of them had read in the clinic ("common set"). Examinations were interpreted under three different reading paradigms: binary using screening Breast Imaging Reporting and Data System (BI-RADS), receiver-operating characteristic (ROC), and free-response ROC (FROC). The performance in discriminating between cancer and noncancer findings under each of the paradigms was summarized using Youden's index/2+0.5 (Binary), nonparametric area under the ROC curve (AUC), and an overall FROC index (JAFROC-2). Pearson correlation coefficients were then computed to assess consistency in the ordering of observers' performance levels. Statistical significance of the computed correlation coefficients was assessed using bootstrap confidence intervals obtained by resampling sets of examination-specific observations., Results: All but one of the computed pair-wise correlation coefficients were larger than 0.66 and were significantly different from zero. The correlation between the overall performance measures under the Binary and ROC paradigms was the lowest (0.43) and was not significantly different from zero (95% confidence interval -0.078 to 0.733)., Conclusion: The use of different evaluation paradigms in the laboratory tends to lead to consistent ordering of the overall performance levels of observers. However, one should recognize that conceptually similar performance indexes resulting from different paradigms often measure different performance characteristics and thus disagreements are not only possible but frequently quite natural.
Published: 2008
Full Text: View/download PDF

23. Performance assessments of diagnostic systems under the FROC paradigm: experimental, analytical, and results interpretation issues.

Author: Gur D and Rockette HE
Subjects: Algorithms, Image Enhancement methods, Image Interpretation, Computer-Assisted methods, ROC Curve
Abstract: As use of free response receiver-operating characteristic (FROC) curves gains more acceptance for quantitatively assessing the performance of diagnostic systems, it is important that the experimentalist understands the possible role of this approach as one of the experimental design paradigms that are available to him or her among all other approaches as well as some of the issues associated with FROC type studies. In a number of experimental scenarios, the FROC paradigm and associated analytical tools have theoretical and practical advantages over both the binary and the ROC approaches to performance assessments of diagnostic systems, but it also has some limitations related to experimental design, data analyses, clinical relevance, and complexity in the interpretation of the results. These issues are rarely discussed and are the focus of this work.
Published: 2008
Full Text: View/download PDF

24. Selection of a rating scale in receiver operating characteristic studies: some remaining issues.

Author: Rockette HE and Gur D
Subjects: Abdomen, Humans, Diagnostic Imaging statistics & numerical data, Observer Variation, ROC Curve
Abstract: Rationale and Objectives: The aim of this study is to compare the ratings of a group of readers that used two different rating scales in a receiver operating characteristic (ROC) study and to clarify some remaining issues when selecting a rating scale for such studies., Materials and Methods: We reanalyzed a previously conducted ROC study in which readers used both a 5-point and a 101-point scale to identify abdominal masses in 95 cases. Summary statistics include the distribution of scores by reader for each of the rating scales, the proportion of tied scores when using the 5-point scale that correctly resolved when using the 101-point scale and the proportion of paired normal-abnormal cases where the two rating scales resulted in a different selection of an abnormal case., Results: As a group, the readers used 84 of the rating categories when using the 101-point scale but the categories used differed for individual readers. All readers tended to resolve the majority of ties on the 5-point scale in favor of correct decisions and to maintain correct decisions when a more refined scale was used., Conclusions: The reanalysis presented here provides additional evidence that readers in a ROC study can adjust to a 101-point scale and the use of such a refined scale can increase discriminative ability. However, the decision of selecting an appropriate scale should also consider the underlying abnormality in question and relevant clinical considerations.
Published: 2008
Full Text: View/download PDF

25. Interactive computer-aided diagnosis of breast masses: computerized selection of visually similar image sets from a reference library.

Author: Zheng B, Mello-Thoms C, Wang XH, Abrams GS, Sumkin JH, Chough DM, Ganott MA, Lu A, and Gur D
Subjects: Female, Humans, Observer Variation, Breast Neoplasms diagnostic imaging, Mammography, Radiographic Image Interpretation, Computer-Assisted
Abstract: Rationale and Objectives: The clinical utility of interactive computer-aided diagnosis (ICAD) systems depends on clinical relevance and visual similarity between the queried breast lesions and the ICAD-selected reference regions. The objective of this study is to develop and test a new ICAD scheme that aims to improve visual similarity of ICAD-selected reference regions., Materials and Methods: A large and diverse reference library involving 3,000 regions of interest was established. For each queried breast mass lesion by the observer, the ICAD scheme segments the lesion, classifies its boundary spiculation level, and computes 14 image features representing the segmented lesion and its surrounding tissue background. A conditioned k-nearest neighbor algorithm is applied to select a set of the 25 most "similar" lesions from the reference library. After computing the mutual information between the queried lesion and each of these initially selected 25 lesions, the scheme displays the six reference lesions with the highest mutual information scores. To evaluate the automated selection process of the six "visually similar" lesions to the queried lesion, we conducted a two-alternative forced-choice observer preference study using 85 queried mass lesions. Two sets of reference lesions selected by one new automated ICAD scheme and the other previously reported scheme using a subjective rating method were randomly displayed on the left and right side of the queried lesion. Nine observers were asked to decide for each of the 85 queried lesions which one of the two reference sets was "more visually similar" to the queried lesion., Results: In classification of mass boundary spiculation levels, the overall agreement rate between the automated scheme and an observer is 58.8% (Kappa = 0.31). 
In observer preference study, the nine observers preferred on average the reference lesion sets selected by the automated scheme as being more visually similar than the set selected by the subjective rating approach in 53.2% of the queried lesions. The results were not significantly different for the two methods (P = .128)., Conclusions: This study suggests that using the new automated ICAD scheme, the interobserver variability related issues can thus be avoided. Furthermore, the new scheme maintains the similar performance level as the previous scheme using the subjective rating method that can select reference sets that are significantly more visually similar (P < .05) than when using traditional ICAD schemes in which the mass boundary spiculation levels are not accurately detected and quantified.
Published: 2007
Full Text: View/download PDF

26. "Binary" and "non-binary" detection tasks: are current performance measures optimal?

Author: Gur D, Rockette HE, and Bandos AI
Subjects: Area Under Curve, Diagnosis, Computer-Assisted, Humans, ROC Curve, Reproducibility of Results, Clinical Competence, Diagnostic Imaging, Lung Diseases diagnosis, Monte Carlo Method, Observer Variation
Abstract: We have observed that a very large fraction of responses for several detection tasks during the performance of observer studies are in the extreme ranges of lower than 11% or higher than 89% regardless of the actual presence or absence of the abnormality in question or its subjectively rated "subtleness." This observation raises questions regarding the validity and appropriateness of using multicategory rating scales for such detection tasks. Monte Carlo simulations of binary and multicategory ratings for these tasks demonstrate that the use of the former (binary) often results in a less biased and more precise summary index and hence may lead to a higher statistical power for determining differences between modalities.
Published: 2007
Full Text: View/download PDF

27. Objectively measuring and comparing performance levels of diagnostic imaging systems and practices.

Author: Gur D
Subjects: Diagnosis, Computer-Assisted instrumentation, Equipment Design, Humans, Diagnostic Imaging instrumentation, Diagnostic Imaging standards, ROC Curve, Technology Assessment, Biomedical methods
Published: 2007
Full Text: View/download PDF

28. The prevalence effect in a laboratory environment: Changing the confidence ratings.

Author: Gur D, Bandos AI, Fuhrman CR, Klym AH, King JL, and Rockette HE
Subjects: Humans, Laboratories, Prevalence, Obsessive Behavior, ROC Curve, Radiography
Abstract: Rationale and Objectives: We sought to assess whether or not prevalence levels affected the confidence ratings of readers during the interpretation of cases in a laboratory receiver operating characteristic-type observer performance study., Materials and Methods: We reanalyzed a previously conducted observer performance study that included 14 readers and 5 different levels of prevalence. The previous study yielded the observation that in the laboratory we could not detect a "prevalence effect" in terms of differences in areas under the receiver operating characteristic curves. The detection ratings (for presence or absence) of lung nodules, interstitial disease, and pneumothorax for the five prevalence levels were compared, and a test for trend in averaged ratings as a function of abnormality prevalence was performed within a mixed-model setting that accounts for different sources of variability and correlations induced by the study design., Results: The ratings of the cases in terms of confidence that the specific abnormality in question is present tend, on average, to be larger when actual disease prevalence is lower. The rate of the increase of the average confidence ratings with the decreasing prevalence of a specific abnormality is very similar for actually positive and actually negative cases for every considered abnormality. The observed trend in the changes of the average confidence ratings as a function of prevalence levels was statistically significant (p < 0.01)., Conclusion: Expectations of disease prevalence in the case mix during a laboratory observer performance study may systematically affect the behavior of observers in terms of their actual confidence ratings.
Published: 2007
Full Text: View/download PDF

29. Computerized estimation of the lung volume removed during lung volume reduction surgery.

Author: Gilbert S, Zheng B, Leader JK, Luketich JD, Fuhrman CR, Landreneau RJ, Gur D, and Sciurba FC
Subjects: Absorptiometry, Photon, Aged, Artificial Intelligence, Carbon Monoxide analysis, Female, Forced Expiratory Volume, Humans, Image Processing, Computer-Assisted, Linear Models, Male, Middle Aged, Organ Size, Pulmonary Emphysema diagnostic imaging, Pulmonary Emphysema pathology, Research Design, Total Lung Capacity, Treatment Outcome, Pneumonectomy, Pulmonary Emphysema physiopathology, Pulmonary Emphysema surgery, Tomography, Spiral Computed
Abstract: Rationale and Objectives: This study was designed to develop an automated method for estimating lung volume removed during lung volume reduction surgery (LVRS) using computed tomography (CT)., Materials and Methods: The CT examinations of six patients who underwent bilateral LVRS were analyzed in this study. The resected lung tissue (right and left) was weighed during pathologic examination. An automated computer scheme was developed to estimate the lung volume removed using the CT voxel values and lung specimen weight. The computed fraction of lung volume removed was evaluated across a range of simulated surgical planes (ie, other than parallel to the CT image plane) and CT reconstruction kernels, and it was compared with the surgeons' postsurgical estimates., Results: The computed fraction of the lung volume removed during LVRS was linearly correlated with the resected lung tissue weight (Pearson correlation = 0.697, P = .012). The computed fraction of lung volume removed ranged from 12.9% to 51.7% of the total lung volume. The surgeons' postsurgical estimates of lung volume removed ranged from 30% to 33%. The percent difference between the surgeons' estimates and the computed lung volume removed as a percentage of the surgeons' estimates ranged from -72.3% to 57.0% with mean absolute difference of 29.7% (+/-20.7)., Conclusion: The preliminary findings of this study suggest that the proposed quantitative model should provide an objective measure of lung volume removed during LVRS that may be used to investigate the relationship between lung volume removed and outcome.
Published: 2006
Full Text: View/download PDF

30. Evaluation of lung MDCT nodule annotation across radiologists and methods.

Author: Meyer CR, Johnson TD, McLennan G, Aberle DR, Kazerooni EA, Macmahon H, Mullan BF, Yankelevitz DF, van Beek EJ, Armato SG 3rd, McNitt-Gray MF, Reeves AP, Gur D, Henschke CI, Hoffman EA, Bland PH, Laderach G, Pais R, Qing D, Piker C, Guo J, Starkey A, Max D, Croft BY, and Clarke LP
Subjects: Humans, Lung Neoplasms diagnostic imaging, Radiology, Reproducibility of Results, Sensitivity and Specificity, Artificial Intelligence, Image Interpretation, Computer-Assisted methods, Observer Variation, Pattern Recognition, Automated methods, Physicians statistics & numerical data, Professional Competence, Solitary Pulmonary Nodule diagnostic imaging, Task Performance and Analysis, Tomography, X-Ray Computed statistics & numerical data
Abstract: Rationale and Objectives: Integral to the mission of the National Institutes of Health-sponsored Lung Imaging Database Consortium is the accurate definition of the spatial location of pulmonary nodules. Because the majority of small lung nodules are not resected, a reference standard from histopathology is generally unavailable. Thus assessing the source of variability in defining the spatial location of lung nodules by expert radiologists using different software tools as an alternative form of truth is necessary., Materials and Methods: The relative differences in performance of six radiologists each applying three annotation methods to the task of defining the spatial extent of 23 different lung nodules were evaluated. The variability of radiologists' spatial definitions for a nodule was measured using both volumes and probability maps (p-map). Results were analyzed using a linear mixed-effects model that included nested random effects., Results: Across the combination of all nodules, volume and p-map model parameters were found to be significant at P < .05 for all methods, all radiologists, and all second-order interactions except one. The radiologist and methods variables accounted for 15% and 3.5% of the total p-map variance, respectively, and 40.4% and 31.1% of the total volume variance, respectively., Conclusion: Radiologists represent the major source of variance as compared with drawing tools independent of drawing metric used. Although the random noise component is larger for the p-map analysis than for volume estimation, the p-map analysis appears to have more power to detect differences in radiologist-method combinations. The standard deviation of the volume measurement task appears to be proportional to nodule volume.
Published: 2006
Full Text: View/download PDF

31. Reader variance in ROC studies--generalizability to reader population at high and low performance levels.

Author: Gur D, Bandos AI, Klym AH, and Rockette HE
Subjects: Humans, Radiography, Clinical Competence, Lung Diseases, Interstitial diagnostic imaging, Observer Variation, Pneumothorax diagnostic imaging, ROC Curve
Abstract: Rationale and Objectives: To investigate the variability between discriminative performances of readers as a function of average performance levels during receiver operating characteristic (ROC) studies., Materials and Methods: Four subsets of cases from previously ascertained ROC rating data by 12 observers when detecting interstitial disease and pneumothorax on posteroanterior chest films were selected for each abnormality and reanalyzed to assess changes in "reader" variance component. The subsets were selected based on a prestudy subjective assessment of the subtleness of depicted abnormality (positive cases) and the difficulty in determining its absence (negative cases). Reader variance component was estimated using a bootstrap approach for each subset and the results were used to assess a general relationship between variability and average performance level., Results: The reader variance component decreased substantially (from 0.007704 to 0.000426), as expected, when the areas under the ROC curves (AUC) for detecting pneumothoraces increased from 84% to 97%. On the other hand, reader variance component increased substantially (from 0.000890 to 0.005181) when AUC for detecting interstitial disease increased from 59% to 87%. The large magnitude of and changes in the reader variance component resulted in a consistent nonmonotone relationship as a function of AUC when other related variance components were included in addition to the reader component., Conclusion: Among several factors affecting generalizability of ROC results to the population of readers, the reader variance component depended nonmonotonically on the average diagnostic performance and is lowest at both very high and very low levels of performance.
Published: 2006
Full Text: View/download PDF

32. A permutation test for comparing ROC curves in multireader studies: a multi-reader ROC, permutation test.

Author: Bandos AI, Rockette HE, and Gur D
Subjects: Analysis of Variance, Computer Simulation, Observer Variation, Reproducibility of Results, Sensitivity and Specificity, Data Interpretation, Statistical, Image Enhancement methods, Image Interpretation, Computer-Assisted methods, Models, Statistical, ROC Curve
Abstract: Rationale and Objectives: The aim of the study is to develop a permutation test to compare receiver operating characteristic (ROC) curves of two diagnostic modalities in a multireader paired design., Materials and Methods: A statistical test for comparing two diagnostic modalities is developed based on all possible exchanges of the set of reader-ratings between the two modalities. An exact permutation test is formed by determining the frequency of the most extreme values of the statistic estimating the average difference in the areas under the ROC curves (AUCs). An asymptotic version of the test is constructed by obtaining the exact permutation variance and appealing to the asymptotic normality of the nonparametric estimator of the average difference in areas. Computer simulations were conducted to validate the type I error for small sample sizes., Results: The new test provides a permutation approach for comparing ROC curves in a multireader paired-design setting in which effects of the readers are considered to be fixed. The type I error of the asymptotic test is close to the true value, even for samples as small as 20 normal and 20 abnormal cases. The test is designed to be sensitive to alternatives in which the AUCs of the two diagnostic modalities differ., Conclusions: The proposed test provides a powerful method for comparing two diagnostic modalities in a multireader paired-study design when the primary interest is to detect difference in average AUCs.
Published: 2006
Full Text: View/download PDF

33. The effect of image display size on observer performance: an assessment of variance components.

Author: Gur D, Klym AH, King JL, Maitz GS, Mello-Thoms C, Rockette HE, and Thaete FL
Subjects: Abdominal Neoplasms epidemiology, Algorithms, Analysis of Variance, Data Display, Humans, Observer Variation, Radiography, Thoracic statistics & numerical data, Reproducibility of Results, Sensitivity and Specificity, Abdominal Neoplasms diagnostic imaging, Diagnostic Imaging methods, Image Enhancement methods, Image Interpretation, Computer-Assisted methods, Task Performance and Analysis, User-Computer Interface, Visual Perception
Abstract: Rationale and Objective: Our goal was to investigate the effect of the displayed image size on variance components during the performance of an observer performance study to detect masses on abdominal computed tomography (CT) examinations., Materials and Methods: A previously performed receiver operating characteristic (ROC) study with eight observers to detect abdominal masses on 166 CT examinations was reanalyzed to assess variance components when comparing two similar modes with displayed image sizes varying by a factor of 2. Case, mode, and reader-related variance components were estimated for the group of eight observers and subsets of readers after excluding each of the participants., Results: There was no significant difference in the average area under the ROC curves between the two modes using the two image sizes (P > .05). Reader and reader-by-case variability were substantially larger for the mode displaying enlarged images for the group and all subsets formed by excluding a single reader. Reader variability was affected by one observer who actually performed better with the enlarged images., Conclusion: Sequential viewing of enlarged CT images for the detection of abdominal masses did not improve performance and increased reader variability.
Published: 2006
Full Text: View/download PDF

34. Head-mounted versus remote eye tracking of radiologists searching for breast cancer: a comparison.

Author: Mello-Thoms C, Britton C, Abrams G, Hakim C, Shah R, Hardesty L, Maitz G, and Gur D
Subjects: Decision Making, False Negative Reactions, False Positive Reactions, Female, Humans, Memory, Observer Variation, Task Performance and Analysis, Breast Neoplasms diagnostic imaging, Breast Neoplasms pathology, Fixation, Ocular, Head, Mammography, Visual Perception
Abstract: Purpose: We compared performance and visual search parameters of radiologists detecting masses on mammograms by using both a head-mounted (HDMT) and a remote (REM) eye tracker. Materials and Methods: Five experienced radiologists read twice a case set of 20 one-view (medial-lateral oblique) mammograms, of which 12 contained a malignant mass and eight were lesion-free. For each observer, one trial used an HDMT eye-tracking system and the other used an REM system. Trials were separated on average by 2 months. Time to hit the location of the mass, dwell, and number of fixations in the location of the mass were measured. The same parameters were measured on a per-trial basis to determine whether there were memory effects from the previous trial. Results: Dwell times in the location of true-positive, false-positive, and false-negative results were significantly shorter (P < .05) using the HDMT (median, 0.395 seconds) than REM (median, 0.482 seconds) systems, but the number of fixations in the location of the response was smaller using the REM system (median, 4.33 versus 5.0 for the HDMT). The observed differences did not seem to be caused by a memory effect. In addition, the relative lack of head mobility using the REM system caused observers to report neck strain. Conclusion: Overall, radiologists' visual search behavior was very similar using both types of eye-tracking device. However, because the REM system did not contain a magnetic head tracker, radiologists were allowed very limited head movements when using it, which made them uncomfortable during the experiment.
Published: 2006
Full Text: View/download PDF

35. Variability in observer performance studies: experimental observations.

Author: Gur D, Rockette HE, Maitz GS, King JL, Klym AH, and Bandos AI
Subjects: Humans, Radiography, Reproducibility of Results, Retrospective Studies, Sensitivity and Specificity, United States, Image Interpretation, Computer-Assisted methods, Lung Diseases diagnostic imaging, Observer Variation, Quality Assurance, Health Care methods, ROC Curve, Task Performance and Analysis
Abstract: Rationale and Objectives: The aim of the study is to assess variance components in observer performance studies and the possible impact on study results and conclusions. Materials and Methods: Two previously performed retrospective receiver operating characteristic-type observer performance studies to evaluate the performance of seven radiologists in detecting interstitial disease on conventional posteroanterior chest films and nine radiologists in detecting interstitial disease on a high-resolution workstation were reanalyzed by using the Beiden, Wagner, and Campbell nine-component model to estimate the different variance components. We estimated case-, reader-, and mode-related components of the variance for the group as a whole and after excluding (round robin) each reader. Overall variance was evaluated, and the effect of individual readers on overall study conclusions was assessed. Results: Overall results and conclusions of the reanalysis agreed with the original one in that, as a group, radiologists performed significantly better when using conventional films (P < .05) in both studies. Reader variability was large compared with all other components, and in one study, it was substantially larger for the workstation reading mode. Reader variability was affected substantially by one observer in each study, and in one study, reader-by-mode variability was affected by another reader who performed better on the workstation. Conclusion: Estimates of variance components can shed light on the appropriateness of study design, as well as the sensitivity of results to the inclusion (or exclusion) of individual observers.
Published: 2005
Full Text: View/download PDF

36. Incorporating utility-weights when comparing two diagnostic systems: a preliminary assessment.

Author: Bandos AI, Rockette HE, and Gur D
Subjects: Feasibility Studies, Pilot Projects, Reproducibility of Results, Sensitivity and Specificity, Algorithms, Data Interpretation, Statistical, Diagnostic Imaging methods, Image Interpretation, Computer-Assisted methods, ROC Curve
Abstract: Rationale and Objectives: We sought to develop a new index that incorporates utility-weights when assessing the overall performance of a diagnostic system and to provide a statistical test for comparing two indices in a paired study design. Materials and Methods: The area under the receiver operating characteristic (ROC) curve (AUC) was used as the basis for constructing a new index. The index we propose represents a weighted average of class-specific AUCs each of which relates to a class of pairs of actually negative (normal) and actually positive (abnormal) cases with a specific predetermined utility (or clinical importance). For each pair of normal-abnormal cases, the utility is defined a priori and based on external (covariate) information. In the proposed approach utility-weights represent the relative importance (utility) of discriminating between different types of normal and abnormal cases (pairs of the same type are combined in the classes termed utility-classes). We also describe a simple nonparametric procedure for comparing the proposed indices as computed from paired data. Computer simulations were conducted to evaluate the behavior of the type I error of the proposed test in the simple albeit important instance of two utility-classes. Results: The new index provides an extension of the commonly used area under the ROC curve. It allows for incorporation of utility-weights into the analysis and reduces to the conventional AUC index when all assigned utility-weights are equal to unity. Computer simulations indicate that in the considered scenario of two utility-classes, the type I error of the proposed test is comparable to that of the conventional nonparametric test for equality of AUC indices. Conclusions: The proposed index and the statistical test provide a practical approach of incorporating utilities when comparing diagnostic systems.
Published: 2005
Full Text: View/download PDF
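
Illustrative note: the abstract above describes an index that weights normal-abnormal case pairs by a predetermined utility and reduces to the conventional AUC when every weight equals one. The Python sketch below shows one way such a pair-weighted nonparametric AUC could be computed. It is not taken from the paper: the function names, the `(score, case_type)` data layout, and the normalization by the sum of the weights are all assumptions made for illustration, and the published index may be defined differently.

```python
from itertools import product


def kernel(neg_score, pos_score):
    """Mann-Whitney kernel: 1 if the abnormal case is rated higher, 0.5 on a tie, else 0."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def weighted_auc(negatives, positives, weight):
    """Pair-weighted nonparametric AUC (hypothetical formulation).

    negatives, positives: lists of (score, case_type) tuples.
    weight: function (neg_type, pos_type) -> non-negative utility weight.
    """
    numerator = 0.0
    denominator = 0.0
    for (x, x_type), (y, y_type) in product(negatives, positives):
        w = weight(x_type, y_type)
        numerator += w * kernel(x, y)
        denominator += w
    if denominator == 0:
        raise ValueError("all utility weights are zero")
    return numerator / denominator


# Made-up ratings, for illustration only.
negatives = [(1, "easy"), (2, "hard")]
positives = [(3, "subtle"), (2, "obvious")]

# Unit weights give the conventional nonparametric AUC.
plain = weighted_auc(negatives, positives, lambda n, p: 1.0)

# Doubling the weight of (hard normal, subtle abnormal) pairs shifts the index.
weighted = weighted_auc(
    negatives, positives,
    lambda n, p: 2.0 if (n, p) == ("hard", "subtle") else 1.0,
)
```

With these made-up ratings, the unit-weight call reproduces the ordinary Mann-Whitney AUC, which matches the reduction property stated in the abstract.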

37. A conditional nonparametric test for comparing two areas under the ROC curves from a paired design.

Author: Bandos AI, Rockette HE, and Gur D
Subjects: Computer Simulation, Humans, Matched-Pair Analysis, Models, Statistical, Normal Distribution, Sample Size, Area Under Curve, ROC Curve, Statistics, Nonparametric, Technology, Radiologic statistics & numerical data
Abstract: Rationale and Objectives: To develop a conditional nonparametric procedure for comparing two correlated areas under receiver operating characteristic (ROC) curves (AUC). Materials and Methods: A nonparametric conditional test to compare areas under two ROC curves was developed using the distribution of the elements of the nonparametric AUC estimators in a permutation space. The conditioning is made on the observed discordances between the relative orderings of ratings of the normal and abnormal cases for the two modalities taken over all possible pairs. The type I error of the procedure was verified using computer simulations. The power of the test was compared with an existing unconditional procedure on simulated datasets from binormal distributions as well as from a mixture of binormal distributions of ratings. Results: The proposed test is conservative for low sample sizes, large AUC, and high correlation between modalities. It possesses a reasonable type I error for sample sizes as low as 20 actually positive and 20 actually negative cases. In plausible situations in which the sample in observer performance studies cannot be monotonically transformed into a binormal distribution, this approach may have modest power advantages over the conventional nonparametric test. Conclusion: The conditional nonparametric test presented here is an alternative approach to existing unconditional procedures and may offer advantages in certain types of observer performance studies.
Published: 2005
Full Text: View/download PDF
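
Illustrative note: the abstract above concerns permutation-based comparison of two correlated AUCs. The Python sketch below shows a generic Monte Carlo paired permutation test that swaps the two modality ratings within each case. It is a swapped-in textbook technique, not the authors' conditional procedure, which conditions on the observed discordances. All names, the default `n_perm`, and the choice to permute raw ratings (which assumes both modalities use the same rating scale) are assumptions.

```python
import random


def auc(neg, pos):
    """Nonparametric (Mann-Whitney) AUC estimate; ties count one half."""
    total = 0.0
    for x in neg:
        for y in pos:
            total += 1.0 if y > x else 0.5 if y == x else 0.0
    return total / (len(neg) * len(pos))


def _swap_pairs(a_vals, b_vals, rng):
    """Randomly exchange the two modality ratings within each case."""
    out_a, out_b = [], []
    for a, b in zip(a_vals, b_vals):
        if rng.random() < 0.5:
            a, b = b, a
        out_a.append(a)
        out_b.append(b)
    return out_a, out_b


def paired_permutation_test(neg_a, pos_a, neg_b, pos_b, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for AUC_a - AUC_b.

    neg_a[i] and neg_b[i] are ratings of the same normal case under the two
    modalities; pos_a[j] and pos_b[j] likewise for abnormal cases.
    """
    rng = random.Random(seed)
    observed = abs(auc(neg_a, pos_a) - auc(neg_b, pos_b))
    hits = 0
    for _ in range(n_perm):
        na, nb = _swap_pairs(neg_a, neg_b, rng)
        pa, pb = _swap_pairs(pos_a, pos_b, rng)
        # Small tolerance guards against floating-point noise in the comparison.
        if abs(auc(na, pa) - auc(nb, pb)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

When both modalities give identical ratings, every permuted difference equals the observed difference of zero, so the function returns 1.0.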

38. "Memory effect" in observer performance studies of mammograms.

Author: Hardesty LA, Ganott MA, Hakim CM, Cohen CS, Clearfield RJ, and Gur D
Subjects: Diagnostic Errors, Employee Performance Appraisal, False Negative Reactions, False Positive Reactions, Female, Humans, Male, Observer Variation, Research Design, Retrospective Studies, Breast Neoplasms diagnostic imaging, Mammography standards, Memory, Radiology standards
Abstract: Rationale and Objective: To evaluate breast radiologists' recognition of mammograms showing cancers that they correctly detected or "missed" during clinical interpretations. Materials and Methods: Two similar experiments were conducted. In the first, 33 bilateral screening mammograms were reviewed by four breast imagers. These included five cancers that each radiologist had detected, two cancers that each radiologist had "missed," and five mammograms recalled by other radiologists that were not cancer. Radiologists were asked if they had interpreted the mammogram in clinic and if the mammogram was suspicious for cancer. In the second experiment, four different breast imagers reviewed 48 mammograms that included five cancers that each radiologist had detected, two cancers that each radiologist had "missed," and five mammograms that were recalled by each radiologist but were not cancer. Using chi-square analysis, the performance of the radiologists on screening mammograms they had read in clinic was compared with their performance on mammograms read in clinic by other radiologists. Results: Seven of eight radiologists did not remember interpreting any of the mammograms in clinic. One radiologist correctly remembered interpreting one mammogram in clinic, but interpreted it incorrectly. Average performance showed no significant difference (P = .60) between mammograms they had interpreted in clinic and those interpreted by others. Conclusion: Radiologists do not remember most mammograms showing cancer that they have interpreted, either correctly or incorrectly, after they are mixed with mammograms showing cancer that were interpreted by other radiologists. Screening mammograms can be used in observer performance studies in which the interpreting radiologist participates as an observer.
Published: 2005
Full Text: View/download PDF

39. Assessment methodologies and statistical issues for computer-aided diagnosis of lung nodules in computed tomography: contemporary research topics relevant to the lung image database consortium.

Author: Dodd LE, Wagner RF, Armato SG 3rd, McNitt-Gray MF, Beiden S, Chan HP, Gur D, McLennan G, Metz CE, Petrick N, Sahiner B, and Sayre J
Subjects: Algorithms, Humans, Radiology Information Systems, Statistics as Topic, United States, Diagnosis, Computer-Assisted, Lung Neoplasms diagnostic imaging, Tomography, X-Ray Computed
Abstract: Cancer of the lung and bronchus is the leading fatal malignancy in the United States. Five-year survival is low, but treatment of early stage disease considerably improves chances of survival. Advances in multidetector-row computed tomography technology provide detection of smaller lung nodules and offer a potentially effective screening tool. The large number of images per exam, however, requires considerable radiologist time for interpretation and is an impediment to clinical throughput. Thus, computer-aided diagnosis (CAD) methods are needed to assist radiologists with their decision making. To promote the development of CAD methods, the National Cancer Institute formed the Lung Image Database Consortium (LIDC). The LIDC is charged with developing the consensus and standards necessary to create an image database of multidetector-row computed tomography lung images as a resource for CAD researchers. To develop such a prospective database, its potential uses must be anticipated. The ultimate applications will influence the information that must be included along with the images, the relevant measures of algorithm performance, and the number of required images. In this article we outline assessment methodologies and statistical issues as they relate to several potential uses of the LIDC database. We review methods for performance assessment and discuss issues of defining "truth" as well as the complications that arise when truth information is not available. We also discuss issues about sizing and populating a database.
Published: 2004
Full Text: View/download PDF

40. Detection and classification performance levels of mammographic masses under different computer-aided detection cueing environments.

Author: Zheng B, Swensson RG, Golla S, Hakim CM, Shah R, Wallace L, and Gur D
Subjects: Area Under Curve, Cues, False Positive Reactions, Female, Humans, Observer Variation, ROC Curve, Breast Diseases diagnostic imaging, Breast Diseases pathology, Mammography, Radiographic Image Interpretation, Computer-Assisted
Abstract: Rationale and Objectives: The authors evaluated the impact of different computer-aided detection (CAD) cueing conditions on radiologists' performance levels in detecting and classifying masses depicted on mammograms. Materials and Methods: In an observer performance study, eight radiologists interpreted 110 subtle cases six times under different display conditions to detect depicted masses and classify them as benign or malignant. Forty-five cases depicted biopsy-proven masses and 65 were negative. One mass-based cueing sensitivity of 80% and two false-positive cueing rates of 1.2 and 0.5 per image were used in this study. In one mode, radiologists first interpreted images without CAD results, followed by the display of cues and reinterpretation. In another mode, radiologists viewed CAD cues as images were presented and then interpreted images. Free-response receiver operating characteristic method was used to analyze and compare detection performance. The receiver operating characteristic method was used to evaluate classification performance. Results: At these performance levels, providing cues after initial interpretation had little effect on the overall performance in detecting masses. However, in the mode with the highest false-positive cueing rate, viewing CAD cues immediately upon display of images significantly reduced average performance for both detection and classification tasks (P < .05). Viewing CAD cues during the initial display consistently resulted in fewer abnormalities being identified in noncued regions. Conclusion: CAD systems with low sensitivity (≤ 80% on mass-based detection) and high false-positive rate (≥ 0.5 per image) in a dataset with subtle abnormalities had little effect on radiologists' performance in the detection and classification of mammographic masses.
Published: 2004
Full Text: View/download PDF

41. On the repeated use of databases for testing incremental improvement of computer-aided detection schemes.

Author: Gur D, Wagner RF, and Chan HP
Subjects: Humans, Learning, Professional Practice, Radiographic Image Interpretation, Computer-Assisted, Technology, Radiologic education, Databases, Factual statistics & numerical data, Diagnosis, Computer-Assisted methods, Research Design
Published: 2004
Full Text: View/download PDF

42. From the laboratory to the clinic: the "prevalence effect".

Author: Gur D, Rockette HE, Warfel T, Lacomis JM, and Fuhrman CR
Subjects: ROC Curve, Research Design, Observer Variation, Radiology
Published: 2003
Full Text: View/download PDF

43. Automated lung segmentation in X-ray computed tomography: development and evaluation of a heuristic threshold-based scheme.

Author: Leader JK, Zheng B, Rogers RM, Sciurba FC, Perez A, Chapman BE, Patel S, Fuhrman CR, and Gur D
Subjects: Image Processing, Computer-Assisted, Lung Diseases diagnostic imaging, Pulmonary Emphysema diagnostic imaging, Software Design, Lung diagnostic imaging, Tomography, X-Ray Computed methods
Abstract: Rationale and Objectives: To develop and evaluate a reliable, fully-automated lung segmentation scheme for application in X-ray computed tomography. Materials and Methods: The automated scheme was heuristically developed using a slice-based, pixel-value threshold and two sets of classification rules. Features used in the rules include size, circularity, and location. The segmentation scheme operates slice-by-slice and performs three key operations: (1) image preprocessing to remove background pixels, (2) computation and application of a pixel-value threshold to identify lung tissue, and (3) refinement of the initial segmented regions to prune incorrectly detected airways and separate fused right and left lungs. Results: The performance of the automated segmentation scheme was evaluated using 101 computed tomography cases (91 thick slice, 10 thin slice scans). The 91 thick cases were pre- and post-surgery from 50 patients and were not independent. The automated scheme successfully segmented 94.0% of the 2,969 thick slice images and 97.6% of the 1,161 thin slice images. The mean difference of the total lung volumes calculated by the automated scheme and functional residual capacity plus 60% inspiratory capacity was -24.7 ± 508.1 mL. The mean differences of the total lung volumes calculated by the automated scheme and an established, commonly used semi-automated scheme were 95.2 ± 52.5 mL and -27.7 ± 66.9 mL for the thick and thin slice cases, respectively. Conclusion: This simple, fully-automated lung segmentation scheme provides an objective tool to facilitate lung segmentation from computed tomography scans.
Published: 2003
Full Text: View/download PDF
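
Illustrative note: the abstract above describes a slice-by-slice scheme that applies a pixel-value threshold and then prunes candidate regions with rules based on features such as size. The Python sketch below shows the general idea on a single slice: threshold, group pixels into 4-connected regions, and keep regions above a minimum size. It is a simplified stand-in, not the published scheme. The threshold value, the minimum size, the assumption that lung pixels fall below the threshold, and all function names are hypothetical, and the preprocessing, circularity, location, and lung-separation steps are omitted.

```python
from collections import deque


def threshold_mask(image, threshold):
    """Mark pixels darker than the threshold as candidate lung (assumed polarity)."""
    return [[pixel < threshold for pixel in row] for row in image]


def connected_regions(mask):
    """Return 4-connected regions of True pixels as lists of (row, col)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited candidate pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                cr, cc = queue.popleft()
                region.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(region)
    return regions


def segment_slice(image, threshold, min_size):
    """Keep thresholded regions with at least min_size pixels (a simple size rule)."""
    mask = threshold_mask(image, threshold)
    return [region for region in connected_regions(mask) if len(region) >= min_size]


# Toy 4x5 slice with made-up pixel values: one 2x2 dark block and one isolated dark pixel.
toy_slice = [
    [0, 0, 0, 0, 0],
    [0, -800, -800, 0, -900],
    [0, -800, -800, 0, 0],
    [0, 0, 0, 0, 0],
]
kept = segment_slice(toy_slice, threshold=-400, min_size=2)
```

On the toy slice the size rule keeps the 2x2 block and discards the isolated pixel, which loosely mirrors how a size feature could prune small spurious regions.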

44. ROC-type assessments of medical imaging and CAD technologies: a perspective.

Author: Gur D
Subjects: Humans, Diagnosis, Computer-Assisted, Diagnostic Imaging, ROC Curve
Published: 2003
Full Text: View/download PDF

45. Performance change of mammographic CAD schemes optimized with most-recent and prior image databases.

Author: Zheng B, Good WF, Armfield DR, Cohen C, Hertzberg T, Sumkin JH, and Gur D
Subjects: Algorithms, Breast Neoplasms diagnostic imaging, False Positive Reactions, Female, Humans, ROC Curve, Breast Diseases diagnostic imaging, Mammography, Radiographic Image Interpretation, Computer-Assisted
Abstract: Rationale and Objectives: The authors evaluated performance changes in the detection of masses on "current" (latest) and "prior" images by computer-aided diagnosis (CAD) schemes that had been optimized with databases of current and prior mammograms. Materials and Methods: The authors selected 260 pairs of matched consecutive mammograms. Each current image depicted one or two verified masses. All prior images had been interpreted originally as negative or probably benign. A CAD scheme initially detected 261 mass regions and 465 false-positive regions on the current images, and 252 corresponding mass regions (early signs) and 471 false-positive regions on prior images. These regions were divided into two training and two testing databases. The current and prior training databases were used to optimize two CAD schemes with a genetic algorithm. These schemes were evaluated with two independent testing databases. Results: The scheme optimized with current images produced areas under the receiver operating characteristic curve of 0.89 ± 0.01 and 0.65 ± 0.02 when tested with current images and prior images, respectively. The scheme optimized with prior images produced areas under the receiver operating characteristic curve of 0.81 ± 0.02 and 0.71 ± 0.02 when tested with current images and prior images, respectively. Performance changes for both current and prior testing databases were significant (P < .01) for the two schemes. Conclusion: CAD schemes trained with current images do not perform optimally in detecting masses depicted on prior images. To optimize CAD schemes for early detection, it may be important to include in the training database a large fraction of prior images originally reported as negative and later proven to be positive.
Published: 2003
Full Text: View/download PDF

46. Computer-aided detection in mammography: an assessment of performance on current and prior images.

Author: Zheng B, Shah R, Wallace L, Hakim C, Ganott MA, and Gur D
Subjects: Breast Neoplasms diagnostic imaging, False Positive Reactions, Female, Humans, ROC Curve, Breast Diseases diagnostic imaging, Mammography, Radiographic Image Interpretation, Computer-Assisted
Abstract: Rationale and Objectives: The authors assessed and compared the performance of a computer-aided detection (CAD) scheme for the detection of masses and microcalcification clusters on a set of images collected from two consecutive ("current" and "prior") mammographic examinations. Materials and Methods: A previously developed CAD scheme was used to assess two consecutive screening mammograms from 200 cases in which the current mammogram showed a mass or cluster of microcalcifications that resulted in breast biopsy. The latest prior examinations had been initially interpreted as negative or definitely benign findings (Breast Imaging Reporting and Data System rating, 1 or 2). The study involved images of 400 examinations acquired in 200 patients. Radiologists identified 172 masses and 128 clusters of microcalcifications on the current images. The performance of the CAD scheme was analyzed and compared for the current and latest prior images. Results: There were significant differences (P < .01) between current and prior images in many feature values. The performance of the CAD scheme was significantly lower for prior than for current images (P < .01). At 0.5 and 0.2 false-positive mass and cluster cues per image, the scheme detected 78 malignant masses (78%) and 63 malignant clusters (80%) on current images. Only 42% of malignant cases were detected on prior images, including 40 masses (40%) and 36 microcalcification clusters (46%). Conclusion: CAD schemes can detect a substantial fraction of masses and microcalcification clusters depicted on prior images. To improve performance with prior images, the scheme may have to be adaptively reoptimized with increasingly more subtle abnormalities.
Published: 2002
Full Text: View/download PDF

47. Computerized assessment of tissue composition on digitized mammograms.

Author: Chang YH, Wang XH, Hardesty LA, Chang TS, Poller WR, Good WF, and Gur D
Subjects: Breast Diseases diagnostic imaging, Female, Humans, Image Processing, Computer-Assisted instrumentation, Breast anatomy & histology, Image Processing, Computer-Assisted methods, Mammography methods, Signal Processing, Computer-Assisted instrumentation
Abstract: Rationale and Objectives: The authors developed a computerized method for the quantitative assessment of breast tissue composition on digitized mammograms. Materials and Methods: Three radiologists were asked to review 200 digitized mammograms and independently provide a Breast Imaging Reporting and Data System-like rating for breast tissue composition on a scale of 0 to 4. These values were incorporated into a "consensus" rating that was used as a reference point in the development and evaluation of a computerized method. After tissue segmentation that excluded nontissue areas, a set of quantitative features was computed. A computerized summary index that attempts to reproduce the radiologists' ratings was developed. Correlation coefficients (Pearson r) were used to compare the computerized index with the consensus ratings. Results: Some individual features computed for the relatively dense breast areas showed good correlation (r > 0.8) with the radiologists' subjective ratings. The summary index of tissue composition demonstrated a significant correlation (r = 0.87), as well. Conclusion: Computerized methods that show good correlation with radiologists' ratings of breast tissue composition can be developed.
Published: 2002
Full Text: View/download PDF
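
Illustrative note: the abstract above compares a computerized index with consensus ratings by using the Pearson correlation coefficient. The Python sketch below computes Pearson r from its standard definition. The function name and the sample values are made up for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical consensus ratings (0 to 4 scale) and computed index values.
consensus = [0, 1, 2, 3, 4]
index = [0.2, 0.9, 2.1, 2.8, 4.1]
r = pearson_r(consensus, index)
```

A value of r close to 1 would indicate that the computed index rises almost linearly with the consensus rating.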

48. Statistical test to assess rank-order imaging studies.

Author: Rockette HE, Li W, Brown ML, Britton CA, Towers JT, and Gur D
Subjects: Humans, Normal Distribution, Sample Size, ROC Curve, Radiology, Statistics, Nonparametric
Abstract: Rationale and Objectives: Rank-order experiments often provide a reasonable method of determining whether a large-scale receiver operating characteristic study can be justified. The authors' purpose was to formalize a proposed method for analyzing rank-order imaging experiments and provide methods that can be used in determining sample sizes for both cases and raters. Materials and Methods: Simulations were conducted to determine the adequacy of the normal approximation of a statistic used to test the null hypothesis of random ordering. For a multireader experiment, formulas are presented and guidelines are provided to enable investigators to determine the number of required readers (raters) and cases for a specific study. Results: When there are at least five ordered images per case, 10 cases are sufficient to test a random rank order. When there are only three or four images for a case, 20 cases are required. The authors constructed tables of statistical power for selected numbers of ordered images, numbers of cases, and degrees of trend, and they also provide an approximation for use in situations that are not tabled. Conclusion: The statistical methods for analyzing rank-order experiments and estimating sample sizes for study planning are relatively simple to implement. The derived formulas for sample size estimation, when applied to typical imaging experiments, indicate that modest numbers of cases and readers are required for rank-order studies.
Published: 2001
Full Text: View/download PDF

49. Applying computer-assisted detection schemes to digitized mammograms after JPEG data compression: an assessment.

Author: Zheng B, Sumkin JH, Good WF, Maitz GS, Chang YH, and Gur D
Subjects: False Positive Reactions, Female, Humans, Photography, Algorithms, Breast Diseases diagnostic imaging, Diagnosis, Computer-Assisted, Mammography methods, Signal Processing, Computer-Assisted
Abstract: Rationale and Objectives: The authors' purpose was to assess the effects of Joint Photographic Experts Group (JPEG) image data compression on the performance of computer-assisted detection (CAD) schemes for the detection of masses and microcalcification clusters on digitized mammograms. Materials and Methods: This study included 952 mammograms that were digitized and compressed with a JPEG-compatible image-compression scheme. A CAD scheme, previously developed in the authors' laboratory and optimized for noncompressed images, was applied to reconstructed images after compression at five levels. The performance was compared with that obtained with the original noncompressed digitized images. Results: For mass detection, there were no significant differences in performance between noncompressed and compressed images for true-positive regions (P = .25) or false-positive regions (P = .40). In all six modes the scheme identified 80% of masses with less than one false-positive region per image. For the detection of microcalcification clusters, there was significant performance degradation (P < .001) at all compression levels. Detection sensitivity was reduced by 4%-10% as compression ratios increased from 17:1 to 62:1. At the same time, the false-positive detection rate was increased by 91%-140%. Conclusion: The JPEG algorithm did not adversely affect the performance of the CAD scheme for detecting masses, but it did significantly affect the detection of microcalcification clusters.
Published: 2000
Full Text: View/download PDF

50. Empiric assessment of parameters that affect the design of multireader receiver operating characteristic studies.

Author: Rockette HE, Campbell WL, Britton CA, Holbert JM, King JL, and Gur D
Subjects: Humans, Research Design, Observer Variation, ROC Curve, Radiography, Thoracic
Abstract: Rationale and Objectives: The authors attempted to assess experimentally the magnitude of reader variability and the correlations and interactions among cases, readers, and modalities during observer performance studies and their possible effects on study design and sample size. Materials and Methods: Published data from 32 selected receiver operating characteristic (ROC) studies were reviewed to compare the magnitude of the variance component from readers with the variance component from modality. Estimates of correlation and interactions among cases, readers, and modalities were also computed directly from ROC data ascertained during two large studies performed in our laboratory. Each of these two studies included 529 cases and six readers, but one study used eight modalities and the other nine. Results: Published results indicate that reader variability is task dependent and larger (P < .05) than modality variability in detection of interstitial disease. Measured correlations between modalities for the same reader were task dependent and ranged from 0.35 to 0.59. Modality-by-reader and modality-by-case interactions often are not important factors. The random error term was greater than the modality-by-reader interaction in 11 of 20 comparisons and greater than the modality-by-case interaction in eight of 20 comparisons. Conclusion: Use of the same cases interpreted with different modes is justifiable in many situations because of the high variability from readers. This comprehensive review of existing ROC studies resulted in parameter assessments that can be used to better estimate sample-size requirements in multireader ROC studies.
Published: 1999
Full Text: View/download PDF
