Agreement of the order of overall performance levels under different reading paradigms.
- Authors
Gur D, Bandos AI, Klym AH, Cohen CS, Hakim CM, Hardesty LA, Ganott MA, Perrin RL, Poller WR, Shah R, Sumkin JH, Wallace LP, and Rockette HE
- Subjects
Female; Humans; ROC Curve; Reproducibility of Results; Sensitivity and Specificity; Breast Neoplasms/diagnostic imaging; Data Interpretation, Statistical; Image Interpretation, Computer-Assisted/methods; Mammography/methods; Observer Variation; Professional Competence; Task Performance and Analysis
- Abstract
Rationale and Objectives: To investigate the consistency of the ordering of observers' performance levels when interpreting mammograms under three different reading paradigms.
Materials and Methods: We performed a retrospective observer study in which nine experienced radiologists rated an enriched set of mammography examinations that they had personally read in the clinic ("individualized") mixed with a set that none of them had read in the clinic ("common set"). Examinations were interpreted under three different reading paradigms: binary, using screening Breast Imaging Reporting and Data System (BI-RADS); receiver operating characteristic (ROC); and free-response ROC (FROC). Performance in discriminating between cancer and noncancer findings under each paradigm was summarized using Youden's index/2 + 0.5 (binary), the nonparametric area under the ROC curve (AUC), and an overall FROC index (JAFROC-2). Pearson correlation coefficients were then computed to assess consistency in the ordering of observers' performance levels. The statistical significance of the computed correlation coefficients was assessed using bootstrap confidence intervals obtained by resampling sets of examination-specific observations.
Results: All but one of the computed pairwise correlation coefficients were larger than 0.66 and significantly different from zero. The correlation between the overall performance measures under the binary and ROC paradigms was the lowest (0.43) and was not significantly different from zero (95% confidence interval: -0.078 to 0.733).
Conclusion: The use of different evaluation paradigms in the laboratory tends to lead to a consistent ordering of observers' overall performance levels. However, one should recognize that conceptually similar performance indexes resulting from different paradigms often measure different performance characteristics, so disagreements are not only possible but frequently quite natural.
- Published
- 2008
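The binary and ROC summary indices and the resampling scheme named in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' analysis code. The data layout (`ratings[reader][exam]` for ROC ratings, `recalls[reader][exam]` for binary recall decisions, `truth[exam]` for cancer status), the stratified percentile bootstrap, and all function names are hypothetical, and the JAFROC-2 FROC index is not implemented. Because Youden's index is sensitivity + specificity - 1, the binary index Youden's index/2 + 0.5 reduces to (sensitivity + specificity)/2.

```python
import math
import random


def binary_index(sensitivity, specificity):
    """Youden's index/2 + 0.5, which equals (sensitivity + specificity) / 2."""
    youden_j = sensitivity + specificity - 1.0
    return youden_j / 2.0 + 0.5


def nonparametric_auc(cancer_scores, noncancer_scores):
    """Mann-Whitney estimate of the ROC area: P(cancer score > noncancer score), ties count 1/2."""
    wins = 0.0
    for c in cancer_scores:
        for n in noncancer_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


def pearson(xs, ys):
    """Sample Pearson correlation; NaN when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def reader_indices(ratings, recalls, truth, cases):
    """Per-reader (binary index, AUC) on a list of exam indices; duplicates are allowed."""
    cancer = [i for i in cases if truth[i]]
    normal = [i for i in cases if not truth[i]]
    binary, roc = [], []
    for r in range(len(ratings)):
        sens = sum(recalls[r][i] for i in cancer) / len(cancer)
        spec = sum(not recalls[r][i] for i in normal) / len(normal)
        binary.append(binary_index(sens, spec))
        roc.append(nonparametric_auc([ratings[r][i] for i in cancer],
                                     [ratings[r][i] for i in normal]))
    return binary, roc


def bootstrap_correlation_ci(ratings, recalls, truth, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the across-reader Pearson r, resampling examinations."""
    rng = random.Random(seed)
    cancer = [i for i, t in enumerate(truth) if t]
    normal = [i for i, t in enumerate(truth) if not t]
    replicates = []
    for _ in range(n_boot):
        # Resample examinations (not readers); the same resampled exams are used for every
        # reader and both paradigms. Stratifying by truth (an assumption) keeps both classes.
        sample = [rng.choice(cancer) for _ in cancer] + [rng.choice(normal) for _ in normal]
        r = pearson(*reader_indices(ratings, recalls, truth, sample))
        if not math.isnan(r):
            replicates.append(r)
    if not replicates:
        raise ValueError("no bootstrap replicate had a defined correlation")
    replicates.sort()
    m = len(replicates)
    lo = replicates[int(alpha / 2 * m)]
    hi = replicates[min(m - 1, int((1 - alpha / 2) * m))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical synthetic data: 5 readers of differing skill, 20 cancer + 20 noncancer exams.
    gen = random.Random(1)
    truth = [True] * 20 + [False] * 20
    ratings = [[gen.gauss(d if t else 0.0, 1.0) for t in truth] for d in (0.5, 1.0, 1.5, 2.0, 2.5)]
    recalls = [[x > 0.5 for x in row] for row in ratings]
    print(bootstrap_correlation_ci(ratings, recalls, truth, n_boot=500))
```

Resampling whole examinations, rather than individual reader-exam observations, preserves the pairing of each examination across readers and paradigms; an interval that covers zero (as reported for the binary-ROC pair) indicates that the correlation is not significantly different from zero at the chosen level.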