
Assessing the speed-accuracy trade-offs of popular convolutional neural networks for single-crop rib fracture classification.

Authors :
Castro-Zunti, Riel
Chae, Kum Ju
Choi, Younhee
Jin, Gong Yong
Ko, Seok-bum
Source :
Computerized Medical Imaging & Graphics. Jul2021, Vol. 91, pN.PAG-N.PAG. 1p.
Publication Year :
2021

Abstract

• DL models can classify rib fractures: fresh/acute, healed/old and nonfractured/normal.
• Classifiers learn from surrounding tissue: background removal shown to lower accuracy.
• InceptionV3 Block 7 had 96.0% accuracy & 94.0% recall for acute vs. old vs. normal.
• InceptionV3 Block 7 had 97.8% accuracy, 94.6% recall & 94.7% AUC for acute vs. others.
• InceptionV3 Block 7 crop inference time is 14/12 ms CPU/GPU, 1.7× faster than baseline.

Rib fractures are injuries commonly assessed in trauma wards. Deep learning has demonstrated state-of-the-art accuracy for a variety of tasks, including image classification. This paper assesses the speed-accuracy trade-offs and general suitability of four popular convolutional neural networks for classifying rib fractures from axial computed tomography imagery. We applied transfer learning to InceptionV3, ResNet50, MobileNetV2, and VGG16 models, and additionally trained "decomposed" models consisting of only the first n blocks of each architecture, for each block count n. Given that acute (new) fractures are generally the most important to detect, we trained two types of models: a classful model with the classes acute, old (healed), and normal (non-fractured); and a binary model separating acute from the other classes. We found that the first 7 blocks of InceptionV3 achieved the best results and the best general speed-accuracy trade-off. The classful model achieved a 5-fold cross-validation average accuracy and macro recall of 96.00% and 94.0%, respectively. The binary model achieved a 5-fold cross-validation average accuracy, macro recall, and area under the receiver operating characteristic curve of 97.76%, 94.6%, and 94.7%, respectively. On a Windows 10 PC with 32 GB RAM and an Nvidia 1080 Ti GPU, the model's average CPU and GPU per-crop inference times were 13.6 and 12.2 ms, respectively.
Compared to the InceptionV3 Block 7 classful model, a radiologist with 9 years of experience was less accurate but more sensitive to acute fractures; meanwhile, the deep learning model had fewer false positive diagnoses and better sensitivity to old fractures and normal ribs. The Cohen's Kappa between the two was 0.813. [ABSTRACT FROM AUTHOR]
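
The abstract reports accuracy, macro recall, and Cohen's Kappa without defining them. The following is a minimal Python sketch of the standard definitions computed from a square count matrix; it is not the authors' evaluation code. The function names and the 3×3 matrix are hypothetical, and the counts are made up for illustration and are not taken from the paper. For accuracy and macro recall, rows are true classes and columns are predicted classes; for Cohen's Kappa, rows and columns are the labels given by two raters (for example, a radiologist and a model).

```python
from typing import List

Matrix = List[List[int]]


def accuracy(cm: Matrix) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def macro_recall(cm: Matrix) -> float:
    """Unweighted mean of per-class recall (rows = true classes)."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(recalls) / len(recalls)


def cohen_kappa(cm: Matrix) -> float:
    """Agreement between two raters, corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement: product of the two raters' marginal rates per class.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT from the paper); class order: acute, old, normal.
toy_cm = [
    [8, 1, 1],
    [1, 7, 2],
    [0, 1, 9],
]

print(f"accuracy     = {accuracy(toy_cm):.3f}")
print(f"macro recall = {macro_recall(toy_cm):.3f}")
print(f"Cohen kappa  = {cohen_kappa(toy_cm):.3f}")
```

Macro recall weights every class equally regardless of how many samples it has, which matters when acute fractures are rarer than normal ribs; accuracy alone can look high even if the rare class is often missed.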

Details

Language :
English
ISSN :
08956111
Volume :
91
Database :
Academic Search Index
Journal :
Computerized Medical Imaging & Graphics
Publication Type :
Academic Journal
Accession number :
151663176
Full Text :
https://doi.org/10.1016/j.compmedimag.2021.101937