Deep learning-based segmentation of high-resolution computed tomography image data outperforms commonly used automatic bone segmentation methods
- Authors
Karl J. Jepsen, Daniella M. Patton, Rob W. Goulet, Nicolas Piché, Mike Marsh, Todd L. Bredbenner, Erin M.R. Bigelow, Benjamin Provencher, Sean K. Carroll, and Emilie N. Henning
- Subjects
Ground truth, Similarity (geometry), Sørensen–Dice coefficient, Computer science, Deep learning, Metric (mathematics), Segmentation, Pattern recognition, Artificial intelligence, Thresholding, Convolutional neural network
- Abstract
Segmenting bone from background is required to quantify bone architecture in computed tomography (CT) image data. A deep learning approach using convolutional neural networks (CNNs) is a promising alternative method for automatic segmentation. The study objectives were to evaluate the performance of CNNs in automatic segmentation of human vertebral body (micro-CT) and femoral neck (nano-CT) image data and to investigate the performance of CNNs in segmenting data across scanners.

Scans of human L1 vertebral bodies (micro-CT [North Star Imaging], n=28, 53 μm³) and femoral necks (nano-CT [GE], n=28, 27 μm³) were used for evaluation. Six slices were selected from each scan and manually segmented to create ground-truth masks (Dragonfly 4.0, ORS). Two-dimensional U-Net CNNs were trained in Dragonfly 4.0 on images of the femoral necks only [FN], the vertebral bodies only [VB], and the combined CT data [FN+VB]. Global (i.e., Otsu and Yen) and local (i.e., Otsu, r=100) thresholding methods were applied to each dataset. Segmentation performance was evaluated using the Dice coefficient, a similarity metric of overlap. Kruskal-Wallis and Tukey-Kramer post-hoc tests were used to test for significant differences in the accuracy of the segmentation methods.

On femoral neck image data, the FN U-Net had significantly higher Dice coefficients (i.e., better performance) than the global (Otsu: p=0.001; Yen: p=0.001) and local (Otsu, r=100: p=0.001) thresholding methods and the VB U-Net (p=0.001), but there was no significant difference in performance compared to the FN+VB U-Net (p=0.783). On vertebral body image data, the VB U-Net had significantly higher Dice coefficients than the global and local Otsu thresholds (p=0.001 for both) and the FN U-Net (p=0.001), but not compared to the Yen threshold (p=0.462) or the FN+VB U-Net (p=0.783).

The results demonstrate that the U-Net architecture outperforms common thresholding methods.
Further, a network trained with bone data from a different system (i.e., different image acquisition parameters and voxel size) and a different anatomical site can perform well on unseen data. Finally, a network trained with combined datasets performed well on both datasets, indicating that a network can feasibly be trained with multiple datasets and perform well on varied image data.
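The Dice coefficient used above to score each segmentation method is defined as 2|A∩B| / (|A| + |B|), where A and B are the predicted and ground-truth bone masks. As a minimal sketch (not the authors' code; the toy masks below are illustrative), it can be computed over flattened binary masks like this:

```python
def dice_coefficient(pred, truth):
    """Dice similarity: 2*|A intersect B| / (|A| + |B|) for binary masks.

    `pred` and `truth` are equal-length sequences of 0/1 values,
    e.g. flattened segmentation slices.
    """
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy 1-D masks standing in for flattened CT slices (hypothetical data).
truth = [1, 1, 1, 0, 0, 0, 1, 0]
pred  = [1, 1, 0, 0, 0, 1, 1, 0]
print(dice_coefficient(pred, truth))  # 0.75
```

A Dice value of 1.0 indicates perfect overlap with the manual ground-truth mask, and 0.0 indicates none, which is why it is a natural single-number metric for comparing the U-Net and thresholding outputs.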
- Published
- 2021