1. Performance-aware programming for intraoperative intensity-based image registration on graphics processing units
- Author
- Ka-Wai Kwok, Martin C. W. Leong, Zhiyu Liu, Bowen P. Y. Kwan, Nassir Navab, Wayne Luk, Yui-Lun Ng, and Kit-Hang Lee
- Subjects
Parallel computing, Speedup, Computer science, Computation, Normal Distribution, Biomedical Engineering, Image registration, Health Informatics, Non-rigid registration, 02 engineering and technology, Image-guided treatment, 030218 nuclear medicine & medical imaging, 03 medical and health sciences, 0302 clinical medicine, Robustness (computer science), Monitoring, Intraoperative, Computer Graphics, Image Processing, Computer-Assisted, 0202 electrical engineering, electronic engineering, information engineering, Humans, Radiology, Nuclear Medicine and imaging, Graphics, Implementation, 020203 distributed computing, Process (computing), Reproducibility of Results, General Medicine, Computer Graphics and Computer-Aided Design, ddc, Computer Science Applications, Computer engineering, Demons algorithm, Original Article, Programming Languages, Surgery, Computer Vision and Pattern Recognition, Central processing unit, Surgical guidance, Algorithms, Software
- Abstract
Purpose: Intensity-based image registration has proven essential in many applications owing to its unparalleled ability to resolve image misalignments. However, its long registration time prohibits its use in intra-operative navigation systems. Much work has gone into accelerating the registration process by improving the algorithm's robustness, but the inherent computational cost of the registration algorithm has remained unaddressed. Methods: Intensity-based registration methods involve operations with high arithmetic load and memory access demand, a burden that graphics processing units (GPUs) are expected to relieve. Although GPUs are widespread and affordable, there is a lack of open-source GPU implementations optimized for non-rigid image registration. This paper demonstrates performance-aware programming techniques, which involve systematic exploitation of GPU features, by implementing the diffeomorphic log-demons algorithm. Results: By resolving the pinpointed computation bottlenecks on the GPU, our implementation of diffeomorphic log-demons on an Nvidia GTX Titan X GPU achieved a ~95 times speed-up compared to the CPU and registered a 1.3-M voxel image in 286 ms. Even for large 37-M voxel images, our implementation registers in 8.56 s, a ~258 times speed-up. Our solution makes effective use of GPU computation units, memory, and data bandwidth to resolve computation bottlenecks. Conclusion: The computation bottlenecks in diffeomorphic log-demons are pinpointed, analyzed, and resolved using various GPU performance-aware programming techniques. The proposed fast computation of basic image operations not only accelerates diffeomorphic log-demons, but can also potentially be extended to speed up many other intensity-based approaches. Our implementation is open-source on GitHub at https://bit.ly/2PYZxQz.
- Published
- 2021