1. Ultra-Fast Digital Tomosynthesis Reconstruction Using General-Purpose GPU Programming for Image-Guided Radiation Therapy
- Author
Park, Justin C., Park, Sung Ho, Kim, Jin Sung, Han, Youngyih, Cho, Min Kook, Kim, Ho Kyung, Liu, Zhaowei, Jiang, Steve B., Song, Bongyong, and Song, William Y.
- Abstract
The purpose of this work is to demonstrate an ultra-fast reconstruction technique for digital tomosynthesis (DTS) imaging, based on the algorithm proposed by Feldkamp, Davis, and Kress (FDK), using a standard general-purpose graphics processing unit (GPGPU) programming interface. To this end, the FDK-based DTS algorithm was programmed in-house in C for 1) a GPU card and 2) a central processing unit (CPU). The GPU card consisted of 480 processing cores (2 × 240, dual chip) with a 1,242 MHz processing clock speed and 1,792 MB of memory. The CPU hardware had a 2.68 GHz clock speed and 12.0 GB of DDR3 RAM, running a 64-bit OS. The performance of the proposed algorithm was tested on twenty-five patient cases (5 lung, 5 liver, 10 prostate, and 5 head-and-neck) scanned in either full-fan or half-fan mode on our cone-beam computed tomography (CBCT) system. For the full-fan scans, the projections from 157.5°–202.5° (45°-scan) were used to reconstruct coronal DTS slices, whereas for the half-fan scans, the projections from both 157.5°–202.5° and 337.5°–22.5° (2 × 45°-scan) were used to reconstruct larger-FOV coronal DTS slices. The 45° scan angle contained ~80 projections for the full-fan mode, and the 2 × 45° scan angle contained ~160 projections for the half-fan mode, each projection with 1024 × 768 pixels at 32-bit precision. Absolute pixel value differences, profiles, and contrast-to-noise ratio (CNR) calculations were used to compare and evaluate the images reconstructed with the GPU- and CPU-based implementations. The dependence of reconstruction time on volume size was also tested with (512 × 512) × 16, 32, 64, 128, and 256 slices. The GPU-based implementation took at most 1.3 and 2.5 seconds to complete full reconstruction of a 512 × 512 × 256 volume for the full-fan and half-fan modes, respectively.
In turn, this meant that our implementation can process > 13 projections-per-second (pps) and > 18 pps for the full-fan and half-fan modes, respectively. Since a commercial CBCT system nominally acquires 11 pps (at 1 gantry revolution per minute), our GPU-based implementation is fast enough to handle the incoming projection data as they are acquired and to reconstruct the entire volume immediately after the scan completes. In addition, increasing the number of slices (hence the volume) to be reconstructed from 16 to 256 produced only minimal increases in reconstruction time for the GPU-based implementation: from 0.73 to 1.27 seconds for the full-fan mode and from 1.42 to 2.47 seconds for the half-fan mode. This amounted to a speed improvement of up to 87 times over the CPU-based implementation (for the 256-slice case), with visually identical images, small pixel-value discrepancies (< 6.3%), and small CNR differences (< 2.3%). With this achievement, we have shown that the time allocated to DTS image reconstruction is virtually eliminated and that clinical implementation of this approach has become quite appealing. In addition, with this speed, further image processing and real-time applications that were previously prohibited by time restrictions can now be explored.
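As a rough illustration of why the whole data set can plausibly stay resident on the GPU card, the data sizes quoted in the abstract can be added up. The sketch below is not taken from the paper: the function names are hypothetical, it assumes the reconstructed voxels are also stored as 32-bit values (the abstract states this only for the projections), it takes the projection counts as exactly 80 and 160 although the abstract gives them as approximate, and it ignores filter kernels, scratch buffers, and how the card's memory is divided between its two chips.

```python
# Back-of-the-envelope memory footprint for the data sizes quoted in the abstract.
# Assumption: one 32-bit value (4 bytes) per detector pixel AND per voxel.

BYTES_PER_VALUE = 4      # 32-bit precision (stated for projections, assumed for voxels)
GPU_MEMORY_MB = 1792     # GPU card memory quoted in the abstract
MIB = 1024 * 1024


def projection_stack_mib(n_projections, width=1024, height=768):
    """Size in MiB of n_projections detector images of width x height pixels."""
    return n_projections * width * height * BYTES_PER_VALUE / MIB


def volume_mib(n_slices, nx=512, ny=512):
    """Size in MiB of an nx x ny x n_slices reconstruction volume."""
    return nx * ny * n_slices * BYTES_PER_VALUE / MIB


def footprint_mib(n_projections, n_slices):
    """Projections plus volume; excludes filter kernels and scratch buffers."""
    return projection_stack_mib(n_projections) + volume_mib(n_slices)


if __name__ == "__main__":
    # Projection counts are approximate in the abstract (~80 and ~160).
    for mode, n_proj in (("full-fan", 80), ("half-fan", 160)):
        total = footprint_mib(n_proj, 256)
        print(f"{mode}: about {total:.0f} MiB needed, {GPU_MEMORY_MB} MB on the card")
```

Under these assumptions, each projection occupies 3 MiB, so the full-fan and half-fan projection stacks come to 240 MiB and 480 MiB, and the 512 × 512 × 256 volume adds 256 MiB. Both totals are well below the quoted card memory.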
- Published
- 2011