1 Introduction
High-resolution Computed Tomography (CT) is a technology used in a wide variety of fields, e.g. medical diagnosis, non-invasive inspection [62], and reverse engineering [17], [50]. In the past decades, the size of a single three-dimensional (3D) volume generated by CT systems has increased from hundreds of megabytes (typical volume sizes are $256^3$ and $512^3$) to several gigabytes (e.g. $2048^3$ and $4096^3$) [7], [42], [66]. The increased demand for rapid tomographic reconstruction and its high computational cost have attracted considerable attention and effort from the HPC community [8], [11], [19], [25], [28], [47], [54], [55], [66], [68], [76].
The FDK algorithm, a convolution-backprojection formulation for CT image reconstruction presented by Feldkamp, Davis, and Kress in 1984 [23], is also known as the Filtered Back Projection (FBP) algorithm. As illustrated in [48], it is widely regarded as the primary method to reconstruct 3D images (or volumes) from projections, i.e. X-ray images. The FDK algorithm includes a filtering stage (also known as convolution) and a back-projection stage. The computational complexities of those two stages are and , respectively. Researchers increasingly rely on the latest accelerators to improve the computational performance of FDK, e.g. Application-Specific Integrated Circuits (ASICs) [72], Field-Programmable Gate Arrays (FPGAs) [16], [27], [64], [75], Digital Signal Processors (DSPs) [37], Intel Xeon Phi [53], multi-core CPUs [68], and Graphics Processing Units (GPUs) [51], [73], [77], [78]. This paper focuses on GPU-accelerated supercomputers for two reasons. First, GPUs are the dominant platform for tomographic image reconstruction [20], [28], [33], [55], [59], [74]. Second, GPU-accelerated supercomputers are increasingly gaining ground among top-tier HPC systems.
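As a rough illustration of how the costs of the two FDK stages compare, the operation counts can be sketched under a simple model. The notation is an assumption introduced here and is not taken from the source: $N_p$ projections of $N_u \times N_v$ pixels are reconstructed into a volume of $N_x \times N_y \times N_z$ voxels, filtering is performed row by row with an FFT-based 1D convolution, and back-projection visits every voxel once per projection.

```latex
% Assumed notation (not from the source): N_p projections of N_u x N_v pixels,
% reconstructed into a volume of N_x x N_y x N_z voxels.
\begin{align*}
C_{\text{filter}} &= O\!\left(N_p \, N_v \, N_u \log N_u\right)
  && \text{one FFT-based 1D convolution per detector row} \\
C_{\text{backproj}} &= O\!\left(N_p \, N_x N_y N_z\right)
  && \text{every voxel accumulates one sample per projection} \\
\intertext{With $N_p = N_u = N_v = N_x = N_y = N_z = N$:}
C_{\text{filter}} &= O\!\left(N^3 \log N\right)
  \quad \text{(i.e. } O\!\left(N^2 \log N\right) \text{ per projection)}, \\
C_{\text{backproj}} &= O\!\left(N^4\right)
\end{align*}
```

Under these assumptions the back-projection stage exceeds the filtering stage by a factor of roughly $N / \log N$, which suggests why acceleration efforts tend to concentrate on back-projection as volumes grow.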