1. Introduction
With the emergence of terascale and petascale computing, researchers have begun to focus on developing powerful and efficient systems that accelerate the solution of large-scale computational problems. Among current platforms, supercomputers built from large numbers of modern graphics processing units (GPUs) are attracting substantial attention. In recent years, with the development of massively parallel programming models such as CUDA [5] and OpenCL [6], high-performance GPUs have become widely used to solve large-scale computation problems in many different domains. When the execution is parallelized appropriately, GPU-based implementations can reduce processing time by up to thousands of times compared with their sequential counterparts.