I. Introduction
With the rapid development of digital imaging devices and the sharp increase in images and video generated on social media, there has been a surge of interest in image processing applications and software. At the same time, the image processing algorithms they depend on are growing in both complexity and scale, creating a strong need for high-performance computing (HPC) implementations. Meanwhile, as the high-performance computing field has grown, more and more types of high-performance computing devices have been developed and applied to HPC applications, from early SMP devices and GPGPUs to the more recent many-core Intel Xeon Phi coprocessors [1]. To take advantage of this computing power, extensive work has been devoted to accelerating computationally intensive image processing [2]–[5] and computer vision [6], [7] applications on high-performance platforms.