1. INTRODUCTION
Due to the physical limitations on building faster single-core microprocessors, the development and use of multi- and many-core architectures have been the focus of attention for the past few years. Besides the increasing number of processor cores on a single chip, new architectures have emerged that support general-purpose computing, the most prominent of which are Graphics Processing Units (GPUs). General-Purpose computing on Graphics Processing Units (GPGPU) has become very popular in the high-performance computing community; many papers discuss its viability in accelerating applications ranging from molecular dynamics [1] through dense [12] and sparse linear algebra [4] to medical imaging [17].