I. Introduction
Sparse matrix computations arise in numerous engineering and science applications. One of the most common kernels is sparse matrix-vector multiplication (SpMV). This operation represents one of the most fundamental performance bottlenecks in solving sparse linear systems and eigenvalue problems. In SpMV, the operation is performed, where A is a sparse matrix and are dense vectors. Vectors x and y provide the only opportunities for data reuse since elements of matrix A are used only once. Sparse matrices use special data structures that store only the nonzero elements and hence eliminate unnecessary storage and computations [1] [2]. However, these structures introduce indirect and irregular memory accesses which when combined with the lack of data reuse leads to low computational intensity, i.e. number of arithmetic operations per memory reference [3].