
Efficient sparse matrix-vector multiplication on cache-based GPUs


Abstract:

Sparse matrix-vector multiplication is an integral part of many scientific algorithms. Several studies have shown that it is a bandwidth-limited operation on current hardware. On cache-based architectures the main factors that influence performance are spatial locality in accessing the matrix, and temporal locality in re-using the elements of the vector. This paper discusses efficient implementations of sparse matrix-vector multiplication on NVIDIA's Fermi architecture, the first to introduce conventional L1 caches to GPUs. We focus on the compressed sparse row (CSR) format for developing general purpose code. We present a parametrised algorithm, show the effects of parameter tuning on performance and introduce a method for determining the near-optimal set of parameters that incurs virtually no overhead. On a set of sparse matrices from the University of Florida Sparse Matrix Collection we show an average speed-up of 2.1 times over NVIDIA's CUSPARSE 4.0 library in single precision and 1.4 times in double precision. Many algorithms require repeated evaluation of sparse matrix-vector products with the same matrix, so we introduce a dynamic run-time auto-tuning system which improves performance by 10-15% in seven iterations. The CSR format is compared to alternative ELLPACK and HYB formats and the cost of conversion is assessed using CUSPARSE. Sparse matrix-vector multiplication performance is also analysed when solving a finite element problem with the conjugate gradient method. We show how problem-specific knowledge can be used to improve performance by up to a factor of two.
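To illustrate the CSR data layout the abstract refers to (row pointers, column indices, and values), the following is a minimal CUDA sketch of y = A*x using one thread per row. It is only an assumed baseline for exposition; it is not the parametrised, auto-tuned kernel the paper develops, and the matrix, kernel name and launch configuration are hypothetical.

// Minimal CSR SpMV sketch (one thread per row). Illustrative only; not the
// paper's tuned Fermi kernel.
#include <cstdio>
#include <vector>

__global__ void csr_spmv(int n_rows,
                         const int *row_ptr,   // length n_rows + 1
                         const int *col_idx,   // length nnz
                         const float *val,     // length nnz
                         const float *x,
                         float *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        float sum = 0.0f;
        // Accumulate the dot product of row 'row' with the vector x.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

int main()
{
    // Example 3x3 matrix [[1 0 2], [0 3 0], [4 0 5]] stored in CSR form.
    std::vector<int>   h_row_ptr = {0, 2, 3, 5};
    std::vector<int>   h_col_idx = {0, 2, 1, 0, 2};
    std::vector<float> h_val     = {1.f, 2.f, 3.f, 4.f, 5.f};
    std::vector<float> h_x       = {1.f, 1.f, 1.f};
    std::vector<float> h_y(3);

    int *d_row_ptr, *d_col_idx;
    float *d_val, *d_x, *d_y;
    cudaMalloc(&d_row_ptr, h_row_ptr.size() * sizeof(int));
    cudaMalloc(&d_col_idx, h_col_idx.size() * sizeof(int));
    cudaMalloc(&d_val,     h_val.size() * sizeof(float));
    cudaMalloc(&d_x,       h_x.size() * sizeof(float));
    cudaMalloc(&d_y,       h_y.size() * sizeof(float));
    cudaMemcpy(d_row_ptr, h_row_ptr.data(), h_row_ptr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx.data(), h_col_idx.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val,     h_val.data(),     h_val.size() * sizeof(float),   cudaMemcpyHostToDevice);
    cudaMemcpy(d_x,       h_x.data(),       h_x.size() * sizeof(float),     cudaMemcpyHostToDevice);

    csr_spmv<<<1, 32>>>(3, d_row_ptr, d_col_idx, d_val, d_x, d_y);
    cudaMemcpy(h_y.data(), d_y, h_y.size() * sizeof(float), cudaMemcpyDeviceToHost);

    printf("y = [%g, %g, %g]\n", h_y[0], h_y[1], h_y[2]);  // expected: [3, 3, 9]
    return 0;
}

A one-thread-per-row mapping gives poor spatial locality in accessing col_idx and val when rows are long; the parameter tuning discussed in the paper (e.g. assigning multiple threads per row) addresses exactly this kind of trade-off.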
Date of Conference: 13-14 May 2012
Date Added to IEEE Xplore: 25 October 2012
Conference Location: San Jose, CA, USA

1. INTRODUCTION

Due to the physical limitations on building faster single-core microprocessors, the development and use of multi- and many-core architectures has been the focus of attention for the past few years. Besides the increasing number of processor cores on a single chip, new architectures have emerged that support general purpose computing, the most prominent of which are Graphics Processing Units (GPUs). General-Purpose computing on Graphics Processing Units (GPGPU) has become very popular in the high performance computing community; a great number of papers discuss its viability in accelerating applications ranging from molecular dynamics [1] through dense [12] and sparse linear algebra [4] to medical imaging [17].

