
Efficient sparse matrix-vector multiplication on cache-based GPUs


Abstract:

Sparse matrix-vector multiplication is an integral part of many scientific algorithms. Several studies have shown that it is a bandwidth-limited operation on current hardware. On cache-based architectures the main factors that influence performance are spatial locality in accessing the matrix, and temporal locality in re-using the elements of the vector. This paper discusses efficient implementations of sparse matrix-vector multiplication on NVIDIA's Fermi architecture, the first to introduce conventional L1 caches to GPUs. We focus on the compressed sparse row (CSR) format for developing general purpose code. We present a parametrised algorithm, show the effects of parameter tuning on performance and introduce a method for determining the near-optimal set of parameters that incurs virtually no overhead. On a set of sparse matrices from the University of Florida Sparse Matrix Collection we show an average speed-up of 2.1 times over NVIDIA's CUSPARSE 4.0 library in single precision and 1.4 times in double precision. Many algorithms require repeated evaluation of sparse matrix-vector products with the same matrix, so we introduce a dynamic run-time auto-tuning system which improves performance by 10-15% in seven iterations. The CSR format is compared to alternative ELLPACK and HYB formats and the cost of conversion is assessed using CUSPARSE. Sparse matrix-vector multiplication performance is also analysed when solving a finite element problem with the conjugate gradient method. We show how problem-specific knowledge can be used to improve performance by up to a factor of two.
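To illustrate the CSR data layout the abstract refers to (row pointers, column indices, and values), the following is a minimal CUDA sketch of y = A*x using one thread per row. It is only an assumed baseline for exposition; it is not the parametrised, auto-tuned kernel the paper develops, and the matrix, kernel name and launch configuration are hypothetical.

// Minimal CSR SpMV sketch (one thread per row). Illustrative only; not the
// paper's tuned Fermi kernel.
#include <cstdio>
#include <vector>

__global__ void csr_spmv(int n_rows,
                         const int *row_ptr,   // length n_rows + 1
                         const int *col_idx,   // length nnz
                         const float *val,     // length nnz
                         const float *x,
                         float *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        float sum = 0.0f;
        // Accumulate the dot product of row 'row' with the vector x.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

int main()
{
    // Example 3x3 matrix [[1 0 2], [0 3 0], [4 0 5]] stored in CSR form.
    std::vector<int>   h_row_ptr = {0, 2, 3, 5};
    std::vector<int>   h_col_idx = {0, 2, 1, 0, 2};
    std::vector<float> h_val     = {1.f, 2.f, 3.f, 4.f, 5.f};
    std::vector<float> h_x       = {1.f, 1.f, 1.f};
    std::vector<float> h_y(3);

    int *d_row_ptr, *d_col_idx;
    float *d_val, *d_x, *d_y;
    cudaMalloc(&d_row_ptr, h_row_ptr.size() * sizeof(int));
    cudaMalloc(&d_col_idx, h_col_idx.size() * sizeof(int));
    cudaMalloc(&d_val,     h_val.size() * sizeof(float));
    cudaMalloc(&d_x,       h_x.size() * sizeof(float));
    cudaMalloc(&d_y,       h_y.size() * sizeof(float));
    cudaMemcpy(d_row_ptr, h_row_ptr.data(), h_row_ptr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx.data(), h_col_idx.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val,     h_val.data(),     h_val.size() * sizeof(float),   cudaMemcpyHostToDevice);
    cudaMemcpy(d_x,       h_x.data(),       h_x.size() * sizeof(float),     cudaMemcpyHostToDevice);

    csr_spmv<<<1, 32>>>(3, d_row_ptr, d_col_idx, d_val, d_x, d_y);
    cudaMemcpy(h_y.data(), d_y, h_y.size() * sizeof(float), cudaMemcpyDeviceToHost);

    printf("y = [%g, %g, %g]\n", h_y[0], h_y[1], h_y[2]);  // expected: [3, 3, 9]
    return 0;
}

A one-thread-per-row mapping gives poor spatial locality in accessing col_idx and val when rows are long; the parameter tuning discussed in the paper (e.g. assigning multiple threads per row) addresses exactly this kind of trade-off.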
Date of Conference: 13-14 May 2012
Date Added to IEEE Xplore: 25 October 2012
Conference Location: San Jose, CA, USA

1. INTRODUCTION

Due to the physical limitations on building faster single-core microprocessors, the development and use of multi- and many-core architectures has been the focus of attention for the past few years. Besides the increasing number of processor cores on a single chip, new architectures have emerged that support general purpose computing, the most prominent of which are Graphics Processing Units (GPUs). General-Purpose computing on Graphics Processing Units (GPGPU) has become very popular in the high performance computing community; a great number of papers discuss its viability in accelerating applications ranging from molecular dynamics [1] through dense [12] and sparse linear algebra [4] to medical imaging [17].

