A High-Throughput Solver for Marginalized Graph Kernels on GPU

Abstract:

We present the design and optimization of a linear solver on general-purpose GPUs for the efficient, high-throughput evaluation of the marginalized graph kernel between pairs of labeled graphs. The solver implements a preconditioned conjugate gradient (PCG) method to solve a generalized Laplacian equation associated with the tensor product of two graphs. To cope with the gap between instruction throughput and memory bandwidth on current-generation GPUs, our solver forms the tensor product linear system on the fly, without storing it in memory, whenever PCG performs a matrix-vector product. This on-the-fly computation is accomplished by having the threads of a warp cooperatively stream the adjacency and edge label matrices of the individual graphs in small square blocks called tiles, which are then staged in registers and shared memory for later reuse. Warps within a thread block can further share tiles via shared memory to increase data reuse. We exploit the sparsity of the graphs hierarchically by storing only non-empty tiles in a coordinate format and the nonzero elements within each tile as bitmaps. In addition, we propose a new partition-based reordering algorithm that aggregates the nonzero elements of the graphs into fewer but denser tiles, improving the efficiency of the sparse format. We carry out extensive theoretical analyses of the graph tensor product primitives for tiles of various densities and evaluate their performance on synthetic and real-world datasets. Our solver delivers speedups of three to four orders of magnitude over existing CPU-based solvers such as GraKeL and GraphKernels. This capability enables kernel-based learning tasks at unprecedented scales.
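The core idea of applying the tensor product system without storing it can be illustrated with a small NumPy sketch. This is not the paper's GPU solver and not its exact formulation. It assumes unlabeled, undirected graphs (vertex and edge label kernels fixed to 1), replaces the generalized Laplacian with a simplified model system (diag(d₁ ⊗ d₂) + qI − A₁ ⊗ A₂) x = b, where q > 0 is a hypothetical regularization term, and uses a Jacobi (diagonal) preconditioner as an assumed choice. The function names `product_matvec` and `pcg` are hypothetical. The key step is the identity (A₁ ⊗ A₂) x = vec(A₁ X A₂ᵀ) for a row-major reshape X of x, so each CG iteration touches only the two small adjacency matrices.

```python
import numpy as np


def product_matvec(A1, A2, d1, d2, q, x):
    """Apply (diag(d1 ⊗ d2) + q*I - A1 ⊗ A2) to x without forming the Kronecker product.

    Simplified model system (assumption): unlabeled graphs, q > 0 a hypothetical
    regularizer that makes the matrix strictly diagonally dominant and thus SPD.
    """
    n1, n2 = A1.shape[0], A2.shape[0]
    X = x.reshape(n1, n2)          # row-major: product vertex (i, j) <-> index i*n2 + j
    AX = A1 @ X @ A2.T             # equals (A1 ⊗ A2) x, reshaped to (n1, n2)
    return ((np.outer(d1, d2) + q) * X - AX).ravel()


def pcg(A1, A2, b, q=0.1, tol=1e-10, maxiter=1000):
    """Jacobi-preconditioned conjugate gradient on the on-the-fly product system."""
    b = np.asarray(b, dtype=float)
    d1, d2 = A1.sum(axis=1), A2.sum(axis=1)   # vertex degrees of each graph
    # Diagonal of the product matrix, used as the (assumed) Jacobi preconditioner.
    diag = (np.outer(d1, d2) + q - np.outer(np.diag(A1), np.diag(A2))).ravel()

    x = np.zeros(b.shape, dtype=float)
    r = b - product_matvec(A1, A2, d1, d2, q, x)
    z = r / diag
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = product_matvec(A1, A2, d1, d2, q, p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = r / diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p   # conjugate search direction update
        rz = rz_new
    return x


# Usage: a 3-vertex path graph paired with a triangle (9 product vertices).
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A2 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
x = pcg(A1, A2, np.ones(9), q=0.1)
```

Because the example is tiny, the result can be checked against the explicitly formed `np.kron` matrix; the point of the on-the-fly product is that this dense 9×9 matrix (in general (n₁n₂)×(n₁n₂)) never has to be stored. On the GPU, the paper replaces the two dense products with warps that stream sparse tiles of A₁ and A₂ through registers and shared memory.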

Published in: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Date of Conference: 18-22 May 2020

Date Added to IEEE Xplore: 14 July 2020

DOI: 10.1109/IPDPS47924.2020.00080

Conference Location: New Orleans, LA, USA

I. Introduction

Recent advances in machine learning have sparked unique opportunities for building artificial intelligence on graphs, which are versatile data structures for representing non-sequential data of a discrete nature. As illustrated by Figure 1, graph-based discrete data differ from vector-based discretizable data in that the former consist of indivisible elements that must be inserted or removed atomically, whereas the latter consist of discretized samples drawn from a continuous signal at a tunable resolution. Consequently, graph data do not trivially permit interpolation, convolution, or inner products, which are the operations commonly used in feature extraction. Special care must therefore be taken to generalize machine learning algorithms that operate on fixed-length feature vectors and uniform grids to their graph-based counterparts.
