I. Introduction
Deep Neural Networks (DNNs) have achieved significant success in processing regular data that can be represented in Euclidean space as multidimensional tensors. However, not all data fits this category: molecules in biochemistry, circuits in electrical engineering, and mechanical models in physics are naturally described by graphs and require algorithms that can model the interactions between objects with high expressive power. The flexibility of graph representations challenges the capabilities of DNNs and drives the demand for architectures that support graph-structured data. Graph Neural Networks (GNNs) have drawn considerable attention because they generalize the ability of DNNs to learn deep representations to graphs and their elements. As a result, GNNs have emerged as some of the best-performing models for tasks such as recommendation systems [1], interaction and dynamics modeling in physics [2], protein interface and chemical reaction prediction [3], circuit design [4], and others.

An important trait of GNNs is their typically small model size; the number of computations they perform, however, is very high and depends on the size of the input graph. For instance, the GIN model in Fig. 1 has merely 6,437 parameters, yet on the relatively simple Reddit-Binary dataset it performs up to 20 million Multiply-Accumulate (MAC) operations per graph, i.e., roughly 3,100 MACs per parameter. In contrast, the ResNet18 DNN [5], with 11.7 million parameters, performs 1.8 billion MACs per image on the complex ImageNet dataset [6], about 150 MACs per parameter. The GNN operations-to-parameters ratio is thus almost 20x higher than that of the DNN. Additionally, GNN models can be built from a much wider variety of operators (layers). Whereas DNN architectures rely mostly on linear and convolutional layers, GNNs can be based on more than 40 types of operators [7] with different underlying principles. With such high variability in available architectures, optimizing GNN efficiency is challenging, since an approach that benefits one operator does not guarantee the same effect on another.

GNNs are used to analyze relationships between nodes, so their runtime is strongly affected by the size of the input graph. To compute the forward pass of a GNN, every node has to sample and aggregate information from its neighborhood, which is determined by the chosen receptive field and, in turn, by the depth of the GNN. Efficient processing of graph data therefore becomes a concern in certain industries. For example, Google designed a scalable, task-specific model [8] that employs locality-sensitive hashing to operate on YouTube graphs with tens of billions of nodes and hundreds of trillions of edges, cutting production time from days or weeks to several hours. GNNs can also be employed in real-time computer vision tasks [9]. RGBD cameras (RGB image + depth) are gaining popularity in autonomous driving since they provide crucial information about object locations. Their output can be processed with 3D convolutions, but any 3D map can also be represented as a graph, which allows the use of GNNs.

These factors encourage the development of new GNN models with more energy-efficient inference. For DNNs, the body of research in this field is substantial, covering quantization [10], pruning and model compression [11], and approximate operations [12], [13]. For GNNs, however, improving inference efficiency is still in its infancy.
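To make the per-node cost discussed above concrete, most of the operators mentioned follow the generic message-passing scheme common in the GNN literature; the notation below is standard and is introduced here purely for illustration, not taken from a specific operator definition in this work. At layer k, the embedding of node v is updated from its own previous embedding and an aggregate over its neighborhood N(v):

\[
  h_v^{(k)} = \mathrm{UPDATE}^{(k)}\!\left( h_v^{(k-1)},\ \mathrm{AGGREGATE}^{(k)}\!\left( \left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \right\} \right) \right).
\]

The GIN operator referenced in Fig. 1, for example, instantiates this scheme as

\[
  h_v^{(k)} = \mathrm{MLP}^{(k)}\!\left( \left(1+\epsilon^{(k)}\right) h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right).
\]

Because the aggregation runs over every node's neighborhood in every layer, the MAC count scales with the number of edges in the input graph rather than with the parameter count, which is the source of the high operations-to-parameters ratio noted above.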