I. Introduction
Deep Neural Networks (DNNs) have achieved significant success in processing regular data that can be represented in Euclidean space as multidimensional tensors. However, not all data fits this category: molecules in biochemistry, circuits in electrical engineering, and mechanical models in physics are naturally described by graphs and require algorithms that can model the interactions between objects with high expressive power. The flexibility of graph representations challenges the capabilities of DNNs and drives the demand for architectures that support graph-structured data. Graph Neural Networks (GNNs) have drawn considerable attention because they generalize the ability of DNNs to learn deep representations to graphs and their elements. As a result, GNNs have emerged as some of the best-performing models for tasks such as recommendation systems [1], interaction and dynamics modeling in physics [2], protein interface and chemical reaction prediction [3], circuit design [4], and others.

An important trait of GNNs is their typically small model size; the number of computations they perform, however, is very high and depends on the size of the input graph. For instance, the GIN model in Fig. 1 has merely 6,437 parameters, yet on the relatively simple Reddit-Binary dataset it performs up to 20 million Multiply-Accumulate (MAC) operations per graph, i.e., roughly 3,100 MACs per parameter. In contrast, the ResNet18 DNN [5], with 11.7 million parameters, performs 1.8 billion MACs per image on the complex ImageNet dataset [6], about 150 MACs per parameter. The GNN operations-to-parameters ratio is thus almost 20x higher than that of the DNN. Additionally, GNN models can be built from a much wider variety of operators (layers). Whereas DNN architectures rely mostly on linear and convolutional layers, GNNs can be based on more than 40 types of operators [7] with different underlying principles. With such high variability in available architectures, optimizing GNN efficiency is challenging, since an approach that benefits one operator does not guarantee the same effect on another.

GNNs are used to analyze relationships between nodes, so their runtime is strongly affected by the size of the input graph. To compute the forward pass of a GNN, every node has to sample and aggregate information from its neighborhood, which is determined by the chosen receptive field and, in turn, by the depth of the GNN. Efficient processing of graph data therefore becomes a concern in certain industries. For example, Google designed a scalable, task-specific model [8] that employs locality-sensitive hashing to operate on YouTube graphs with tens of billions of nodes and hundreds of trillions of edges, cutting production time from days or weeks to several hours. GNNs can also be employed in real-time computer vision tasks [9]. RGBD cameras (RGB image + depth) are gaining popularity in autonomous driving since they provide crucial information about object locations. Their output can be processed with 3D convolutions, but any 3D map can also be represented as a graph, which allows the use of GNNs.

These factors encourage the development of new GNN models with more energy-efficient inference. For DNNs, the body of research in this field is substantial, covering quantization [10], pruning and model compression [11], and approximate operations [12], [13]. For GNNs, however, improving inference efficiency is still in its infancy.
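To make the per-node cost discussed above concrete, most of the operators mentioned follow the generic message-passing scheme common in the GNN literature; the notation below is standard and is introduced here purely for illustration, not taken from a specific operator definition in this work. At layer k, the embedding of node v is updated from its own previous embedding and an aggregate over its neighborhood N(v):

\[
  h_v^{(k)} = \mathrm{UPDATE}^{(k)}\!\left( h_v^{(k-1)},\ \mathrm{AGGREGATE}^{(k)}\!\left( \left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \right\} \right) \right).
\]

The GIN operator referenced in Fig. 1, for example, instantiates this scheme as

\[
  h_v^{(k)} = \mathrm{MLP}^{(k)}\!\left( \left(1+\epsilon^{(k)}\right) h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right).
\]

Because the aggregation runs over every node's neighborhood in every layer, the MAC count scales with the number of edges in the input graph rather than with the parameter count, which is the source of the high operations-to-parameters ratio noted above.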