I. Introduction
Graph neural networks (GNNs) [1], [2], [3], [4] have achieved notable success in a variety of applications [5], [6], [7], [8] and have consequently become a rapidly growing area of research [9], [10], [11], [12]. Despite this success, the computation costs of GNNs grow rapidly with the size of the graph data [13] and the complexity of the model structure [14], [15], [16], during both training and inference [17]. This has proved prohibitive to their deployment in resource-constrained or time-sensitive applications. For example, a two-layer graph convolutional network (GCN) model [1] with 32 hidden units requires approximately 19 GFLOPs (billions of floating-point operations) to process the Reddit graph [18], [19], twice the cost of running the ResNet50 model [20] on ImageNet. If the GCN model is deepened to 64 layers, it requires approximately 73 GFLOPs. These enormous computation costs stem primarily from three factors: 1) the large-scale adjacency matrices of real-world datasets (e.g., millions of edges in the OGB datasets [13]); 2) the high-dimensional node feature vectors (e.g., each node in the Citeseer graph has 3703 features); and 3) the sheer number of model parameters (e.g., a 64-layer GCN via initial residual and identity mapping (GCNII) [16] with 64 hidden units contains about 262144 parameters).
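To make these magnitudes concrete, the sketch below estimates forward-pass FLOPs for a GCN from basic graph statistics, assuming the standard layerwise rule H' = Â(HW): a dense feature transform of roughly 2·N·d_in·d_out operations per layer plus a sparse aggregation of roughly 2·nnz(Â)·d_out operations. The Reddit statistics used (node, edge, feature, and class counts) are assumed illustrative values, and the exact total depends on implementation details such as whether each edge is counted once or twice; under these assumptions the estimate lands near the ~19 GFLOPs quoted above.

```python
# Back-of-the-envelope FLOP estimate for an L-layer GCN, assuming the
# layerwise rule H' = A_hat @ (H @ W): each layer costs about
# 2 * N * d_in * d_out FLOPs (dense transform) plus
# 2 * nnz(A_hat) * d_out FLOPs (sparse aggregation).

def gcn_flops(num_nodes, num_edges, dims):
    """Approximate forward-pass FLOPs for a GCN with layer widths `dims`,
    where dims = [d_0, ..., d_L] runs from input features to output classes."""
    flops = 0.0
    for d_in, d_out in zip(dims, dims[1:]):
        flops += 2.0 * num_nodes * d_in * d_out  # dense transform H @ W
        flops += 2.0 * num_edges * d_out         # sparse aggregation A_hat @ (HW)
    return flops

# Illustrative Reddit statistics (assumed): ~233k nodes, ~57M undirected
# edges, 602-dimensional input features, 41 classes.
N, E = 232_965, 57_307_946
print(f"2-layer GCN: ~{gcn_flops(N, E, [602, 32, 41]) / 1e9:.0f} GFLOPs")  # ~18

# Parameter-count sanity check for the GCNII figure quoted above:
# 64 layers of 64 x 64 weight matrices hold 64**3 parameters
# (ignoring input/output projections and biases).
print(64 * 64 * 64)  # 262144
```

This simple model also explains the three cost factors listed above: the aggregation term scales with the number of edges, the transform term with the feature dimensionality, and both terms accumulate once per layer as the model deepens.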