I. Introduction
Many of the modern data are naturally represented by graphs [1], [2] and it is an important research problem to design neural network architectures which can work with graphs. The adjacency matrix of a graph exhibits the local connectivity of the nodes. Thus, it is straightforward to extend the local feature aggregation used in the Convolutional Neural Networks (CNNs) to the Graph Neural Networks (GNNs) [3]–[5]. However, when the nodes of the graphs are not labeled/attributed or the labels/attributes of the nodes do not carry information about the differences between the nodes or any information about their locations in the graph, the GNN can fail to extract discriminative features. The GNN maps nodes whose corresponding local structures are similar to same/close feature vectors in the continuous feature space. Although mapping nodes with similar local structures to similar feature vectors might be desirable in some applications, this feature can make the GNN unable to distinguish graphs whose difference is not in their local structures. In addition, when the nodes are not labeled/attributed, the spatial graph convolution only propagates information about the degree of the nodes which might not lead to extracting informative features about the topological structure of the graph. Another limitation of the current GNN architectures is that they are mostly unable to do the hierarchical feature learning employed in the CNNs [6]. The main reason is that graphs lack the tensor representation and it is difficult to measure how accurate a subset of nodes represent the topological structure of the given graph.