I. Introduction
Existing Graph Neural Network (GNN) models on learning node representations rely on a similar methodology that utilises a GNN layer to aggregate the sampled neighbouring nodes' features in a number of iterations, via non-linear transformation and aggregation functions. Its effectiveness has been widely proved, however, a major limitation of these GNN models is that they are inherently flat as they only propagate information across the observed edges in the original graph. Thus, they lack the capacity to encode features in the high-order neighbourhood in the graphs [1], [17]. For example, in an academic collaboration network, flat GNN models could capture the micro semantic (e.g., co-authorships) between authors, but neglect their macro semantics (e.g., belonging to different research institutes).