WirelessNet: An Efficient Radio Access Network Model Based on Heterogeneous Graph Neural Networks


Abstract:

A network digital twin makes it possible to safely and rapidly recreate what-if scenarios of mobile networks for more cost-effective and intelligent network optimization. Towards enabling a network digital twin of mobile networks, accurate and efficient radio access network models are needed. In this work, we present WirelessNet, a novel radio access network model based on Heterogeneous Message Passing Graph Neural Networks (HMPGNNs). WirelessNet represents network nodes and the underlying wireless phenomena between them as nodes and edges of different types in a heterogeneous graph. Heterogeneous graphs are fed as samples into the HMPGNN model to simulate the wireless phenomena within WirelessNet’s model architecture. Model parameters associated with the same underlying wireless phenomena are shared across network nodes. Results from system-level simulations used to train and evaluate our proposal show that WirelessNet efficiently outputs accurate downlink rates and vector representations of users, even for network deployments unseen during training, with significantly less computational runtime than a cellular network simulator and more accuracy than typical neural network architectures. With ablation experiments, we validate the downlink signal-to-interference-plus-noise ratio (SINR) user equipment (UE) node feature as the most significant contributor to reconstructing downlink rates. In a more practical setting without SINR and with reference signal received power (RSRP) from the serving base station (BS), WirelessNet generalizes to unseen network deployments and significantly outperforms homogeneous graph neural networks (GNNs). We further show the benefits of our proposal by implementing two network applications served by WirelessNet, namely radio access network deployment planning and artificial intelligence/machine learning (AI/ML) model training for quality of service (QoS) prediction.
Published in: IEEE Access ( Volume: 13)
Page(s): 36006 - 36023
Date of Publication: 24 February 2025
Electronic ISSN: 2169-3536

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

Driven by emerging user applications and an ever-increasing number of mobile subscribers, the complexity of current and future mobile networks keeps growing at a rapid pace. Mobile network operators are required to i) guarantee close-to-optimal performance and ii) continually expand the radio access network. Accordingly, Operation, Administration and Maintenance (OAM) activities keep increasing in order to optimize, expand and integrate new features. Current OAM procedures have a significant time cost and can impact service in the radio access network. For instance, an OAM procedure may cause suboptimal operation or failure in the radio access network, which translates into significant economic costs due to the negative impact on the wireless communication service provided to mobile subscribers. Consequently, the Third Generation Partnership Project (3GPP) is studying the use of network digital twins for network management to efficiently verify the behavior of new network configurations before deployment, so as to reduce OAM costs in mobile networks [1].

To realize the potential of network digital twins in mobile networks, an accurate, efficient and scalable radio access network model is needed. By accurately and rapidly assessing multiple what-if scenarios and configurations in a safe manner, more optimized solutions can be found with the network digital twin and transferred to the radio access network, even in a real-time closed loop. Network digital twins can also facilitate the integration of artificial intelligence/machine learning (AI/ML) in radio access networks by generating synthetic data for AI/ML model training [1], as well as by providing a safe and accurate model for AI/ML-based network optimization [2]. Several studies regard the network digital twin as a key enabler for the efficient development, operation, management, and optimization of modern real-world wireless networks [3], [4].

Radio access networks are inherently dynamic due to the high mobility of users. Handovers, changes in wireless channels, resource allocation and changes in interference produce highly dynamic behavior that affects quality of service (QoS), such as user downlink rates, at the millisecond level. The radio access network model therefore needs to be sufficiently accurate and efficient to stay synchronized with what is occurring in the radio access network. Current discrete-event network simulators [5], [6] are computationally expensive and therefore unsuitable for real-time closed-loop network optimization. Theoretical models such as the load-coupled model [7] have been proposed as mathematical abstractions at the network level, but they rely on assumptions that simplify the behavior of the network, which makes them inaccurate. Typical neural network architectures (e.g., fully-connected) with an arbitrarily high number of input data samples and model parameters can approximate the behavior of the radio access network in a fixed scenario due to the universal approximation theorem [8]. Apart from the practical limitations of collecting vast amounts of data samples (e.g., costly measurement campaigns), out-of-distribution data shifts due to changes in the radio access network may cause severe performance degradation and increase the time during which the network model and the network are de-synchronized [9], [10]. To reduce the de-synchronization time, the radio access network model should generalize to unseen scenarios, limiting performance degradation and thereby avoiding model updates. Fewer model updates mean less de-synchronization time, and generalizing takes significantly less time than continual learning methods, which may require seconds to adapt to a new distribution [10].

Several AI/ML network models with different modeling scopes (e.g., wired networks and ad-hoc wireless networks) related to radio access networks have been proposed in the literature [11], [12], [13], [14]. To overcome de-synchronization and complexity limitations, these works employ the graph neural network (GNN) model [15]. The computation graph of a GNN model is given by a predefined input graph topology. This characteristic enables more efficient and accurate inference of network behavior as well as generalization to larger problem scales. The AI/ML network models proposed in [11] and [13] represent computer networks as homogeneous graphs of connected queues. These graphs are given as input samples to a GNN model to efficiently estimate QoS metrics in wired networks. The GNN model has also achieved remarkable performance and reduced complexity by exploiting structure in wireless optimization problems such as radio resource management [16] and power control [17], among others.

Closest to the radio access network modeling scope, the network models in [12] and [14] are designed for ad-hoc wireless networks and model interference as edges with distance-based weights between links that interfere with each other. The proposed network model achieves efficient evaluation of ad-hoc wireless networks in a rectangular grid. However, its design has a different modeling scope and cannot be directly applied to model radio access networks. Firstly, it does not have enough modeling flexibility to account for multiple types of network nodes (e.g., base stations (BSs) or user equipment (UEs)), which have different functionalities, capabilities and local information. Secondly, due to the use of distance-based edge weights, it cannot account for realistic wireless propagation environments, where the possible presence of obstructions (e.g., buildings) renders non-line-of-sight (NLOS) channel conditions between network nodes. Other works focus on specific parts of the radio access network. For example, the authors in [18] focus on network traffic prediction at the BS level using graph convolutional networks. The work in [19] models traffic flows across backhaul networks (edge networks) using graph attention. The work in [20] learns cellular traffic and channel generation distributions using generative models. There is a lack of AI/ML network models that consider all network nodes in radio access networks within their modeling scope. Table 1 summarizes the most related prior art and its limitations.

TABLE 1 State of the Art AI/ML Network Models Related to Radio Access Networks

In this work, we present WirelessNet, a novel radio access network model based on Heterogeneous Message Passing Graph Neural Networks (HMPGNNs) [15], [21], along with two network applications served by our proposed model. HMPGNNs are a family of GNNs that operate on heterogeneous graphs, i.e., graphs with multiple types of edges and nodes, where nodes compute messages with learnable parameters and exchange them with their neighboring nodes. We propose to represent complex interactions in radio access networks as heterogeneous graphs given as input samples to HMPGNNs. We represent the input samples of our radio access network model as heterogeneous graphs with two types of edges, communication and inter-cell interference, together with two types of nodes, user equipment (UEs) and base stations (BSs). As network nodes in radio access networks have different types of local information (e.g., position information for UEs and resource block (RB) utilization for BSs) and different functionalities, we represent radio access networks with multiple node types. Similarly, since network nodes in radio access networks interact with each other in different ways (e.g., BSs communicate in downlink with UEs while interfering with other UEs connected to neighboring BSs), we represent the different wireless interactions between network nodes with different edge types.

The graph topology of the input heterogeneous graphs is constructed based on wireless measurements performed by network nodes (i.e., received power from reference signals transmitted by BSs within cellular network simulations). The input heterogeneous graph defines the structure of the computation graph of WirelessNet. This allows the computation graph of the radio access network model to be synchronized with the structure of the wireless phenomena in the radio access network. Dynamics in radio access networks such as handovers, changes in wireless channels and cell-load fluctuations are reflected in the radio access network model as changes in graph topology per edge type (i.e., computation graph), in the input feature information attached to the network nodes, and in edge weights. The downlink communication and inter-cell interference edges in the input heterogeneous graph indicate computation with independent trainable parameters per edge type in the learning model. These parameters are shared per edge type, so that computations at different nodes that emerge from the same wireless phenomenon use the same functions.
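
As a rough illustration of how such a topology could be derived from received-power measurements, the following Python sketch builds the two edge sets from a small RSRP table. The serving-cell rule (strongest RSRP), the interference threshold `interference_floor_dbm`, the function name and the numeric values are all assumptions made for this example; the text above does not specify them.

```python
# Hypothetical sketch: derive communication and inter-cell interference edges
# from RSRP measurements. rsrp_dbm[j][i] is the RSRP (in dBm) that UE i
# measures from BS j. All values and thresholds below are made up.

def build_hetero_edges(rsrp_dbm, interference_floor_dbm=-110.0):
    """Return (communication_edges, interference_edges) as lists of (bs, ue).

    Assumptions (not taken from the paper): each UE is served by the BS with
    the strongest RSRP, and every other BS whose RSRP exceeds
    `interference_floor_dbm` is treated as an interferer of that UE.
    """
    num_bs = len(rsrp_dbm)
    num_ue = len(rsrp_dbm[0])
    communication, interference = [], []
    for ue in range(num_ue):
        # Serving BS: strongest received reference-signal power.
        serving = max(range(num_bs), key=lambda bs: rsrp_dbm[bs][ue])
        communication.append((serving, ue))
        # Interfering BSs: non-serving BSs received above the floor.
        for bs in range(num_bs):
            if bs != serving and rsrp_dbm[bs][ue] > interference_floor_dbm:
                interference.append((bs, ue))
    return communication, interference


rsrp_dbm = [
    [-80.0, -95.0, -120.0],   # BS 0 as measured by UE 0, UE 1, UE 2
    [-100.0, -85.0, -90.0],   # BS 1 as measured by UE 0, UE 1, UE 2
]
comm_edges, intf_edges = build_hetero_edges(rsrp_dbm)
print(comm_edges)  # [(0, 0), (1, 1), (1, 2)]
print(intf_edges)  # [(1, 0), (0, 1)]
```

In this toy case a handover of UE 0 to BS 1 would simply move its communication edge, which is the sense in which changes in the network show up as changes in graph topology per edge type.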

We evaluate the accuracy of our HMPGNN model with datasets of realistic cellular network simulations generated using ns-3 [5], a well-known discrete-event network simulator. WirelessNet achieves high accuracy for downlink data rate reconstruction for in-distribution estimates, while it also generalizes to unseen network deployments. As an illustration of its generalization capabilities, after being trained with 6-BS, 8-BS and 12-BS network deployments, the model is able to reconstruct the downlink rate for all users in an unseen 10-BS network deployment with a 24% Mean Absolute Percentage Error (MAPE). Moreover, the runtime cost to replicate multiple seconds of network behavior with WirelessNet is extremely low, at the millisecond level, whereas ns-3 takes several orders of magnitude longer. Results show that WirelessNet significantly outperforms unstructured fully-connected deep neural networks (FCDNNs).
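
For readers unfamiliar with the metric, the sketch below computes MAPE under its standard definition (mean of absolute errors relative to the true values, in percent). The text does not state the exact formula used in the evaluation, so this definition and the toy rates are assumptions.

```python
def mape(predicted, actual):
    """Mean absolute percentage error, in percent, over paired samples.

    Assumes the standard definition; every true value must be non-zero.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    relative_errors = (abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return 100.0 * sum(relative_errors) / len(actual)


# Toy downlink rates in Mbit/s (made-up numbers, not results from the paper).
true_rates = [10.0, 20.0, 30.0]
estimated_rates = [12.0, 18.0, 33.0]
error_percent = mape(estimated_rates, true_rates)
print(round(error_percent, 2))  # 13.33
```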

We extend a preliminary version of this work [22] by performing ablation experiments to understand which components of WirelessNet influence performance most significantly. For the downlink data rate reconstruction task, the downlink signal-to-interference-plus-noise ratio (SINR) as UE node input feature within our proposed input heterogeneous graph is the most significant contributor to generalizing to unseen network deployments. Similar to the findings reported in [23], where models trained with SINR can generalize to geographical locations, we report that models trained with SINR generalize to unseen network deployments. In a more practical setting without SINR and with reference signal received power (RSRP) from the serving BS as UE node feature, WirelessNet is able to generalize to unseen network deployments, aided by inter-cell interference edges that induce the interference phenomena within its model architecture, although with reduced performance compared to the case with SINR. Beyond increased modeling flexibility of the radio access network, results show that WirelessNet, with the use of heterogeneous graphs and shared model parameters per wireless phenomenon, significantly outperforms homogeneous GNNs.

Finally, we present a framework to generate vector representations per network node with our proposed WirelessNet model for multiple network applications. Vector representations are low-dimensional, learned vectors of continuous real-valued numbers that represent an abstract state of network nodes in the radio access network. Vector representations capture local structural information per edge type together with local and neighborhood feature information. These vector representations can be provided to downstream network applications such as network optimization, planning and prediction tasks. A single vector representation can be used by multiple network applications, which facilitates the scalability and integration of our proposal in radio access networks. We showcase two network applications served by WirelessNet, namely:

  • Radio Access Network Deployment Planning: we evaluate the network performance of different network deployments for a particular load pattern in our proposed network model and select the one that fulfills QoS requirements in terms of minimum data rate per user and system throughput based on a constrained budget. Considering ns-3 simulations as the ground truth, WirelessNet closely resembles the actual radio access network performance with very low complexity.

  • AI/ML Model Training for QoS Prediction: we utilize the vector representations generated by WirelessNet as synthetic data to train QoS prediction models to predict downlink rate of all users with transfer learning. QoS prediction models trained with vector representations generated from WirelessNet significantly outperform AI/ML models trained with unstructured raw input information from cellular network simulations.

The paper is organized as follows: Section II provides an overview of the related work on building network models related to radio access networks and on network applications served by said network models. Section III describes the radio access network to be modeled and reviews the necessary background on GNNs and HMPGNNs. Section IV describes the system architecture. Section V describes the WirelessNet model and the proposed modeling framework. Section VI evaluates the accuracy of our proposed WirelessNet model for in-distribution and out-of-distribution estimates. Section VII presents the network applications served by the WirelessNet model. Section VIII concludes the work.

Notations: We let $a$, $\boldsymbol{a}$ and $\boldsymbol{A}$ denote a scalar, a column vector and a matrix, respectively. $\boldsymbol{A}_{(j,i)}$ denotes the entry in the $j$-th row and the $i$-th column of matrix $\boldsymbol{A}$. With $(\cdot,\cdot)$, we denote a tuple of edge indices where the first element denotes the source node and the second element denotes the target node. The operator $[\cdot]$ denotes a row vector and the operator $\{\cdot\}$ denotes a set. The superscript $(\cdot)^{\top}$ denotes the transpose of a vector. The operator $\Vert\cdot\Vert_{1}$ denotes the $\ell_{1}$ norm. The operator $\mathrm{vec}(\cdot)$ denotes the flatten operator applied to a matrix. Finally, let $\mathcal{N}_{j}$ denote the neighbors of node $j$ in the graph, referred to as its neighborhood herein, such that $\mathcal{N}_{j} = \{i : (i,j) \in \mathcal{E}\}$, where $\mathcal{E}$ denotes the set of all edges in a graph.

SECTION II.

Related Work

The related work can be categorized into two main directions: methods to build radio access network-related models and the usage of these models to serve network applications.

A. Methods to Build Network Models Related to Radio Access Networks

The modeling scope of this work is to comprehensively model radio access networks in order to efficiently evaluate QoS performance given different network configurations (e.g., resource allocation, deployment, channels). Discrete-event network simulators (e.g., ns-3 [5], OMNeT++ [6]) have been the most commonly used modeling tool for this purpose. However, network simulators are unsuitable for closed-loop operation due to their computational complexity. Other mathematical frameworks [7], [24] rely on over-simplified assumptions about real-world deployments, yet they can complement existing models for the optimization and planning of cellular networks. Current AI/ML network models related to radio access networks are summarized in Table 1. Closest to our work, the authors of [14] proposed a wired and ad-hoc wireless network model adapted from a GNN model design for computer networks. Due to the hierarchical and heterogeneous nature of radio access networks, their proposed model is not flexible enough to scale to new radio access network nodes and functionalities. Other works that focus on specific components of the radio access network, in addition to those described in the previous section, include [25], [26]. In particular, the authors of [25] propose a random access model built with Bayesian learning techniques and probabilistic graphical models to learn a joint probability distribution of the packet generation process and the wireless channel. Similarly, the authors of [26] proposed a model of network slices deployed on physical infrastructure that evaluates their end-to-end latency using GNN models.

A radio access network model can also be complemented with efficient radio map models [27], [28], [29], [30], [31] that account for environment-dependent wireless propagation effects. The works in [27] and [28] propose efficient geometry-based wireless channel simulators using outlines of buildings, vehicles and foliage. The authors in [29] estimate the propagation pathloss in a realistic propagation environment with convolutional neural networks. Efficient ray-tracing techniques based on differentiable neural networks have also been proposed recently [31], [32].

B. Network Applications Served by Network Models

Network models residing in network digital twins provide the modeling component to reliably and efficiently assess what-if scenarios without affecting radio access networks in production and facilitate innovative optimization solutions. In the scenario presented in [33], a natural disaster damages the deployed radio access network. An agent is trained using deep reinforcement learning (DRL) with a network model to optimize the trajectory of mobile aerial base stations for fast network formation and serve users while minimizing transmitted power, and thus increasing flying time. In the network application presented in [34], a network model is used to select the antenna parameter configuration for a massive multiple-input-multiple-output (MIMO) antenna with expert knowledge and reinforcement learning.

In a mobile edge computing (MEC) system, a network model of the communication system is used to evaluate user association schemes in terms of energy consumption and delay, and the best configurations are saved in memory as labeled training samples to train a deep neural network (DNN) to output resource allocation and offloading probabilities [35]. Similarly, a network model is used to aid task offloading of resource-constrained UEs with authentication services to reduce network delay and power consumption [36]. A network model of a MEC system is used to serve a task offloading and service caching network application [37]. The backhaul network model in [19] is used for network anomaly prediction and to serve a self-healing network application. The random access model in [25] serves a random access policy optimization network application based on multi-agent reinforcement learning (MARL) with uncertainty-aware metrics. The network model proposed in [14] is used to optimize the traffic load of wireless ad-hoc networks. In computer networking, network models are used to upgrade budget-constrained networks and to find QoS-aware routing schemes [11].

In contrast to our work, some studies [33], [35], [36] assume that the network model of the radio access network already exists or focus on a specific aspect of radio access networks. Finally, none of these works consider vector representations as outputs of network models to be given to AI/ML model training network applications in radio access networks.

SECTION III.

Background and Preliminaries

We describe the radio access network to be modeled by WirelessNet in Section III-A and we review concepts of GNNs and HMPGNNs necessary to understand the rest of the paper in Section III-B.

A. Radio Access Network

Consider a Single-Input-Single-Output (SISO) downlink cellular network with system bandwidth $K \in \mathbb{N}$ where the BSs employ orthogonal frequency division multiple access (OFDMA) to allocate their resources. The sets of BSs and UEs in the cellular network are given by $\mathcal{V}_{\text{BS}}$ and $\mathcal{V}_{\text{UE}}$ respectively, where $i \in \mathcal{V}_{\text{UE}}$ and $j \in \mathcal{V}_{\text{BS}}$. Consider the set of resource blocks (RBs) per BS as $\mathcal{K}_{j} = \{1,\ldots,\vert\mathcal{K}_{j}\vert\}\; \forall j \in \mathcal{V}_{\text{BS}}$. These resources are distributed with a proportional fairness scheduling policy. Users experience time-varying channels with a mix of line-of-sight (LOS) and non-line-of-sight (NLOS) propagation paths due to scatterers and obstructions along their trajectories. Neighboring BSs induce inter-cell interference, as there is no resource coordination between them and the frequency reuse factor of the cellular network is 1. Let $G^{k}_{ji}$ denote the channel gain between BS $j$ and UE $i$ at the $k_{j}^{\text{th}}$ RB, and let $P_{j}$ denote the downlink transmit power of BS $j$, which is considered to be constant $\forall k_{j} \in \mathcal{K}_{j}$. The received downlink signal-to-interference-plus-noise ratio (SINR) in the $k_{j}^{\text{th}}$ RB used by UE $i$ from BS $j$ is given by
\begin{equation*} \gamma^{k}_{ji} = \frac{P_{j} G^{k}_{ji}}{\sum_{j' \in \mathcal{V}_{\text{BS}} \setminus \{j\}} \beta^{k}_{j'} P_{j'} G^{k}_{j'i} + \sigma_{o}^{2}}, \tag{1}\end{equation*}
where $\beta^{k}_{j'} \in \{0,1\}$ denotes whether BS $j'$ is scheduled to transmit at a given time slot in the $k_{j}^{\text{th}}$ RB, and $\sigma_{o}^{2}$ denotes the variance of the power spectral density of white Gaussian thermal noise.
Depending on the quality of the wireless channel reported by the UEs, BSs perform link adaptation by assigning a specific modulation and coding scheme (MCS) to the transport block (TB). The size of the transport block is determined by the number of allocated RBs and by the MCS assigned by the BS as per 3GPP [39]. The MCS indicates the coding rate of the transmitted data bits and the employed modulation (i.e., number of bits carried over each data symbol). Depending on the wireless channel between the BS and the UE as well as the employed MCS, the TB will suffer errors on the coded data blocks. The block error rate is the ratio between the number of erroneously received blocks and the total number of blocks transmitted. We refer the interested reader to [40] for more details.
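
The SINR expression in (1) can be evaluated directly once the transmit powers, channel gains and scheduling indicators are known. The Python sketch below does this in linear scale for one UE and one RB; the function name, the nested-list layout `gain[bs][ue][rb]` and all numeric values are illustrative assumptions rather than values from the simulations.

```python
import math


def downlink_sinr(serving_bs, ue, rb, tx_power_w, gain, scheduled, noise_power_w):
    """Evaluate Eq. (1) in linear scale.

    tx_power_w[bs]    : downlink transmit power P of each BS (watts)
    gain[bs][ue][rb]  : channel gain G between BS and UE on the given RB
    scheduled[bs][rb] : beta, 1 if the BS transmits on that RB, else 0
    noise_power_w     : noise term sigma_o^2 (watts)
    """
    signal = tx_power_w[serving_bs] * gain[serving_bs][ue][rb]
    interference = sum(
        scheduled[bs][rb] * tx_power_w[bs] * gain[bs][ue][rb]
        for bs in range(len(tx_power_w))
        if bs != serving_bs  # the sum excludes the serving BS
    )
    return signal / (interference + noise_power_w)


# Toy scenario: three BSs, one UE, one RB (all numbers are made up).
tx_power_w = [10.0, 10.0, 10.0]
gain = [[[1e-9]], [[1e-10]], [[1e-10]]]
scheduled = [[1], [1], [0]]   # BS 2 is idle on this RB, so it does not interfere
sinr = downlink_sinr(0, 0, 0, tx_power_w, gain, scheduled, noise_power_w=1e-9)
sinr_db = 10.0 * math.log10(sinr)
print(round(sinr, 3), round(sinr_db, 2))
```

Here the signal power is 1e-8 W against 1e-9 W of interference from BS 1 plus 1e-9 W of noise, so the ratio is about 5 (roughly 7 dB). Setting `scheduled[2][0]` to 1 would add a second interferer and lower the result.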

B. Characteristics of HMPGNNs

GNNs are a general neural network architecture that operates on graph-structured data. A key characteristic of a GNN model is that its computation graph is defined by the topology of the input graph fed to the model [41]. The graph comprises its nodes and the edges between them, together with the feature information attached to them. Let $\boldsymbol{X} \in \mathbb{R}^{\vert\mathcal{V}\vert \times n}$ denote the node feature matrix, where $\mathcal{V} = \mathcal{V}_{\text{UE}} \cup \mathcal{V}_{\text{BS}}$ and $n \in \mathbb{N}$ is the total number of input features. Let $\boldsymbol{A} \in \{0,1\}^{\vert\mathcal{V}\vert \times \vert\mathcal{V}\vert}$ denote the adjacency matrix that defines the edges between the nodes in the graph. Mathematically, a layer of a GNN is a permutation-equivariant function $f'(\boldsymbol{X}, \boldsymbol{A})$ over graphs, obtained by applying a shared local permutation-invariant function $g'$ to the neighborhood of every node in the graph [42]. A class of methods, referred to as Message Passing [43], defines $g'$ as a function that computes arbitrary vectors, referred to as messages, to be sent across edges. Within the context of GNNs, $g'$ is parameterized with trainable weights, and thus the specific way the messages are propagated to the nodes is learnable.

A GNN is composed of three functional operators, namely the message $g_{M}$, the aggregation $g_{A}$ and the update $g_{U}$ functional operators. The parameters of these functions are shared by all nodes of the same type. During a forward pass, the following steps occur: i) nodes with outbound edges compute messages with the message function $g_{M}$; ii) nodes communicate the computed messages to their neighbors; iii) nodes with inbound edges aggregate messages from neighbors with function $g_{A}$ and update their own representation with function $g_{U}$. In an HMPGNN, there are multiple independent message $g_{M}^{\tau}$, aggregation $g_{A}^{\tau}$ and update $g_{U}^{\tau}$ functions with independent trainable parameters per edge type $\tau \in \mathcal{R}$. Therefore, during the forward pass to compute the vector representation of a particular network node, the computation of the message, aggregation and update functions is conditioned on the edge types of its inbound and outbound edges. Section V-B further elaborates on these characteristics.
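
To make the three steps concrete, the following Python sketch runs one heterogeneous message passing layer on a toy graph. It is a deliberately simplified stand-in and not the WirelessNet layer: it assumes linear message functions with one weight matrix per edge type, sum aggregation, a single shared update (a linear self-transform plus ReLU) and equal feature dimensions for all node types. All names and weights are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def hetero_message_passing_layer(features, edges_by_type, message_weights, self_weight):
    """One simplified heterogeneous message passing step.

    features        : dict node -> feature vector
    edges_by_type   : dict edge type -> list of (source, target) tuples
    message_weights : dict edge type -> weight matrix (message function per type)
    self_weight     : weight matrix applied to a node's own features (update)
    """
    out_dim = len(self_weight)
    aggregated = {node: [0.0] * out_dim for node in features}
    for edge_type, edges in edges_by_type.items():
        weight = message_weights[edge_type]  # parameters shared per edge type
        for source, target in edges:
            # i) + ii) the source computes a message and sends it along the edge
            message = matvec(weight, features[source])
            # iii) the target aggregates inbound messages (sum aggregation assumed)
            aggregated[target] = [a + m for a, m in zip(aggregated[target], message)]
    updated = {}
    for node, feature in features.items():
        combined = [s + a for s, a in zip(matvec(self_weight, feature), aggregated[node])]
        updated[node] = [max(0.0, value) for value in combined]  # ReLU update
    return updated


features = {"bs0": [1.0, 0.0], "bs1": [0.0, 1.0], "ue0": [0.5, 0.5]}
edges_by_type = {
    "communication": [("bs0", "ue0")],   # serving BS -> UE
    "interference": [("bs1", "ue0")],    # neighboring BS -> UE
}
message_weights = {
    "communication": [[1.0, 0.0], [0.0, 1.0]],
    "interference": [[-1.0, 0.0], [0.0, -1.0]],
}
identity = [[1.0, 0.0], [0.0, 1.0]]
updated = hetero_message_passing_layer(features, edges_by_type, message_weights, identity)
print(updated["ue0"])  # [1.5, 0.0]
```

Because the weights are indexed by edge type rather than by node, the same two matrices would be reused for every additional UE or BS added to `features`, which is the parameter sharing described above.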

SECTION IV.

System Architecture

A radio access network model that enables the network digital twin paradigm for network management should also scale to multiple requests from different network applications in the radio access network. Towards deploying a radio access network model in existing or new network functions in the radio access network (e.g., BS), the model should expose a common output to multiple network applications. Therefore, the system architecture, extended from the reference architecture proposed in [3], is composed of three key system components: the physical radio access network, the radio access network model and the network applications. Fig. 1 illustrates the system architecture and information flow. 1) The physical data from the radio access network is given to WirelessNet. 2) Using the physical data, WirelessNet efficiently computes vector representations together with accurate network outputs (e.g., QoS metrics of users) and remains synchronized with what is occurring in the radio access network. 3) Network applications send modeling service requests (e.g., verify the network performance of several what-if scenarios) to WirelessNet. 4) Upon receiving requests, WirelessNet provides vector representations and/or network outputs to network applications. 5) Network applications consume said outputs and compute updated network configurations or future analytics. 6) The network configurations or future analytics are communicated to the physical radio access network, which updates its network configuration and optimizes its network performance. The goal of the radio access network model is to generate accurate and real-time virtual representations for all network nodes of the radio access network in the form of vector representations. Network outputs are computed from the vector representations and are given as input to network applications.

FIGURE 1. System Architecture extended from [3]. WirelessNet efficiently transforms physical data into real-time vector representations and network outputs useful for N network applications. In this work, we showcase two network applications served by WirelessNet, namely: Network Deployment Planning and AI/ML model training.

Network applications refer to various applications including but not limited to network optimization, network prediction, performance evaluation in what-if wireless scenarios, network planning and AI/ML model training. Said network applications can produce future analytics outputs (e.g., QoS prediction) or optimized network configuration outputs (e.g., resource allocation). Since WirelessNet is an accurate and efficient network model of the physical radio access network, network applications can use its outputs to optimize and verify network performance in what-if scenarios or to acquire representative data efficiently before updating the network configuration in the physical radio access network. These network applications may be implemented, for example, by OAM servers in the core network or in neighboring BSs in the radio access network. WirelessNet can be deployed in a BS of the radio access network, in a network data analytics function (NWDAF) located in the core network, or in a mobile edge computing (MEC) server located logically outside a 3GPP mobile network. We elaborate on deployment considerations in Sec. VII-C.

A. Problem Formulation

The goal is to learn a functional mapping from network configurations of the radio access network to vector representations and QoS of users. The node feature matrix \boldsymbol {X} represents the network configuration (e.g., locations, wireless channels, resource allocation, etc.) of all network nodes. The adjacency matrix \boldsymbol {A} represents the structure of the wireless phenomena occurring between BSs and UEs as edges among nodes in the graph. The resulting vector representations and QoS are provided to multiple different network applications. Consider the ground truth QoS of wireless users as \boldsymbol {Q}_{\text {UE}}:= [\boldsymbol {q}_{1},\ldots , \boldsymbol {q}_{\vert {\mathcal {V}}_{\text {UE}}\vert }]^{\top } \in \mathbb {R}^{\vert {\mathcal {V}}_{\text {UE}}\vert \times n'} where \boldsymbol {q}_{i}:= [q_{i,1},\ldots ,q_{i,n'}]^{\top }\in \mathbb {R}^{n'} . Variable n' \in \mathbb {N} denotes the number of QoS metrics per wireless user (e.g., downlink rate, downlink delay, etc.). Let f_{\boldsymbol {\Theta }}: \mathbb {R}^{\vert \mathcal {V}\vert \times n}\times \{0,1\}^{\vert \mathcal {V} \vert \times \vert \mathcal {V} \vert } \rightarrow \mathbb {R}^{d\times \vert {\mathcal {V}}_{\text {UE}}\vert } denote the functional mapping from network configurations and the structure of wireless phenomena to the stacked vector representation of all users, \boldsymbol {H}:= [\boldsymbol {h}_{1},\ldots ,\boldsymbol {h}_{\vert {\mathcal {V}}_{\text {UE}} \vert }] \in \mathbb {R}^{d\times \vert {\mathcal {V}}_{\text {UE}}\vert } . \boldsymbol {\Theta } is the set of model parameters that parametrize function f_{\boldsymbol {\Theta }} . Variable d \in \mathbb {N} denotes the number of dimensions of the vector representation and \boldsymbol {h}_{i}:= [h_{i,1},\ldots ,h_{i,d}]^{\top } \in \mathbb {R}^{d} denotes the vector representation per user. Let g_{\Phi }: \mathbb {R}^{d\times \vert {\mathcal {V}}_{\text {UE}}\vert } \rightarrow \mathbb {R}^{\vert {\mathcal {V}}_{\text {UE}}\vert \times n'} denote the functional mapping from vector representations to reconstructed QoS metrics \hat {\boldsymbol {Q}}_{\text {UE}}:= [\hat {\boldsymbol {q}}_{1}, \ldots , \hat {\boldsymbol {q}}_{\vert {\mathcal {V}}_{\text {UE}} \vert }]^{\top } \in \mathbb {R}^{\vert {\mathcal {V}}_{\text {UE}}\vert \times n'} where \hat {\boldsymbol {q}}_{i}:= [\hat {q}_{i,1},\ldots ,\hat {q}_{i,n'}]^{\top }\in \mathbb {R}^{n'} . \Phi is the set of model parameters that parametrize function g_{\Phi } , which is known as the output function. We define the objective as finding the sets of model parameters \boldsymbol {\Theta } and \Phi that minimize the cost function
\begin{equation*} \min _{ \boldsymbol {\Theta }, \Phi }\sum _{i'=1}^{m} \mathcal {L}(g_{\Phi }(f_{\boldsymbol {\Theta }}(\boldsymbol {X}_{i'}, \boldsymbol {A}_{i'})), \boldsymbol {Q}_{\text {UE},i'}). \tag {2}\end{equation*}
Here m \in \mathbb {N} denotes the number of training samples and \mathcal {L} denotes the loss per sample between the output of the radio access network model and the ground truth QoS \boldsymbol {Q}_{\text {UE},i'} . Note that f_{\boldsymbol {\Theta }}(\boldsymbol {X}_{i'}, \boldsymbol {A}_{i'}) outputs vector representations \boldsymbol {H}_{i'} and g_{\Phi }(\boldsymbol {H}_{i'}) outputs reconstructed QoS metrics \hat {\boldsymbol {Q}}_{\text {UE},i'} . The loss function \mathcal {L} is given by the mean absolute error (MAE) as
\begin{equation*} \vert \vert \hat {\boldsymbol {Q}}_{\text {UE},i'} - \boldsymbol {Q}_{\text {UE},i'} \vert \vert _{1}. \tag {3}\end{equation*}
We can efficiently compute the gradient of the loss function with respect to the model parameters by applying the backpropagation algorithm, and jointly find the model parameters \boldsymbol {\Theta } and \Phi that minimize the loss function \mathcal {L} . Alternatively, the two parameter sets can be found in separate and independent training procedures.
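The composition in Eq. (2) and the per-sample loss in Eq. (3) can be sketched in a few lines of plain Python. This is only an illustration: the helper names `l1_loss` and `total_cost`, the toy stand-ins for f_Θ and g_Φ, and the row-per-user matrix layout are assumptions made here, not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def l1_loss(q_hat: Matrix, q_true: Matrix) -> float:
    """Entry-wise L1 distance between reconstructed and ground-truth QoS, as in Eq. (3)."""
    return sum(
        abs(a - b)
        for row_hat, row_true in zip(q_hat, q_true)
        for a, b in zip(row_hat, row_true)
    )


def total_cost(
    f_theta: Callable,
    g_phi: Callable,
    samples: Sequence[Tuple[Matrix, object, Matrix]],
) -> float:
    """Sum of per-sample losses, as in Eq. (2). Each sample is (X, A, Q_UE)."""
    return sum(l1_loss(g_phi(f_theta(x, a)), q) for x, a, q in samples)


# Toy stand-ins (hypothetical): f returns the node features unchanged,
# g sums each user's representation into a single QoS value.
toy_f = lambda x, a: x
toy_g = lambda h: [[sum(row)] for row in h]

toy_samples = [([[1.0, 2.0], [3.0, 4.0]], None, [[3.0], [6.0]])]
toy_cost = total_cost(toy_f, toy_g, toy_samples)
```

In a real training loop the two callables would be differentiable modules, and the cost would be minimized with respect to their parameters rather than merely evaluated.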

SECTION V.

WirelessNet Model Architecture

Section V-A describes the proposed data structure of the input samples to our proposed radio access network model. Section V-B describes the model design and heterogeneous computation graph of WirelessNet.

A. Input Data of WirelessNet as Heterogeneous Graphs

The inputs to our proposed radio access network model are heterogeneous graphs, which are given as input samples to the HMPGNN model. Consider a set of node types \mathcal {T} and a set of edge types \mathcal {R} . The heterogeneous graph is then defined as \mathcal {G}(\mathcal {V}, \mathcal {E}, \mathcal {R}, \mathcal {T}) where \mathcal {V} defines the set of all nodes and \mathcal {E} defines the set of all edges. Heterogeneous graphs are suitable to represent the structure of the radio access network based on the following two intuitions:

  • Firstly, a radio access network contains multiple types of network entities with different functionalities, capabilities and local information, such as UEs, BSs, satellites, etc. The local information available at the network nodes, as well as their characteristics, are expressed as node features or edge weights. The different types of radio access network nodes are represented within the graph by nodes of a certain type, namely UEs and BSs.

  • Secondly, network nodes may wirelessly interact with each other in different forms. These interactions have different characteristics, and therefore have different effects on UE performance. For example, the interaction between a UE and its serving BS differs from the interaction between the same UE and an interfering BS. This intuition is encoded in our heterogeneous graphs by modeling these interactions with different edge types.

Fig. 2 depicts the state of the physical radio access network represented as a heterogeneous graph. WirelessNet outputs a vector representation per UE according to the structure of the input heterogeneous graph. We consider two edge types, namely communication and interference. The set of communication edges in graph \mathcal {G} is given by {\mathcal {E}}_{c} \in \{1,\ldots , \vert \mathcal {V}\vert \}^{2\times \vert {\mathcal {E}}_{c} \vert } in coordinate format (COO), where the tuple e_{c}:=(j,i) \in {\mathcal {E}}_{c} denotes a specific directed communication edge whose first element is the index of the source node and the second element is the index of the target node. Similarly, the set of interference edges in COO format is given as {\mathcal {E}}_{I} \in \{1,\ldots , \vert \mathcal {V}\vert \}^{2\times \vert {\mathcal {E}}_{I} \vert } , where the tuple e_{I}:=(j,i) \in {\mathcal {E}}_{I} denotes a specific directed interference edge for downlink. The set of all edges in the heterogeneous graph is given by \mathcal {E} = {\mathcal {E}}_{c}\bigcup {\mathcal {E}}_{I} , where {\mathcal {E}}_{c}\bigcap {\mathcal {E}}_{I} = \emptyset .
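As a small illustration of the COO representation described above, the following sketch builds the two edge sets for a made-up deployment and checks that they are disjoint. The node indices, the toy topology and the helper name `to_coo` are hypothetical and chosen only for this example.

```python
def to_coo(edges):
    """Convert a list of (source, target) tuples into COO format: [sources, targets]."""
    sources = [j for j, _ in edges]
    targets = [i for _, i in edges]
    return [sources, targets]


# Hypothetical toy deployment: nodes 1-3 are BSs, nodes 4-5 are UEs
# (1-based indices, as in the text). Edges are directed from BS to UE (downlink).
communication_edges = [(1, 4), (2, 5)]           # serving BS -> UE
interference_edges = [(2, 4), (3, 4), (1, 5)]    # interfering BS -> UE

E_c = to_coo(communication_edges)   # shape 2 x |E_c|
E_I = to_coo(interference_edges)    # shape 2 x |E_I|

# The two edge types never share an edge, and their union gives all edges.
edges_are_disjoint = set(communication_edges).isdisjoint(interference_edges)
all_edges = set(communication_edges) | set(interference_edges)
```

Graph libraries commonly use 0-based indices for this layout; the 1-based indices here simply follow the notation of the text.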

FIGURE 2. - Radio access network nodes and interactions between them represented as a heterogeneous graph. A network vector representation is generated per UE based on their local graph topology per edge type together with their local and neighborhood information.

B. Heterogeneous Computation Graph

The main design principle of our proposed model is to simulate the wireless interactions in the network with a learning model. The graph topology of the input heterogeneous graphs defines the structure of the computation graph of the HMPGNN model. Independent trainable parameters are used per edge type, depending on whether the BS communicates with or interferes with the UE. To illustrate, Fig. 3 shows the heterogeneous computation graph of WirelessNet to reconstruct the QoS metric of a single UE, where UE 1 is attached to BS 1 and is interfered by BS 2 and BS 3. As parameters are shared per edge type (i.e., per wireless phenomenon), the computations at different nodes that emerge from the same wireless phenomenon are performed with the same functions (e.g., the interference message function from BS 2 and BS 3 in Fig. 3).

FIGURE 3. - Example of an input heterogeneous graph and the computation graph it induces on the WirelessNet model to compute QoS metric for UE 1 (attached to BS 1 and interfered by BS 2 and 3).

In the following, we describe the heterogeneous computation graph in detail. Let \mathcal {N}^{\tau }_{i} denote the BS neighbors of user i \in {\mathcal {V}}_{\text {UE}} with edge type \tau \in \mathcal {R} . The set of BS neighbors depends on the local graph topology around user i . The graph topology per edge type depends on measurements of reference signals performed by the UEs (described in Section VI-B). Let d \in \mathbb {N} denote the number of hidden units for all edge types. Consider the computed message \boldsymbol {m}^{\tau }_{j}:= [m_{1}^{\tau },\ldots ,m_{d}^{\tau }]^{\top } \in \mathbb {R}^{d} as the output of the message function per BS j and per interaction \tau , g^{\tau }_{M_{j}}: \mathbb {R}^{n_{\text {BS}}} \rightarrow \mathbb {R}^{d} , where n_{\text {BS}} \in \mathbb {N} denotes the number of BS node features. The BS node features are given by \boldsymbol {x}_{\text {BS},j} \in \mathbb {R}^{1\times n_{\text {BS}}} \: \forall j \in {\mathcal {V}}_{\text {BS}} . The message that each neighboring BS node j \in {\mathcal {N}}_{i}^{\tau } computes is conditioned on its edge type \tau and is given by
\begin{align*} \boldsymbol {m}^{\tau }_{j} & = x_{\tau , e_{\tau }} g^{\tau }_{M_{j}}(\boldsymbol {x}_{\text {BS},j}) \\ & = x_{\tau , e_{\tau }} \boldsymbol {W}^{\tau } \text {ReLU}(\boldsymbol {W}_{\text {pool}}^{\tau } \boldsymbol {x}_{\text {BS},j} + \boldsymbol {b}_{\text {pool}}^{\tau }). \tag {4}\end{align*}
The set of parameters \boldsymbol {\Theta }_{M} \in \{\boldsymbol {W}^{\tau }, \boldsymbol {W}^{\tau }_{\text {pool}}, \boldsymbol {b}_{\text {pool}}^{\tau } \}_{\tau \in \mathcal {R}} parametrizes the message function g^{\tau }_{M}(\cdot) , and e_{\tau } denotes an edge tuple for either communication or interference (subindex c and I, respectively).

The \boldsymbol {W}^{\tau }_{\text {pool}} and \boldsymbol {b}^{\tau }_{\text {pool}} parameters, together with the element-wise non-linear activation function \text {ReLU}(\cdot):= \max (0,\cdot) , transform BS node features into a different feature space. Non-linearities such as the ReLU function increase the expressivity of deep learning models. The \boldsymbol {W}^{\tau } weight matrix performs a linear transformation and the message is thereafter passed to the target UE nodes. Note that these message computations can be done in parallel by the BS nodes. When the message is passed, it is multiplied by the corresponding edge weight x_{\tau , e_{\tau }}\in \mathbb {R} . Thereafter, messages are aggregated per edge type \tau with the aggregation function g_{A}^{\tau }: \mathbb {R}^{d\times \vert {\mathcal {N}}_{i}^{\tau }\vert } \rightarrow \mathbb {R}^{d} to compute the aggregated message per edge type \boldsymbol {a}^{\tau }:= [a^{\tau }_{1},\ldots ,a^{\tau }_{d}]^{\top } \in \mathbb {R}^{d} as
\begin{equation*} \boldsymbol {a}^{\tau } = g_{A}^{\tau }([\boldsymbol {m}_{1}^{\tau },\ldots , \boldsymbol {m}_{\vert \mathcal {N}^{\tau }_{i}\vert }^{\tau }]) = \max ([\boldsymbol {m}_{1}^{\tau },\ldots , \boldsymbol {m}_{\vert \mathcal {N}^{\tau }_{i}\vert }^{\tau }]). \tag {5}\end{equation*}
The \max (\cdot) function denotes an element-wise max pooling operator, which aggregates feature information across the neighbor set per edge type. Once the messages of the same edge type are aggregated, they are further aggregated across edge types into \boldsymbol {a}:= [a_{1},\ldots ,a_{d}]^{\top } \in \mathbb {R}^{d} as
\begin{equation*} \boldsymbol {a} =\sum _{\tau \in \mathcal {R}}\boldsymbol {a}^{\tau }. \tag {6}\end{equation*}
Such aggregation and message functions enable a trainable aggregator that captures different aspects of the neighborhood set with the element-wise max pooling operator, inspired by the GraphSAGE framework [38]. Meanwhile, local information from UE nodes is transformed with the update function per edge type g_{U}^{\tau }: \mathbb {R}^{n_{\text {UE}}} \rightarrow \mathbb {R}^{d} as
\begin{equation*} \boldsymbol {u}^{\tau } = g^{\tau }_{U}(\boldsymbol {x}_{\text {UE},i}) = \boldsymbol {W}^{\tau }_{\text {UE}}\boldsymbol {x}_{\text {UE},i}. \tag {7}\end{equation*}
Here \boldsymbol {u}^{\tau }:= [u^{\tau }_{1},\ldots ,u^{\tau }_{d}]^{\top } \in \mathbb {R}^{d} is a column vector and \boldsymbol {x}_{\text {UE},i} \in \mathbb {R}^{n_{\text {UE}}} denotes the UE node features, where n_{\text {UE}} \in \mathbb {N} is the number of UE features. Thereafter, a representation is computed for UE node i with the update function g_{U}: \mathbb {R}^{d} \rightarrow \mathbb {R}^{d} using the aggregated representation of neighbors and the aggregated local feature representation as
\begin{equation*} \boldsymbol {h}_{i} = g_{U}\left ({{\sum _{\tau \in \mathcal {R}}\boldsymbol {u}^{\tau } + \boldsymbol {a}}}\right ) = \text {ReLU}\left ({{\sum _{\tau \in \mathcal {R}}\boldsymbol {u}^{\tau } + \boldsymbol {a}}}\right ). \tag {8}\end{equation*}
The update function is parametrized by \boldsymbol {\Theta }_{U} \in \{\boldsymbol {W}^{\tau }_{\text {UE}}\}_{\tau \in \mathcal {R}} . Thereafter, the UE node representation is normalized with batch normalization (referred to as BatchNorm). Finally, the output function g_{\Phi }:\mathbb {R}^{d} \rightarrow \mathbb {R} transforms the normalized UE node representation \boldsymbol {h}_{i} into the QoS metric \hat {q}_{\text {UE},i} . This function is given by
\begin{equation*} \hat {q}_{\text {UE},i} = g_{\Phi }(\boldsymbol {h}_{i}) = \boldsymbol {w}_{\text {UE},\Phi }\boldsymbol {h}_{i} + \boldsymbol {b}_{\text {UE},\Phi }. \tag {9}\end{equation*}
Function g_{\Phi }(\boldsymbol {h}_{i}) is parametrized by the set of parameters \boldsymbol {\Phi } \in \{\boldsymbol {w}_{\text {UE},\Phi }, \boldsymbol {b}_{\text {UE},\Phi } \} . Algorithm 1 describes the forward-pass of every user i \in {\mathcal {V}}_{\text {UE}} with an arbitrary number of HMPGNN layers l \in \mathbb {N} . The vector representation computation of the other UEs depends on the heterogeneous computation graph of their local graph structure per edge type. Matrix \boldsymbol {H}^{l}:= [\boldsymbol {h}^{l}_{1},\ldots ,\boldsymbol {h}^{l}_{\vert {\mathcal {V}}_{\text {UE}} \vert }] \in \mathbb {R}^{d\times \vert {\mathcal {V}}_{\text {UE}}\vert } denotes the stacked vector representation of all users after l layers.
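To make Eqs. (4) to (9) concrete, the following is a minimal single-layer, single-UE sketch in plain Python. It is not the authors' PyTorch Geometric implementation: the dictionary layout of the parameters, the function names, the omission of BatchNorm, and the choice of a zero vector when a UE has no neighbor of a given edge type are all assumptions made for illustration.

```python
def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def message(params, x_bs, edge_weight):
    """Eq. (4): edge_weight * W ReLU(W_pool x_BS + b_pool) for one BS neighbor."""
    pooled = [a + b for a, b in zip(matvec(params["W_pool"], x_bs), params["b_pool"])]
    return [edge_weight * m for m in matvec(params["W"], relu(pooled))]


def ue_representation(params_by_type, neighbors_by_type, x_ue):
    """Eqs. (5)-(8) for one UE; neighbors_by_type[tau] is a list of (x_bs, edge_weight)."""
    total = None
    for tau, params in params_by_type.items():
        u_tau = matvec(params["W_UE"], x_ue)                      # Eq. (7)
        msgs = [message(params, x, w) for x, w in neighbors_by_type.get(tau, [])]
        # Eq. (5): element-wise max over neighbors of this edge type.
        # Assumption: an empty neighbor set contributes a zero vector.
        a_tau = [max(col) for col in zip(*msgs)] if msgs else [0.0] * len(u_tau)
        contrib = [u + a for u, a in zip(u_tau, a_tau)]           # terms of Eqs. (6), (8)
        total = contrib if total is None else [t + c for t, c in zip(total, contrib)]
    return relu(total)                                            # Eq. (8)


def output(w, b, h):
    """Eq. (9): linear read-out of one QoS value (BatchNorm omitted in this sketch)."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b


# Toy parameters (hypothetical): identity weights, d = n_BS = n_UE = 2.
eye = [[1.0, 0.0], [0.0, 1.0]]
shared = {"W": eye, "W_pool": eye, "b_pool": [0.0, 0.0], "W_UE": eye}
params_by_type = {"communication": shared, "interference": shared}
neighbors = {
    "communication": [([1.0, -1.0], 2.0)],
    "interference": [([0.0, 3.0], 1.0), ([2.0, 1.0], 0.5)],
}
h_1 = ue_representation(params_by_type, neighbors, x_ue=[1.0, -10.0])
q_1 = output([1.0, 1.0], 0.5, h_1)
```

Because the same `params` dictionary is reused for every neighbor of a given edge type, adding or removing BSs changes only the length of the neighbor lists, not the number of parameters.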

Algorithm 1 - Vector Representation Generation in WirelessNet and QoS Reconstruction:

C. Computational Complexity Analysis

The computational complexity of WirelessNet is approximately \mathcal {O}(l\vert {\mathcal {V}}_{\text {UE}}\vert d(\vert {\mathcal {V}}_{\text {BS}}\vert n^{2}_{\text {BS}}+\vert {\mathcal {V}}_{\text {BS}}\vert n_{\text {BS}} d+2 n_{\text {BS}}) + l\vert {\mathcal {V}}_{\text {UE}}\vert d \overline {N}_{\text {I}} (\vert {\mathcal {V}}_{\text {BS}}\vert n^{2}_{\text {BS}}+\vert {\mathcal {V}}_{\text {BS}}\vert n_{\text {BS}}d + 2 n_{\text {BS}}) + 2 l d(\vert {\mathcal {V}}_{\text {UE}}\vert n_{\text {UE}} + 1) + (d\vert {\mathcal {V}}_{\text {UE}}\vert n' + 1)) in the forward-pass. Users are target nodes and BSs are source nodes, and \overline {N}_{\text {I}} is the average number of interferers per user. The first term corresponds to the message and aggregation computation for the communication edges. The second term corresponds to the same computation for the interference edges. The third term corresponds to the communication and interference update functions. The fourth term corresponds to the output function and batch normalization. The complexity is dominated by matrix computations in the message and aggregation functions. In the typical scenario where there are more users than BSs (e.g., a cellular network), the complexity is approximately \mathcal {O}(\vert {\mathcal {V}}_{\text {UE}}\vert) . In the worst-case scenario, where there are more BSs than users (e.g., an ultra-dense cellular network) and users are interfered by approximately all BSs (i.e., \overline {N}_{\text {I}}\approx \vert {\mathcal {V}}_{\text {BS}}\vert ), an upper-bound estimate of the computational complexity is \mathcal {O}(\vert {\mathcal {V}}_{\text {BS}}\vert ^{2}) .
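To get a feel for how the four terms of the complexity expression scale, the expression can be evaluated directly. The sketch below simply transcribes it; the function name and the example sizes (in particular d = 16 and an average of 3 interferers per user) are hypothetical values chosen for illustration and are not taken from the paper.

```python
def forward_cost(l, n_ue_nodes, n_bs_nodes, d, n_bs, n_ue, n_out, avg_interferers):
    """Evaluate the forward-pass operation count term by term."""
    per_edge_type = n_bs_nodes * n_bs**2 + n_bs_nodes * n_bs * d + 2 * n_bs
    communication = l * n_ue_nodes * d * per_edge_type                    # first term
    interference = l * n_ue_nodes * d * avg_interferers * per_edge_type   # second term
    update = 2 * l * d * (n_ue_nodes * n_ue + 1)                          # third term
    output = d * n_ue_nodes * n_out + 1                                   # fourth term
    return communication + interference + update + output


# Hypothetical sizes: 1 layer, 30 UEs, 12 BSs, 4 features per node, 1 QoS metric.
base = forward_cost(l=1, n_ue_nodes=30, n_bs_nodes=12, d=16,
                    n_bs=4, n_ue=4, n_out=1, avg_interferers=3)
doubled_users = forward_cost(l=1, n_ue_nodes=60, n_bs_nodes=12, d=16,
                             n_bs=4, n_ue=4, n_out=1, avg_interferers=3)
growth = doubled_users / base   # close to 2: the count is near-linear in the number of UEs
```

With the number of BSs and the feature sizes held fixed, doubling the number of users roughly doubles the count, which matches the linear-in-users behavior stated above.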

SECTION VI.

Evaluation of Accuracy of WirelessNet

A. Radio Access Network Simulation Setup

The downlink cellular network is simulated using a custom discrete-event end-to-end network simulator based on ns-3 [5]. Simulations are performed to collect ground truth measurements of the network nodes (i.e., BSs and UEs) every \Delta t , where \Delta t = 200 ms. Unless stated otherwise, feature information whose logging granularity is finer than \Delta t is post-processed by keeping the latest log entry. We focus on downlink rate reconstruction of all users. Downlink rates for each individual UE in the radio access network are calculated in [Mbits/\Delta t ] at the Packet Data Convergence Protocol (PDCP) layer. Each UE moves along a road segment surrounding a building from the 3GPP urban road topology [47]. We model the mobility of the UEs with the microscopic traffic simulator SUMO [46]. BSs are deployed as lamp sites along the roads surrounding the building. We use the 3GPP Urban Micro (UMi) channel model [48] to account for path loss, autocorrelation of shadowing and fast fading effects. LOS and NLOS channel conditions depend on the geometry of the urban environment and the radio access network deployment. Simulation parameters for the radio access network are described in Table 2.

TABLE 2 Radio Access Network Simulation Parameters to Generate Dataset

We simulate multiple scenarios with different mobility traces and different radio access network topologies. We consider four different radio access network deployment scenarios, namely 6, 8, 10 and 12 BSs in a fixed area. We keep the number of UEs constant at 30 for all scenarios. We generate three mobility traces per radio access network topology (namely, MM1, MM2 and MM3) from SUMO. In each mobility trace, UEs have a different mobility pattern along the roads surrounding a building. The mobility traces are given as an input to the ns-3 simulator. The network simulation duration is 300 s. As we focus on the radio access network, our goal is to generalize to different radio access network topologies and to take into account network dynamics such as handovers, cell-load and inter-cell interference. We post-process the ns-3 simulations to collect 18,000 graph samples in total from the aforementioned radio access network topology scenarios and mobility traces to build our graph dataset. The construction of the graph topology is described in Section VI-B.
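The reported total of 18,000 graph samples is consistent with taking one graph per logging interval in every simulation. The short check below works through that arithmetic; the breakdown is an inference from the stated numbers and is not spelled out in the text.

```python
# Values stated in the text.
delta_t_ms = 200                 # logging interval
sim_duration_ms = 300 * 1000     # 300 s per simulation
deployments = [6, 8, 10, 12]     # number of BSs per deployment scenario
mobility_traces = ["MM1", "MM2", "MM3"]

# Assumption: one graph sample is extracted per logging interval.
samples_per_simulation = sim_duration_ms // delta_t_ms
total_samples = len(deployments) * len(mobility_traces) * samples_per_simulation
```

Under this reading, each of the 12 simulations contributes 1,500 graphs.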

To study the performance and generalization capabilities of WirelessNet, we perform experiments in two setups, namely Setup A and Setup B. In Setup A, we only use the 6 BS radio access network deployment scenario to train our model. All the other simulation scenarios are used to evaluate our model, in order to show its efficiency in terms of learning speed, accuracy and generalization to different radio access network deployments, benchmarked against an FCDNN. In Setup B, we utilize the 6, 8 and 12 BS radio access network deployments and split this dataset with a 60/20/20 ratio into train/val/test sets. We use the 10 BS radio access network deployment to evaluate the generalization of our model to unseen deployments in this setup. Table 3 summarizes the train/val/test split configuration of our experiments, where the number in (\cdot ) denotes the number of samples per split. We consider the following baselines to benchmark WirelessNet:

  • An FCDNN that maps input features of all radio access network nodes to downlink rates of all users. This baseline can only be used in Setup A, since it cannot incorporate additional input features from different radio access network topologies (e.g., additional BSs). Features of all the nodes are flattened into a single input feature vector and fed to the FCDNN. The output is the downlink rates of all users.

  • In Setup B, WirelessNet is benchmarked against cellular network simulations, in terms of how well it can replicate their behavior at low computational cost.
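A 60/20/20 train/val/test split such as the one used in Setup B can be sketched as follows. Whether the samples were shuffled before splitting, and with which seed, is not stated in the text, so the shuffling, the seed and the function name `split_indices` are assumptions of this example.

```python
import random


def split_indices(n_samples, percentages=(60, 20, 20), seed=0):
    """Shuffle sample indices and cut them into train/val/test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # assumption: random split with a fixed seed
    n_train = n_samples * percentages[0] // 100
    n_val = n_samples * percentages[1] // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
```

The held-out 10 BS deployment would not pass through this split at all, since it is reserved entirely for the generalization test.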

TABLE 3 Dataset Split Configuration

B. Construction of Input Heterogeneous Graphs

Within the cellular network simulations, users perform measurements of reference signals transmitted periodically by BSs. The received reference signals are perturbed by the wireless channel and noise. Let j \in {\mathcal {N}}^{c}(i) denote the BS to which user i is attached, and let j' \in {\mathcal {V}}_{\text {BS}} \setminus {\mathcal {N}}^{c}(i) denote a neighboring BS that the user is not attached to. The handover procedure is based on the A2-A4 event [44], where the Reference Signal Received Quality (RSRQ) from neighboring BSs is measured at UE i . The RSRQ from the serving BS and that from an arbitrary neighboring BS are given by \zeta _{j} \in \mathbb {R} and \beta _{j'} \in \mathbb {R} , respectively. User i stays attached to BS j if the RSRQ of this BS is higher by at least 1 dB than that of the best neighboring BS, as specified in 3GPP standards [44]. Then, the downlink communication adjacency matrix between all BSs and users \boldsymbol {A}_{c} \in \{0,1\}^{\vert {\mathcal {V}}_{\text {BS}} \vert \times \vert {\mathcal {V}}_{\text {UE}} \vert } is constructed as
\begin{align*} A_{c(j,i)} = \begin{cases} \displaystyle 1 & \zeta _{j} - \max _{j'\in {\mathcal {V}}_{\text {BS}} \setminus {\mathcal {N}}^{c}(i)} \beta _{j'} \geq 1 \: \text {dB} \\ \displaystyle 0 & \text {otherwise.} \end{cases} \tag {10}\end{align*}

The 1\:\text {dB} difference between neighboring BS and serving BS aims to ensure that the UE receives better signal quality after a handover event. Similarly, user i measures the RSRP of its serving BS, which is given by p_{j} \in \mathbb {R} . The RSRP of a neighboring BS it is not attached to is given by p_{j'} \in \mathbb {R} . The RSRPs of neighboring BSs serve as a proxy for inter-cell interference levels. Typically, the received power of a reference signal is considered to be above the noise floor if the measured signal is greater than or equal to −111 dBm in downlink cellular networks [50]. Therefore, it is reasonable to treat as interferers those neighboring BSs whose reference signals are received by user i with a power at or above this threshold. The downlink interference adjacency matrix \boldsymbol {A}_{I} \in \{0,1\}^{\vert {\mathcal {V}}_{\text {BS}} \vert \times \vert {\mathcal {V}}_{\text {UE}} \vert } is then constructed as
\begin{align*} A_{I(j',i)} = \begin{cases} \displaystyle 1 & p_{j'} \geq -111 \: \text {dBm} \\ \displaystyle 0 & \text {otherwise}. \end{cases} \tag {11}\end{align*}
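The two construction rules can be sketched for a single UE as follows. The sketch reads the condition in Eq. (10) as a comparison against the best (maximum) neighboring RSRQ, applies Eq. (11) only to BSs the UE is not attached to, and uses hypothetical function names and measurement values.

```python
HANDOVER_MARGIN_DB = 1.0    # margin from Eq. (10)
NOISE_FLOOR_DBM = -111.0    # threshold from Eq. (11)


def stays_attached(rsrq_serving_db, rsrq_neighbors_db):
    """Eq. (10) condition: serving RSRQ exceeds the best neighbor RSRQ by at least 1 dB."""
    return rsrq_serving_db - max(rsrq_neighbors_db) >= HANDOVER_MARGIN_DB


def interference_column(rsrp_dbm, serving_bs):
    """Eq. (11) for one UE: mark each non-serving BS received at or above the noise floor."""
    return [
        1 if j != serving_bs and p >= NOISE_FLOOR_DBM else 0
        for j, p in enumerate(rsrp_dbm)
    ]


# Hypothetical measurements at one UE attached to BS 0 (0-based index).
keeps_serving_bs = stays_attached(-9.0, [-11.0, -10.5])
column_of_A_I = interference_column([-80.0, -111.0, -120.0, -95.0], serving_bs=0)
```

Stacking one such column per UE gives the interference adjacency matrix; the communication matrix has a single 1 per column, at the serving BS.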

C. Learning Model Setup

The UE node features are given by \boldsymbol {x}_{\text {UE}}:= [x_{i}, y_{i}, \gamma _{i}, \rho _{i}] \in \mathbb {R}^{1\times n_{\text {UE}}} \: \forall i \in {\mathcal {V}}_{\text {UE}} , where n_{\text {UE}} \in \mathbb {N} denotes the number of UE node features. The x_{i} and y_{i} features denote the position of UE i , \gamma _{i} its downlink SINR averaged over \Delta t and \rho _{i} its proportion of allocated resources during \Delta t . The downlink rate label per user is given as r_{i} \in \mathbb {R} , where \boldsymbol {r}_{\text {UE}}:= [r_{1}, \ldots , r_{\vert {\mathcal {V}}_{\text {UE}} \vert }]^{\top } \in \mathbb {R}^{\vert {\mathcal {V}}_{\text {UE}} \vert } denotes the rate label vector for all users. The BS node features are given by \boldsymbol {x}_{\text {BS}}:= [x'_{j}, y'_{j}, \mu _{j},\delta _{j}] \in \mathbb {R}^{1\times n_{\text {BS}}} \: \forall j \in {\mathcal {V}}_{\text {BS}} , where n_{\text {BS}} \in \mathbb {N} denotes the number of BS node features. The x'_{j} and y'_{j} features denote the position at which BS j is deployed, \mu _{j} denotes its cell-load (as the number of users attached) and \delta _{j} denotes its RB utilization. Communication e_{c} and interference e_{I} edges exist from BS node j to UE node i as per the rules defined in the previous section. Consider x_{c, e_{c}}:= [p_{ji}] \in \mathbb {R} as the communication edge weight, where p_{ji} denotes the latest RSRP of the serving BS. Moreover, consider x_{I,e_{I}}:= [p'_{j'i}] \in \mathbb {R} as the interference edge weight, where p'_{j'i} denotes the latest RSRP of interferer BS j' . Note that edge weights multiply the passed messages as shown in Fig. 3.

A minibatch is created by concatenating multiple graphs into a single large graph, offsetting the node indices so that the individual graph samples remain unconnected to each other. Consider the minibatch size as m' \in \mathbb {N} and w as the index of the w- th minibatch. The downlink rates for the minibatch are then given as \boldsymbol {r}^{(w)}_{\text {UE}}:= [r_{\text {UE},1},\ldots ,r_{\text {UE},m'\vert {\mathcal {V}}_{\text {UE}}\vert }]^{\top } \in \mathbb {R}^{m'\vert {\mathcal {V}}_{\text {UE}}\vert } . The cost function per minibatch w using the MAE loss is given by
\begin{align*} J^{(w)} = \dfrac {1}{m' \vert {\mathcal {V}}_{\text {UE}} \vert } \sum _{q'=1}^{m'\vert {\mathcal {V}}_{\text {UE}} \vert } \left \vert r_{\text {UE},q'} - \left [g_{\Phi }(f_{\boldsymbol {\Theta }}(\mathcal {G}^{(w)}))\right ]_{q'} \right \vert + \lambda \Omega (\boldsymbol {\Theta }, \Phi ). \tag {12}\end{align*}
Here \mathcal {G}^{(w)} denotes the batched graph of minibatch w , [\cdot ]_{q'} denotes the q'- th entry of the model output, and \Omega (\boldsymbol {\Theta }, \Phi) is the \ell _{2} norm of vec(\boldsymbol {\Theta }) and vec(\Phi) , scaled by the regularization hyperparameter \lambda \in \mathbb {R} . The HMPGNN models are trained using the Adam optimization algorithm. Batch normalization is used after the aggregation function for all edge types to produce the final representation. The cost function given in Eq. (12) is minimized for \lambda = 0.0005 . Models are trained for 300 epochs. We implemented WirelessNet with PyTorch Geometric [49], a library built upon PyTorch to implement and train GNNs. Each node feature (including the edge weight) and label is normalized independently, using the corresponding train set, into a common desired distribution as
\begin{equation*} \boldsymbol {x}' = G^{-1}(F(\boldsymbol {x})). \tag {13}\end{equation*}
F: \mathbb {R} \rightarrow [{0,1}] is the cumulative distribution function (CDF) of feature \boldsymbol {x} \in \mathbb {R}^{m} and G^{-1}: [{0,1}] \rightarrow \mathbb {R} is the quantile function of the desired output distribution G for the scaled feature \boldsymbol {x'} \in \mathbb {R}^{m} . This transformation spreads out the most frequent values and reduces the impact of outliers [51].
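The transformation in Eq. (13) can be sketched with an empirical CDF fitted on the train set and the quantile function of a target distribution. The text does not name the target distribution G at this point, so the standard normal used below is an assumption, as are the function name and the clipping of probabilities away from 0 and 1.

```python
from bisect import bisect_right
from statistics import NormalDist


def fit_quantile_transform(train_values, target=NormalDist()):
    """Return x -> G^{-1}(F(x)), with F the empirical CDF of the training values."""
    sorted_train = sorted(train_values)
    n = len(sorted_train)

    def transform(x):
        # Empirical CDF F(x): fraction of training values <= x.
        p = bisect_right(sorted_train, x) / n
        # Clip so that the quantile function stays finite; values outside the
        # fitted range are thereby mapped to the bounds of the scaled feature.
        p = min(max(p, 0.5 / n), 1.0 - 0.5 / n)
        return target.inv_cdf(p)   # G^{-1}, assumed standard normal here

    return transform


# Hypothetical training values for one feature.
scale = fit_quantile_transform([1.0, 2.0, 3.0, 4.0])
in_range = [scale(v) for v in (1.0, 2.0, 3.0)]
above_range = scale(100.0)   # unseen value far above the fitted range
```

The clipping step reflects the behavior described for unseen deployments: a feature value outside the range seen during training is mapped to the same scaled value as the nearest bound.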

The evaluation metrics are the MAE, the standard deviation (Std Dev) of the absolute error and the mean absolute percentage error (MAPE). MAPE is calculated as
\begin{equation*} \dfrac {\vert \vert \hat {\boldsymbol {r}}_{\text {UE}} - \boldsymbol {r}_{\text {UE}} \vert \vert _{1}}{\vert \vert \boldsymbol {r}_{\text {UE}} \vert \vert _{1}}. \tag {14}\end{equation*}
MAPE is defined in this way to avoid dividing by zero (for when the downlink rates are equal to 0 in challenging channel conditions). The Std Dev indicates how close or far apart the absolute errors are from the MAE across all test samples. Hyperparameters of our proposed HMPGNN model and the FCDNN baseline are described in Table 4. Quantile feature normalization as in Eq. (13) is used to normalize the features and outputs of WirelessNet and the FCDNN baseline.
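The three metrics can be written down in a few lines. The sketch below follows Eq. (14) for MAPE, i.e., a ratio of L1 norms rather than an average of per-sample ratios; the use of the population standard deviation and the function names are assumptions of this example.

```python
from statistics import fmean, pstdev


def abs_errors(pred, true):
    return [abs(p - t) for p, t in zip(pred, true)]


def mae(pred, true):
    """Mean absolute error."""
    return fmean(abs_errors(pred, true))


def abs_error_std(pred, true):
    """Standard deviation of the absolute error (population form assumed)."""
    return pstdev(abs_errors(pred, true))


def mape(pred, true):
    """Eq. (14): total absolute error divided by total absolute label value."""
    return sum(abs_errors(pred, true)) / sum(abs(t) for t in true)


# Hypothetical rates in Mbits per interval; the last label is exactly zero.
predicted = [1.0, 2.0, 0.0]
measured = [2.0, 2.0, 0.0]
example_mae = mae(predicted, measured)
example_mape = mape(predicted, measured)
```

Note that the zero-valued label does not cause a division by zero, because only the sum of the labels appears in the denominator.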

TABLE 4 Learning Hyperparameters

D. Learning Process

Setup A is used to compare the learning curves of WirelessNet with those of the FCDNN model. As shown in Fig. 4, our proposed model achieves a significantly lower validation loss than the FCDNN baseline in fewer epochs, with a sharp initial decrease in validation loss. We attribute the improved performance to two key reasons: a strong relational inductive bias [45] and shared parameters for common wireless phenomena. Regarding the former, dynamic changes in the radio access network due to mobility cause the information relevant to reconstructing the downlink rate of each user, as well as the interactions between network entities, to change constantly. These changes are reflected in the computation graph of the WirelessNet model by means of the constructed input graph, which accounts for the underlying wireless physical phenomena happening at different nodes. Regarding the latter, shared parameters per edge type across the different nodes transform local information with the same function to account for a common wireless phenomenon, such as wireless communication or interference. The FCDNN achieves a significantly higher validation loss. We attribute this to the fact that, unlike our proposed method, it does not incorporate any structure of the radio access network. Significantly more samples are needed for the FCDNN to increase its performance. Note that, when training for 1000 epochs, the FCDNN started overfitting, thus increasing its validation loss. WirelessNet, in contrast, did not overfit and achieved a reduced validation loss.

FIGURE 4. - Learning Curves Comparison for Setup A between WirelessNet and an FCDNN.

E. Numerical Evaluation

1) Setup A

We evaluate the accuracy of our proposed WirelessNet model on both in-distribution and out-of-distribution estimates in Setup A and Setup B. In Setup A, the accuracy of WirelessNet is evaluated in different radio access network deployments with an increasing number of deployed BSs, training only with the 6 BS radio access network deployment. Note that when training the FCDNN model with the 6 BS scenario, the model cannot be evaluated on denser network deployments, as the number of inputs changes. Table 5 shows that WirelessNet significantly outperforms the FCDNN model for in-distribution downlink rate reconstruction, where MAPE and the Std Dev are reduced significantly. A denser radio access network deployment decreases the accuracy of WirelessNet by 5% up to 17%, although the model was trained with only one network deployment (6 BSs). A denser radio access network deployment produces more distributional shift between the training data and the data the network model encounters when deployed. These results point to the capability of WirelessNet to generalize to larger problem scales even though it was only trained on a single radio access network deployment. The input heterogeneous graphs constructed for the 6 BS network deployment train set and those constructed for the generalization sets (8, 10 and 12 BSs) represent the same underlying wireless phenomena. Moreover, another key aspect that helps WirelessNet generalize is that quantile transformations are used to normalize the input features and labels. In this way, feature values of new or unseen data that fall below or above the fitted range are mapped to the bounds of the distribution of the scaled features.

TABLE 5 Accuracy for Downlink Data Rate [Mbits/ \Delta t ] Reconstruction (Setup A)

2) Setup B

In this setup, WirelessNet is trained with multiple deployment scenarios (6, 8 and 12 BSs). Table 6 shows that WirelessNet generalizes to a significant degree to different network deployments when given sufficient and representative training data. The performance degradation due to the unseen 10 BS network deployment is 4% in terms of MAPE. These results showcase the robustness and generalization capabilities of WirelessNet to unseen network deployment scenarios. To further visualize how WirelessNet replicates network behavior, we plot in Fig. 5 the evolution of the experienced downlink rate of some of the users in the radio access network together with the rate reconstructed by the WirelessNet model. Fig. 5 illustrates how WirelessNet accurately reconstructs downlink rates for the users and thus co-evolves with the radio access network. As shown, the variability of the experienced downlink rate per user is captured to a significant degree by our proposal. Moreover, to further evaluate the generalization capabilities of the model, we plot the Cumulative Distribution Function (CDF) of the absolute error over the test samples. Fig. 6 shows that there is significant overlap between the CDF of the 10 BS network deployment scenario (generalization) and that of the hold-out test set.

TABLE 6 Accuracy for Downlink Data Rate [Mbits/$\Delta t$] Reconstruction (Setup B)
FIGURE 5. - Evolution of users in the cellular network (ns-3) and its WirelessNet model for UEs with indices 1, 2, 3 and 4. Generalization performance (10 BS) is shown (i.e., deployment scenario unseen during training).

FIGURE 6. - CDF of the absolute error over the test samples.

3) Computational Runtime

We measure the runtime cost of an exemplary 60 s network simulation with input features given as input to WirelessNet (see footnote 15). We perform these experiments on an Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70 GHz processor rather than on a graphics processing unit (GPU) platform, and we use the perf_counter function to measure the inference execution time. On a scenario with 12 BSs and 30 UEs, WirelessNet takes on the order of $10^{-2}$ s (8.3 ms), averaged over 30 runs. These experiments show that multiple what-if scenarios of real radio access networks may be rapidly and accurately evaluated with the WirelessNet radio access network model, which may therefore be suitable for a real-time control loop. The runtime cost of the ns-3 simulations on the same hardware is $9.3\times 10^{3}$ s. Due to additional computations in the ns-3 simulations, the comparison is not completely fair. However, it illustrates that, for the purpose of performance evaluation, WirelessNet can reconstruct downlink rates given different network configurations several orders of magnitude faster than ns-3. Network simulators, although accurate, are computationally expensive.
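A minimal sketch of this kind of measurement is given below, assuming that the perf_counter function mentioned above is the one from Python's `time` module. The `dummy_inference` function is a hypothetical stand-in for a WirelessNet forward pass, not the actual model. The last lines simply divide the two runtimes reported in the text; as noted above, that comparison is not completely fair.

```python
import math
from time import perf_counter


def mean_runtime(fn, n_runs=30):
    """Average wall-clock time of fn() over n_runs calls, in seconds."""
    total = 0.0
    for _ in range(n_runs):
        start = perf_counter()
        fn()
        total += perf_counter() - start
    return total / n_runs


def dummy_inference():
    """Hypothetical stand-in for one model inference (assumption)."""
    return sum(i * i for i in range(10_000))


avg_s = mean_runtime(dummy_inference)

# Ratio implied by the two runtimes reported in the text.
ns3_s = 9.3e3            # ns-3 simulation runtime in seconds
wirelessnet_s = 8.3e-3   # WirelessNet runtime in seconds (8.3 ms)
speedup = ns3_s / wirelessnet_s
orders = math.log10(speedup)
```

With the reported figures, the ratio is roughly $1.1\times 10^{6}$, that is, about six orders of magnitude.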

4) Ablation Experiments

To understand which components of WirelessNet most significantly influence the performance, we perform ablation experiments by removing one input node feature or a part of the model architecture. Using Setup B and the same hyperparameters as in Section VI-C, we retrain WirelessNet without the removed component and evaluate the performance. These ablation experiments give insights into how the different components of the model contribute to the overall performance. We perform the following ablation experiments:

  • Remove the inter-cell interference edges (referred to as Setup B.1 - No interference edges).

  • Remove the SINR UE node feature (referred to as Setup B.2 - No SINR).

  • Replace the SINR with RSRP of serving BS as UE node feature (referred to as Setup B.3 - RSRP instead of SINR).

  • Replace the SINR with RSRP of serving BS as UE node feature and remove inter-cell interference edges (referred to as Setup B.4 - RSRP and no interference edges).

  • Replace the SINR with RSRP of serving BS as UE node feature. Use element-wise summation as aggregator per edge type (referred to as Setup B.5 - RSRP instead of SINR (Sum Aggregation)).

  • Replace the SINR with RSRP of serving BS as UE node feature and no interference edges. Use element-wise summation as aggregator per edge type (referred to as Setup B.6 - RSRP and no interference edges (Sum Aggregation)).

We summarize the ablation experiments in Table 7. Setup B.1 shows the increase in reconstruction error when removing the inter-cell interference edges of the input heterogeneous graphs. The performance gain from inter-cell interference edges is limited, since interference information is already present within the downlink SINR UE node feature. We observe in Setup B.2 that the downlink SINR $\gamma_{i}$ has a significant effect on the performance of in-distribution estimates and on generalization to unseen network deployments. A similar phenomenon was observed in the context of rate reconstruction for a single user with real measurements [23]: models with SINR as an input feature can generalize to geographical locations unseen during training. Setup B.2 indicates that the SINR input feature is the most dominant factor in reconstructing downlink rates. Yet, to achieve these gains, placing the SINR as a node feature of UE nodes within our proposed input graph guarantees that only the SINR of user $i$ is used for the computation of the downlink rate of user $i$. As observed in the improved learning of our proposed model compared to a fully-connected neural network in Section VI-D, the increased performance with SINR can be achieved due to the strong relational inductive bias [45] of the model architecture given by the input heterogeneous graphs.

TABLE 7 Accuracy for Downlink Data Rate [Mbits/$\Delta t$] Reconstruction (Ablation Experiments)

By replacing the SINR input feature with a more practical UE node feature, namely the RSRP of the serving BS, we evaluate the contribution of WirelessNet’s model architecture design to the performance more comprehensively, as no input node feature then carries information about inter-cell interference. Namely, we evaluate the contribution of the inter-cell interference edges, which indicate computation with different independent trainable parameters associated with the interference phenomena during the forward pass, as illustrated in Fig. 3. The results in Setups B.3 and B.4 show that the inclusion of inter-cell interference edges decreases the reconstruction error for in-distribution estimates and marginally decreases the reconstruction error for unseen network deployments. When element-wise summation $\sum_{j=0}^{\vert \mathcal{N}_{i}^{\tau}\vert} \boldsymbol{m}_{j}^{\tau}$ is used as the aggregator function per edge type $g_{A}^{\tau}$, instead of the element-wise $\max(\cdot)$ function, Setups B.5 and B.6 show a more significant decrease in MAPE with the inclusion of inter-cell interference edges, both for in-distribution estimates and for generalization to unseen network deployments. Our explanation for the increased performance is that aggregating interference messages with element-wise summation replicates the physical interference phenomena within the model’s architecture more accurately. The cost is a linear increase in computation with an increasing number of interfering BSs, although no additional UE node feature information is needed. In addition, comparing Setups B.3 and B.5 with Setups B.4 and B.6, respectively, shows the improved performance of HMPGNNs over homogeneous GNNs for the modeling of radio access networks, since using one edge type (e.g., Setups B.4 and B.6) and having the same number of features in UE and BS nodes effectively renders our model a homogeneous graph neural network. WirelessNet provides more modeling flexibility to incorporate more complex radio access networks and scenarios, as well as significantly increased performance compared to homogeneous graph neural networks, at the cost of a linear increase in computation.
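The difference between the two aggregators discussed above can be made concrete with a small sketch. It only illustrates element-wise max versus element-wise sum applied per edge type; the edge-type names and message values are hypothetical, and the message and update functions of the actual model are not reproduced here.

```python
def aggregate_max(messages):
    """Element-wise max over a list of equal-length message vectors."""
    return [max(col) for col in zip(*messages)]


def aggregate_sum(messages):
    """Element-wise sum over a list of equal-length message vectors."""
    return [sum(col) for col in zip(*messages)]


def aggregate_per_edge_type(messages_by_type, aggregator):
    """Apply the aggregator separately to the messages of each edge type."""
    return {etype: aggregator(msgs) for etype, msgs in messages_by_type.items()}


# Hypothetical messages arriving at one UE node, grouped by edge type.
messages_by_type = {
    "serving": [[1.0, 0.5]],
    "interference": [[0.25, 0.5], [0.5, 0.125], [0.125, 0.25]],
}

agg_max = aggregate_per_edge_type(messages_by_type, aggregate_max)
agg_sum = aggregate_per_edge_type(messages_by_type, aggregate_sum)
```

With max, only the largest entry per dimension survives, so adding a weaker interferer leaves the aggregate unchanged. With sum, every interfering BS contributes, which mirrors how interference power accumulates and also explains the linear growth in computation with the number of interferers.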

SECTION VII.

Network Applications

Following the system architecture shown in Fig. 1, we showcase two network applications served by WirelessNet, namely, radio access network deployment planning and AI/ML model training for QoS prediction. We use the same WirelessNet radio access network model trained with Setup B to serve both network applications.

A. Radio Access Network Deployment Planning

We address a radio access network deployment planning scenario in which specific target user and system QoS requirements must be met for a specific load pattern. The mobility of users is given by the MM1, MM2 and MM3 mobility traces for 30 UEs. We use WirelessNet to analyze the user and system network performance in terms of the achieved downlink rates for the 6, 8, 10 and 12 BS deployments with said load pattern. We use 4500 samples per deployment. We assume WirelessNet receives the input feature information as described in Section VI-C. First, the input features are transformed into a vector representation $\boldsymbol{h}_{i}$ with $f_{\Theta}(\cdot)$, and thereafter said representation is transformed into the downlink rate $r_{i}$ with $g_{\Phi}(\cdot)$. We consider the ns-3 network simulations of the cellular network as the ground truth.

Consider a network operator with the following target QoS requirements for a new radio access network deployment:

  1. Maintain the minimum downlink rate above 2 Mbits/$\Delta t$ at least 70% of the time and

  2. Maintain the system throughput above 40 Mbits/$\Delta t$ at least 40% of the time.

In Fig. 7, we plot the CDF of the achieved downlink rates per user and of the system throughput in Bits/$\Delta t$, obtained with WirelessNet, in dotted lines for the different network deployments. For comparison with the ground-truth network performance, we also plot the user and system performance from the radio access network simulations in solid lines. Fig. 7 shows that the WirelessNet radio access network model can accurately evaluate network performance for network deployment planning. Evaluating the different network deployments, Fig. 7 shows that the 10 BS deployment meets the target requirements at the least cost (in terms of network deployment). The performance evaluation of all network deployments using the WirelessNet radio access network model took 30.5 ms, which enables the quick evaluation of multiple what-if scenarios for closed-loop network optimization. For example, WirelessNet can enable the optimal and fast deployment of mobile BSs mounted on unmanned aerial vehicles (UAVs).
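Checking the two target requirements against a set of reconstructed rates amounts to reading two points off the empirical CDFs. The sketch below shows one way to do this. It assumes that the "minimum downlink rate" is the minimum over users at each time step and that the system throughput is the sum of user rates at each time step, both in Mbits/$\Delta t$; these interpretations, the function names and the toy data are assumptions, not the authors' procedure.

```python
def fraction_above(values, threshold):
    """Fraction of samples strictly above the threshold (1 - empirical CDF)."""
    return sum(v > threshold for v in values) / len(values)


def meets_targets(rates_per_step, min_rate=2.0, min_frac=0.70,
                  sys_rate=40.0, sys_frac=0.40):
    """Check both QoS targets on per-time-step lists of user downlink rates."""
    worst_user = [min(step) for step in rates_per_step]
    system = [sum(step) for step in rates_per_step]
    ok_user = fraction_above(worst_user, min_rate) >= min_frac
    ok_system = fraction_above(system, sys_rate) >= sys_frac
    return ok_user and ok_system


# Toy data: 10 time steps with 3 users each (hypothetical values).
deployment_a = [[15, 14, 13]] * 5 + [[3, 4, 5]] * 3 + [[1, 2, 3]] * 2
deployment_b = [[15, 14, 13]] * 3 + [[1, 2, 3]] * 7

a_ok = meets_targets(deployment_a)
b_ok = meets_targets(deployment_b)
```

Running such a check on the model output for each candidate deployment, ordered by number of BSs, would return the least-cost deployment that satisfies both targets.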

FIGURE 7. - CDF of Downlink Rates for Radio Access Network Deployment Planning with WirelessNet and the ground truth Cellular Network Simulator (ns-3).

B. AI/ML Model Training for QoS Prediction

As a network application, we address the training of QoS prediction models that use vector representations from the WirelessNet radio access network model to predict the downlink data rate of all users. The intuition is that vector representations $\boldsymbol{h}_{i}$ generated with a downlink rate reconstruction training objective should help the downlink rate prediction task. Hence, $g_{\Phi}(\cdot)$ is not used, as we are interested in predicting QoS. Let $q_{\text{Q}}: \mathbb{R}^{1\times n''} \rightarrow \mathbb{R}$ denote a QoS prediction function that uses unstructured raw measurements, where $n'' \in \mathbb{N}$ is the number of input features, including local information of a particular user and network information of all BSs. The input sample of the QoS model is then given by $\boldsymbol{x} := [\boldsymbol{x}_{\text{UE}}, \boldsymbol{x}_{\text{BSs}}] \in \mathbb{R}^{1\times n''}$, where $\boldsymbol{x}_{\text{BSs}} := [\boldsymbol{x}_{\text{BS}_{1}}, \ldots, \boldsymbol{x}_{\text{BS}_{\vert \mathcal{V}_{\text{BS}}\vert}}] \in \mathbb{R}^{1\times n_{\text{BS}}\vert \mathcal{V}_{\text{BS}}\vert}$. For QoS prediction models trained using vector representations, the unstructured raw measurements are first transformed into vector representations by the WirelessNet model and thereafter given as input to train the QoS prediction models. As shown in Fig. 2, WirelessNet generates a vector representation per UE, where $d \in \mathbb{N}$ denotes the number of elements of the vector representation. Let $q_{\text{DT}}: \mathbb{R}^{1\times d} \rightarrow \mathbb{R}$ denote a QoS prediction function trained with the vector representation $\boldsymbol{h}_{i}$ generated by the radio access network model as input sample. The datasets are constructed by vertically stacking the information of all users. We consider the 6 BS and 30 UE scenario with a prediction horizon of 7 s. The QoS prediction models are trained with 77790 input samples in total, and we split the data with a 60/20/20 train/val/test ratio. We consider the following QoS prediction models:

  • an FCDNN model trained with raw input data of the radio access network as baseline,

  • an FCDNN model trained with vector representations generated from WirelessNet,

  • a Random Forest (RF) model trained with raw input data of the radio access network as baseline and

  • an RF model trained with vector representations generated from WirelessNet.
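The construction of a raw input sample and the sizes of the data split described above can be sketched as follows. The feature counts (3 UE features, 2 features per BS) are hypothetical placeholders, and the sketch does not assume whether the split is shuffled or chronological, since that is not stated in this section.

```python
def build_sample(x_ue, x_bss):
    """Concatenate UE features with the features of all BSs: x = [x_UE, x_BSs]."""
    sample = list(x_ue)
    for x_bs in x_bss:
        sample.extend(x_bs)
    return sample


def split_sizes(n_samples, train_pct=60, val_pct=20):
    """Integer sizes of a train/val/test split; the test set takes the remainder."""
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    return n_train, n_val, n_samples - n_train - n_val


# Hypothetical raw sample: 3 UE features and 6 BSs with 2 features each.
x = build_sample([0.1, 0.2, 0.3], [[1.0, 2.0]] * 6)
n_features = len(x)

# Split sizes for the 77790 samples and the 60/20/20 ratio given in the text.
sizes = split_sizes(77790)
```

Note that the length of the raw sample grows with the number of BSs, whereas the vector representation produced by WirelessNet has a fixed length $d$ per UE.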

Table 8 shows their hyperparameters. The hyperparameters are tuned on the hold-out validation set using random search for the FCDNN model, and off-the-shelf hyperparameters are used for the RF model.
TABLE 8 Learning Hyperparameters of QoS Prediction Models

We train the QoS prediction models and evaluate their performance on the hold-out test set. In doing so, we effectively employ transfer learning, where the model parameters of WirelessNet are kept fixed and the parameters of the QoS prediction models are trainable. Table 9 shows that the QoS prediction models trained with the structured vector representations $\boldsymbol{h}_{i}$ generated by WirelessNet achieve significantly higher prediction performance than those trained with unstructured raw measurements. The performance gain is significant for both the RF and FCDNN models. The vector representations generated by WirelessNet contain local and neighborhood information of the relevant network nodes, as well as structural information of the wireless phenomena between the involved network nodes from the perspective of each UE. The intuition behind the increased performance is that the latent features of the vector representation output by WirelessNet correspond more closely to the underlying causes of the observed data, with distinct latent features or directions in representation space [8]. These benefits are not possible with a non-data-driven approach that outputs unstructured data.
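The transfer-learning arrangement described above, a fixed representation model followed by a trainable predictor, can be illustrated with a toy sketch. For brevity, a linear head trained with stochastic gradient descent is swapped in for the FCDNN and RF models used in the paper, and `frozen_encoder` is a hypothetical stand-in for $f_{\Theta}(\cdot)$; the data are synthetic.

```python
def frozen_encoder(x):
    """Hypothetical stand-in for the fixed representation model; never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def predict(w, b, h):
    """Linear prediction head applied to a vector representation h."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def train_head(samples, targets, lr=0.05, epochs=500):
    """Fit only the head parameters (w, b) on frozen representations."""
    feats = [frozen_encoder(x) for x in samples]  # computed once, kept fixed
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for h, y in zip(feats, targets):
            err = predict(w, b, h) - y
            # Gradient step on the squared error; the encoder is untouched.
            w = [wi - lr * err * hi for wi, hi in zip(w, h)]
            b -= lr * err
    return w, b


# Synthetic task: the target equals 2 * (x0 + x1) + 1.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [1.0, 3.0, 3.0, 5.0]

w, b = train_head(samples, targets)
pred = predict(w, b, frozen_encoder([1.0, 1.0]))
```

Because the representations are computed once and reused, any downstream predictor can be trained on them without touching the representation model, which is what allows a single radio access network model to serve several applications.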

TABLE 9 Downlink Rate Prediction Accuracy of All Users With a 7 s Prediction Horizon Using a Single QoS Prediction Model

C. Discussion

The showcased network applications demonstrate the practical benefits of using WirelessNet. Firstly, using a single radio access network model for multiple network scenarios reduces the number of model updates needed due to shifts in the input data distribution in radio access networks. Secondly, by using a single vector representation for multiple network applications, the radio access network model improves its scalability to different service requests from multiple network applications, with minimal impact on existing network functions implemented in practical cellular networks. Thirdly, from a radio access network expansion perspective, our proposed model architecture can naturally accommodate the new heterogeneous radio access nodes and wireless interactions expected in next-generation mobile networks as new network node and edge types in the heterogeneous graph. These practical benefits showcase the potential of WirelessNet to enable different use cases of the network digital twin paradigm in radio access networks [1].

However, further challenges remain to enable the network digital twin paradigm. For example, novel training procedures could be explored to generate general vector representations that could serve many more network applications and an arbitrary number of future steps. The use of generative AI methods to produce synthetic input samples representative of different corner network scenarios is another promising direction. Said synthetic input samples could be given as input to WirelessNet to efficiently evaluate the network performance of corner what-if scenarios, as well as to generate their respective vector representations for improved AI/ML model training. Towards deploying radio access network models such as WirelessNet in real cellular networks, future work requires the use of real-world datasets from mobile network operators. With a real dataset, the applicability of our proposed model in a more unpredictable real-world scenario can be assessed (e.g., when training with only a limited number of real-world samples). Moreover, the transferability of our model to a real-world scenario when training only with simulations can be evaluated.

From a deployment perspective, WirelessNet can be deployed in different entities of a 3GPP mobile network, and each deployment option has a different architectural impact. We argue that deploying WirelessNet in a BS or in another entity within the radio access network is more suitable, since the collected data given as input samples to WirelessNet then remains in the radio access network and the communication overhead is therefore reduced, as the input samples do not traverse the whole mobile network (e.g., across the core network). Another advantage is that the computational complexity of WirelessNet is low enough for deployment in a BS. Moreover, if WirelessNet is deployed in the core network, less granular information from the radio access network is typically available. As WirelessNet can provide services to multiple network applications, it can be loosely coupled with other network functions, which facilitates its deployment within the 5G service-based architecture without significantly impacting existing network functions in the radio access and core network.

SECTION VIII.

Conclusion

We propose WirelessNet, a novel radio access network model based on HMPGNNs and heterogeneous graphs. WirelessNet efficiently outputs accurate downlink rates of users and useful vector representations for downstream network applications. We design WirelessNet to account, within its model architecture, for structural changes of different physical wireless phenomena (i.e., wireless communication and inter-cell interference) between users and the radio access network. We use cellular network system-level simulations to train and evaluate our proposal. WirelessNet accurately reconstructs downlink rates of users and generalizes to unseen network deployments with significantly lower computational runtime than a network simulator and significantly higher accuracy than FCDNN models. With ablation experiments, we validate the SINR UE node feature as the most significant contributor to the performance. In a more practical setting using RSRP from the serving BS instead of SINR, WirelessNet achieves comparable downlink rate reconstruction performance for in-distribution estimates and unseen network deployments. We show that the inter-cell interference edges that replicate the interference phenomena within WirelessNet’s model architecture contribute significantly to the performance. The ablation experiments show that WirelessNet significantly outperforms homogeneous GNNs. Finally, we show the benefits of using WirelessNet for two network applications, namely radio access network deployment planning and AI/ML model training for QoS prediction.

From a mobile network implementation perspective, using a single AI/ML radio access network model for multiple network applications improves the scalability with respect to new network applications requesting modeling services. However, the nature of different network applications impacts the required modeling scope of AI/ML radio access network models. For example, network applications related to the optimization of physical layer functions (e.g., beamforming optimization) require a different and more detailed modeling scope (e.g., modeling of multi-antenna transmission and reception). Moreover, these models will have to be complemented by efficient radio map models that accurately account for all relevant environment-dependent wireless propagation effects. As we increase the modeling scope of AI/ML radio access network models, the increase in computational complexity will also have to be mitigated. A general, efficient and accurate radio access network model (or set of models) that can provide modeling services to any network application will fully enable the network digital twin paradigm in mobile networks. We hope further AI/ML radio access network models will be proposed with a similar or an increased modeling scope compared to WirelessNet. To achieve the same deployment flexibility in a 3GPP mobile network as WirelessNet, a general radio access network model should be designed to mitigate the increase in computational complexity, sample complexity and communication overhead.
