Introduction
In the context of the Fourth Industrial Revolution, awareness of operating environment conditions is becoming increasingly important for the efficient management of resources in diverse dynamic systems. This era, marked by the emergence of advanced technologies such as artificial intelligence, big data, the Internet of Things (IoT), and robotics, has significantly transformed modern industrial ecosystems, especially smart factories. Our research is strongly motivated by the urgent need to address the complex and multifaceted challenges of ensuring robust, efficient, and secure wireless connectivity in such environments. Smart factories, which are not only hubs of innovation but also arenas where the reliability and precision of wireless communications are rigorously tested, have leveraged these technologies to automate production processes. This automation facilitates automatic assessment of process status, enabling timely intervention and improving overall operational efficiency [1].
Central to the functionality of smart factories is wireless communications, which assumes a pivotal role in empowering instantaneous tracking and surveillance of production processes. Moreover, the flexibility and simplicity of wireless communications make it the preferred mode of connectivity in the dynamic landscape of production environments, compared to cumbersome wired alternatives. However, the surging count of wireless devices introduces a challenge—potential interference with the industrial, scientific, and medical (ISM) band [2]. Moreover, obstacles within indoor environments can attenuate communication signals, leading to regions of radio shadow. Accordingly, a precise assessment of the extent of radio communication coverage emerges as a priority. In addressing the aforementioned challenge, radio environment map (REM) construction has been investigated as an innovative tool that serves to furnish intricate details about the radio environment within specific geographic areas. By harnessing the insights provided by the REM, informed decision-making is facilitated, and network operators can seamlessly identify coverage gaps and high-traffic regions [3].
In the literature, path loss models have been investigated for coverage prediction. However, these models depend on various factors, such as the distance between transmitter and receiver as well as the heights of the receiver and transmitter above ground, which increases the prediction error between real and estimated values [4]. Hence, machine learning (ML)-based approaches have emerged as innovative predictive methods capable of effectively addressing the intricate operational challenges within communication networks. These techniques have demonstrated a remarkable ability to achieve high prediction accuracy [5], [6]. For example, the authors in [7] introduced a three-level reconfigurable intelligent surface (RIS) framework to improve the signal quality of wireless communications, focusing on efficient channel state information (CSI) acquisition with low latency and pilot overhead. The framework uses a sparsely connected long short-term memory (SCLSTM) neural network to decompose and predict the dynamic channels between base stations and user equipment, significantly outperforming traditional channel estimation methods in accuracy and robustness. Moreover, researchers have predicted path loss in an urban setting in Beijing, China, using artificial neural networks (ANNs), support vector regression (SVR), and random forest (RF) models [8]. Performance was assessed through root mean square error (RMSE), yielding results between 4 dB and 5 dB. In [9], the authors proposed an extra tree regressor-based approach for REM construction in wireless communications networks for indoor environments. The results showed that the extra tree regressor can obtain the best accuracy with less computational time than other ensemble learning baseline schemes. To the best of our knowledge, the studies described above considered a centralized approach in which the learning process is managed by a central server or a base station.
In this paper, we propose a federated learning (FL) approach called FedLSTM, which employs a long short-term memory (LSTM) model to provide distributed learning among users. Note that the conventional centralized approach transmits more data, since both the features and the labels must be sent to the server. In contrast, in the distributed setting, each user sends only the locally computed weights of its model.
Moreover, the proposed FedLSTM scheme allows both the server and the users to generate the REM. The server can act as the network planner, using the REM to solve coverage problems (e.g., installing APs or relays). Meanwhile, users can better assess coverage of the area and relocate to a better-covered region by consulting the REM. In addition, the proposed FL-based approach provides security because the data sent to the server are the model weights, not the labels and features of each user. This scenario is very useful in commercial, hospital, and military environments where users do not want to share their location with unknown parties, thus guaranteeing privacy and security.
We propose a novel FL-based approach to coverage prediction in indoor environments that not only minimizes data transmission but also enables network planning and empowers users to make informed decisions about their coverage. Additionally, it prioritizes user privacy and security, making it highly applicable in various sensitive settings.
The main contributions of this paper can be summarized as follows.
First, we propose a novel FL-based approach, called FedLSTM, providing coverage prediction for indoor environments. FedLSTM enables distributed model training by having users send only model weights to the server. This is in stark contrast to centralized approaches where users must transmit both features and labels, leading to a significant increase in data transmissions.
Secondly, the data utilized in this study were collected from a real environment with location points captured using Emesent’s Hovermap and real received signal strength indicator (RSSI) values obtained via Raspberry Pi. After collection, we preprocess the location data with Emesent’s software and synchronize it with the cleaned RSSI readings from Raspberry Pi by using timestamps.
Third, we construct a REM by using Python software to enhance coverage prediction visualization. For this objective, we generate a grid of data comprising 1000\times1000 grid points within our area of interest to plot coverage prediction over a 2D map. Furthermore, our FL-based scheme empowers both server and users to generate the REM. The server functions as the network planner, leveraging the REM to address coverage issues by installing access points or relays. On the other hand, users can assess the coverage of their area and relocate to areas with better coverage by examining the REM.
In addition, we compared the FedLSTM model with its centralized counterpart, showing that our research ensures security by transmitting only the weights of each user’s model to the server, while the labels and features remain private. This approach is particularly advantageous in commercial, hospital, and military settings where users are hesitant to share their location data with unknown entities.
The remainder of this paper is organized as follows. Section II describes related work, and Section III describes data collection and preprocessing. Section IV outlines the overall system model, including the FL scheme, the FedLSTM architecture, and REM construction. Section V presents numerical results and a computational complexity analysis, with visual findings presented in Section VI. Finally, conclusions are drawn in Section VII.
Related Work
Radio maps (commonly known as REMs) and their construction play a very important role in modern communication systems. These maps offer a comprehensive view of the radio spectrum environment by retaining various types of information, from geographic and land features to spectrum usage characteristics. Over the years, a significant amount of research has been conducted to enhance and diversify their applications. Introduced in 2006 [10], REMs have facilitated a multitude of applications, ranging from network monitoring [11], localization [12], and resource management [13] to V2X communication [14], [15].
Traditional methods of radio map construction often involve detailed field surveys, which can be time-consuming and labor-intensive. To address this, researchers have been developing algorithms to lower these costs. A number of path loss models have been influenced by factors like terrain suitability, the heights of the receiver and transmitter above ground level, their spatial separation, and the presence of intervening obstructions, among other things [16]. These elements can widen the gap between forecast and real signal degradation, with the extent of the variance hinging on the chosen propagation model. In the REM construction literature, ordinary kriging (OK) is frequently employed as a geostatistics-based spatial interpolation method [17], [18]. OK predicts unseen data points by considering the spatial relationships between recorded data and the relative locations of all sampled points [19].
In [20], Maiti and Mitra developed a radio map for indoor signal propagation, leveraging interpolation methodologies. Results showed that OK achieved better performance than simpler methods like inverse distance weighting [21] and K-nearest neighbors (KNN). These methods were evaluated based on prediction error, i.e., RMSE. Although the OK-based model achieved better performance, its computational cost grows quickly as the number of data points increases [18].
In [22] and [23], heuristic methodologies were introduced to provide coverage prediction based on indoor dominant path models. However, heuristic solutions are typically designed for specific problems and might not generalize well to other scenarios or variations of the problem, making them inconsistent and unreliable in critical applications.
In classical prediction models, the design of mobile-device networks demonstrates an inherent lack of adaptability [24]. Predictions are constrained to specific conditions, such as frequency range, antenna height, and surrounding environmental conditions. Nonetheless, current observations indicate that the operational environment of modern radio networks is characterized by an elevated level of diversity and complexity [25]. Consequently, there is a strong need for prediction models that are more flexible and that can handle the challenges of modern networks.
ML-based prediction techniques are recognized as revolutionary within the realm of modern mobile-device network planning owing to their enhanced accuracy over age-old empirical prediction methods. When compared with deterministic-based models, the ML-based methods are notably superior in their data processing efficacy [26], [27].
For instance, the authors in [28] conducted research in an urban area of Lisbon, Portugal, in the 3.7 GHz and 26 GHz frequency bands, leveraging an authentic 5G network. They utilized input factors similar to those in [8], and the resultant error values closely matched, although the dataset was double the size. The study mainly focused on SVR and RF models, which showed error rates ranging between 6 dB and 7 dB.
Similarly, the authors in [29] conducted their research in suburban areas of South Korea, focusing on frequency bands of 450 MHz, 1450 MHz, and 2300 MHz. Although the input parameters were mostly similar to previous studies, a new parameter was introduced: the ratio between Tx height and Rx height. That research exclusively employed ANN and Gaussian process regression (GPR) models, both of which exhibited RMSE values ranging from 8 dB to 9 dB.
In addition, the authors in [30] evaluated ML models (ANN, SVR, and RF) in a rural environment in Greece in the 3.7 GHz band. Input parameters were 3D Tx-Rx distance, heights above sea level, and signal propagation (LOS/NLOS). The target was path loss, and RMSE ranged between 4 dB to 5 dB.
Lastly, the authors in [31] introduced a new approach using the extremely randomized trees regressor (ERTR) algorithm for mobile coverage prediction, and visualized results on a REM overlaid on Google Earth. Real measurement data from Victoria Island and Ikoyi in Lagos, Nigeria, were used. Through extensive simulations and comparisons with seven other ML algorithms, including ordinary kriging, the ERTR algorithm achieved the lowest RMSE, at 2.75 dB.
To the best of our knowledge, commonly employed methods for constructing radio maps heavily rely on centralized data approaches. While these methods are comprehensive, they bring forth computational complexities and potential vulnerabilities, particularly concerning user-specific data such as geospatial information. In environments such as commercial complexes, healthcare institutions, and military facilities, the adoption of a centralized approach is considered suboptimal. In these contexts, users place a high premium on privacy and security, necessitating measures to prevent the disclosure of location information to unauthorized parties. Our model incorporates the FL approach, advocating decentralized data processing across user nodes. Unlike the centralized framework, which necessitates transmission of both feature vectors and labels, the federated approach entails only transmission of model weight vectors. This reduction in data transmission overhead simultaneously enhances data security for users, with a minimal increase in prediction error.
Data Collection and Preprocessing
In this study, we sourced our primary data from the Engineering Building at the University of Ulsan in South Korea. For the collection process, we systematically employed two key devices: the Emesent Hovermap and Raspberry Pi. The Emesent Hovermap, equipped with a state-of-the-art LiDAR sensor, operates using the simultaneous localization and mapping (SLAM) technique. It functions by emitting laser pulses and noting the duration taken for these beams to bounce back after reflecting off surfaces. Using the speed of light, the LiDAR sensor can precisely calculate distances from the reflection time. Consequently, it creates an intricate 3D map based on these light reflections. After data collection, the raw data from Hovermap is processed and refined using the specialized Emesent software, making it fit for analytical use.
We used the received signal strength indicator as a metric to evaluate the quality of the connection between transmitting and receiving devices. In wireless systems, including Wi-Fi, Bluetooth, and cellular networks, RSSI is commonly used to measure the strength of a radio signal. These values in decibel milliwatts (dBm) are systematically obtained using the built-in Wi-Fi module in Raspberry Pi. The RSSI data is then pre-processed to remove any NaN values and outliers. The location data from the Emesent Hovermap and the RSSI data from Raspberry Pi are then synchronized using timestamps in Python to create a time series dataset. This synchronization aligns spatial coordinates with corresponding RSSI values over time, which is essential for LSTM model training and accurate REM construction in our federated LSTM approach.
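The timestamp-based synchronization step can be sketched in plain Python. The nearest-neighbor pairing rule and the `max_gap` tolerance below are illustrative assumptions; they stand in for the alignment performed in our preprocessing scripts:

```python
from bisect import bisect_left

def sync_by_timestamp(locations, rssi, max_gap=1.0):
    # Pair each (t, x, y) Hovermap sample with the nearest-in-time RSSI
    # reading, discarding pairs whose time gap exceeds max_gap seconds.
    # Both inputs must be non-empty and sorted by timestamp.
    rssi_times = [t for t, _ in rssi]
    paired = []
    for t, x, y in locations:
        i = bisect_left(rssi_times, t)
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(rssi)]
        j = min(neighbours, key=lambda k: abs(rssi_times[k] - t))
        if abs(rssi_times[j] - t) <= max_gap:
            paired.append((x, y, rssi[j][1]))
    return paired

locs = [(0.0, 1.0, 2.0), (1.0, 1.5, 2.5), (5.0, 3.0, 4.0)]   # (t, x, y)
rssi = [(0.1, -55), (1.2, -60), (9.0, -70)]                  # (t, dBm)
```

The third location sample is dropped because no RSSI reading falls within the tolerance, which mirrors how unmatched samples are excluded from the time series dataset.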
System Model
Our experimental setup consists of 16 different clients that participate in model updates and a server that collects the updated models. Each client traveled through 10 unique paths, updating its model with each trip. Each new data collection was used by the client to train its local model, adhering to predefined model and hyperparameter specifications. Upon completion of the training phase for a specified number of epochs, the local model weights were transmitted to a central server, where they were aggregated using the FedAvg algorithm to update the global model weights. This iterative process of individual learning and centralized aggregation was performed across all clients over 10 communication rounds.
A. Federated Learning
FL is a novel approach to ML where a model is trained across multiple devices or servers while keeping the data localized. Instead of transferring raw data to a central server for training, FL pushes the model to edge devices (smartphones, tablets, IoT devices, etc.) and allows training to happen locally on each device. After local training, only the model updates are sent to the central server where they are aggregated and the global model is updated. Figure 1 shows the overall FL process. This approach addresses multiple concerns, such as privacy, security, and data ownership, while ensuring a quality model.
In the context of wireless communications, consider a system with N clients, where client j computes local model weights in communication round i. The server updates the global model by averaging the local weights (FedAvg):\begin{equation*} \omega _{i+1} = \frac {1}{N} \sum _{j=1}^{N} \omega _{i,j} \tag{1}\end{equation*}
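Equation (1) is an element-wise average of the clients' weights. A minimal sketch in plain Python, representing each client's model as a list of per-layer flat weight lists, could look like this:

```python
def fed_avg(client_weights):
    # client_weights: one entry per client; each entry is a list of
    # per-layer flat weight lists. Returns their element-wise average,
    # i.e., Eq. (1) applied layer by layer.
    n = len(client_weights)
    return [
        [sum(w) / n for w in zip(*layers)]
        for layers in zip(*client_weights)
    ]
```

For two clients with weights [[1.0, 2.0], [3.0]] and [[3.0, 4.0], [5.0]], the aggregated global model is [[2.0, 3.0], [4.0]].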
Algorithm 1 Federated Learning Process With FedAveraging
Initialize global model parameters
for round i = 1 to I do
    Select a subset of clients, selected_clients
    for each client j in selected_clients do
        Send current model parameters to client j
        Train the local model on client j's data for E epochs
        Send updated local weights back to the server
    end for
    Apply FedAvg per Eq. (1) to update the global model
    Send updated model parameters to the clients
end for
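The loop structure of Algorithm 1 can be illustrated with a toy, self-contained simulation. Here each client trains a one-parameter linear model by gradient descent; the model, learning rate, and client fraction are illustrative stand-ins for the actual FedLSTM training:

```python
import random

def local_train(w, data, epochs=2, lr=0.01):
    # toy local update: one-parameter linear model y = w * x,
    # trained by SGD on squared error (stand-in for LSTM training)
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x
            w -= lr * grad
    return w

def federated_round(global_w, client_datasets, frac=0.5):
    # select a fraction of clients, train locally, then apply FedAvg
    k = max(1, int(frac * len(client_datasets)))
    chosen = random.sample(client_datasets, k)
    local_ws = [local_train(global_w, d) for d in chosen]
    return sum(local_ws) / len(local_ws)

random.seed(0)
# every client's data follows y = 3x, so the global weight drifts toward 3
clients = [[(1.0, 3.0), (2.0, 6.0)] for _ in range(4)]
w = 0.0
for _ in range(10):   # 10 communication rounds, as in our setup
    w = federated_round(w, clients)
```

After ten rounds the averaged global weight approaches the true slope, illustrating how per-round local training plus averaging converges without any client sharing its raw data.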
Consensus in FL is critical to ensure that the global model accurately reflects the collective learning of all distributed clients. It helps to synchronize model updates from different data sources, thus maintaining the integrity and relevance of the model [32]. In our FL model, consensus is achieved through the FedAvg aggregation rule in (1), which merges all local updates into a single global model.
Evaluating the computational complexity of federated learning algorithms is essential for assessing their scalability and resource requirements. This analysis focuses on the complexity of a specific federated learning process. The process involves I communication rounds; in each round, every participating client performs local training for a fixed number of epochs before transmitting its weights to the server for aggregation.
B. FedLSTM Architecture
LSTM is a specialized type of recurrent neural network (RNN) designed to remember and utilize information over extended sequences. This capability addresses the vanishing gradient problem commonly observed in traditional RNNs, enabling LSTM to learn and retain long-term dependencies in the data. Often utilized in time series prediction and other sequential tasks, LSTM significantly improves the efficiency and accuracy of deep learning models dealing with sequential data. RSSI prediction based on x and y geographic coordinates can be done by the LSTM model. We train the LSTM model in a federated manner at the edge, and update the global model on the server. This proposed federated LSTM is called FedLSTM owing to the nature of its training. At first, the global model is initialized with random weights, and is then updated by the clients locally. The layer architecture starts with an LSTM layer of 256 units, designed to process spatial input and extract pertinent sequential patterns. To avoid overfitting, a subsequent dropout layer is employed, ensuring the model remains generalizable across diverse radio conditions. The succeeding layers (LSTM with 128 units and 64 units) progressively refine these patterns with intermittent dropout layers for regularization. The architecture ends with a pair of dense layers; the first handles the processed data, and the subsequent layer outputs the predicted RSSI value. The detailed, layer-by-layer architecture is described in Table 1.
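The layer stack described above can be sketched in Keras as follows. The LSTM unit counts match the text; the dropout rate, dense width, time-step length, and optimizer are illustrative assumptions not specified here:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fedlstm(time_steps=10, n_features=2, dropout=0.2):
    # LSTM(256) -> Dropout -> LSTM(128) -> Dropout -> LSTM(64) -> Dropout
    # -> Dense -> Dense(1); input is a sequence of (x, y) coordinates,
    # output is a single predicted RSSI value in dBm.
    model = models.Sequential([
        layers.Input(shape=(time_steps, n_features)),
        layers.LSTM(256, return_sequences=True),
        layers.Dropout(dropout),
        layers.LSTM(128, return_sequences=True),
        layers.Dropout(dropout),
        layers.LSTM(64),
        layers.Dropout(dropout),
        layers.Dense(32, activation="relu"),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```

Each client builds this same architecture locally; only the resulting weight tensors travel to the server for FedAvg aggregation.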
What sets this research apart is its innovative adoption of federated learning. Rather than centralized model training, the architecture is distributed across multiple clients. Each client refines the model using local data before sending weight updates to a central server. This decentralized approach not only accentuates data privacy but also captures the diverse radio environments each client is exposed to, ensuring a more comprehensive and applicable real-world REM.
C. REM Construction
In this subsection, our primary focus is the construction of a deep FL model designed to predict indoor propagation coverage for Wi-Fi networks using x and y coordinates of the area of interest. To design the deployment model, we trained the deep FedLSTM network in a distributed manner. This can be mathematically represented as \begin{equation*} {\mathcal{L}}(\theta) = \sum _{j=1}^{N} w_{j} {\mathcal{L}}_{j}(\theta) \tag{2}\end{equation*}
The input dataset consists of the x and y coordinates of the measurement locations, with the corresponding RSSI values serving as prediction targets.
To visualize the radio map over our target area, we constructed a mesh grid with 1000\times1000 points covering the area of interest.
Subsequent steps involve feature normalization based on the Z-score. We then employ a FedLSTM-driven regression approach tailored for the prediction task. Using the model trained in a federated manner, coverage predictions are determined from the RSSI values for each grid point. The predicted values derived from the federated framework are then combined with horizontal and vertical coordinates of the grid. This results in a REM visualizing the predicted data points as a pseudocolor plot, which is rendered into a 2D map using the pcolor function. Complementing this, a bar graph is introduced to correlate our data with the associated color representation in each plot. For a clearer understanding, Figure 2 provides a detailed graphical representation of the entire procedure.
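The grid generation, Z-score normalization, and prediction steps can be sketched with NumPy. The `predict_fn` argument below is a stand-in for the trained FedLSTM, and the toy log-distance model is purely illustrative:

```python
import numpy as np

def build_rem(predict_fn, x_range, y_range, n=1000, mean=None, std=None):
    # Evaluate a trained model over an n-by-n mesh grid and return the
    # coordinate grids plus the predicted RSSI surface for plt.pcolor().
    xs = np.linspace(*x_range, n)
    ys = np.linspace(*y_range, n)
    X, Y = np.meshgrid(xs, ys)
    feats = np.column_stack([X.ravel(), Y.ravel()])
    if mean is not None:
        feats = (feats - mean) / std      # Z-score feature normalization
    Z = predict_fn(feats).reshape(n, n)
    return X, Y, Z

# toy stand-in model: signal falls off with log-distance from an AP at (0, 0)
toy = lambda f: -40.0 - 20.0 * np.log10(1.0 + np.hypot(f[:, 0], f[:, 1]))
X, Y, Z = build_rem(toy, (-10.0, 10.0), (-10.0, 10.0), n=100)
# plt.pcolor(X, Y, Z); plt.colorbar()    # renders the 2D pseudocolor REM
```

Swapping the toy model for the federated model's predict function yields the REM described above; the commented pcolor call adds the pseudocolor plot and color bar.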
1) The Fine Tuning Process
For construction of the REM, the fine-tuning procedure necessitates systematic optimization of various hyperparameters to achieve optimal model performance. Determining the optimal network architecture, including the choice of layers, is critical. Additionally, regularization parameters are carefully adjusted to prevent overfitting and improve the model’s generalization capability. Time steps, which influence LSTM’s ability to capture temporal dependencies, are also an essential aspect of this fine-tuning. The entire optimization process is carried out on a trial-and-error basis. The performance of each configuration is assessed using several metrics, namely RMSE, MAE, relative error, MAPE, and the R2 score.
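The trial-and-error process can be organized as a simple exhaustive sweep over candidate hyperparameter values. The candidate grid and the scoring stand-in below are illustrative assumptions, not our actual search space:

```python
from itertools import product

def grid_search(train_eval, grid):
    # try every hyperparameter combination and keep the configuration
    # with the lowest validation score (e.g., RMSE)
    best_cfg, best_score = None, float("inf")
    for combo in product(*grid.values()):
        cfg = dict(zip(grid.keys(), combo))
        score = train_eval(cfg)   # train a model, return validation error
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# hypothetical sweep with a toy scoring function in place of real training
grid = {"time_steps": [5, 10], "dropout": [0.2, 0.4]}
fake_eval = lambda cfg: abs(cfg["time_steps"] - 10) + cfg["dropout"]
```

In practice `train_eval` would train the FedLSTM under the given configuration and report its validation RMSE.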
D. Model Evaluation
In this subsection, we evaluate the efficiency of the proposed federated model with different evaluation error metrics: RMSE, MAE, relative error, MAPE, and the R2 score.
Root Mean Square Error is the square root of the average squared deviation between true values and predicted values:\begin{equation*} Er_{RMSE} = \sqrt { \frac {1}{m} \sum _{i=1}^{m} \left ({y_{i} - \hat {y}_{i} }\right)^{2} } \tag{3}\end{equation*}
Mean Absolute Error calculates the average deviation between predicted and true values. This metric becomes essential in scenarios where accurate RSSI predictions are critical for precise REM construction:\begin{equation*} Er_{\text {MAE}} = \frac {1}{m} \, \sum _{i=1}^{m} \bigl | y_{i} - \hat {y}_{i} \bigr | \tag{4}\end{equation*}
Relative Error provides a deviation measure adjusted by the actual value. Considering the diverse range of wireless signal strengths, this metric evaluates the proportional differences between predicted and actual values across various signal levels:\begin{equation*} Er_{Relative} = \frac {|\hat {y}_{i} - y_{i}|}{y_{i}} \tag{5}\end{equation*}
Mean Absolute Percentage Error characterizes the deviations of predictions in terms of their percentage errors from actual observations. Given its foundation in relative accuracy, MAPE is paramount for discerning the fidelity of REM predictions in the context of empirical data:\begin{equation*} Er_{MAPE} = \frac {100}{m} \sum _{i=1}^{m} \left |{ \frac {y_{i} - \hat {y}_{i}}{y_{i}} }\right | \tag{6}\end{equation*}
The R2 score, also known as the coefficient of determination, measures how well the model’s predictions correspond to the actual data. An R2 value close to 1 indicates that the model explains most of the variance in the observations:\begin{equation*} Er_{R^{2}} = 1 - \frac {\sum _{i=1}^{m} (y_{i} - \hat {y}_{i})^{2}}{\sum _{i=1}^{m} (y_{i} - \bar {y})^{2}}, \tag{7}\end{equation*} where the bar denotes the mean of the true values.
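The aggregate metrics above (relative error in (5) is computed per sample) follow directly from their definitions; a plain-Python sketch:

```python
import math

def metrics(y_true, y_pred):
    # RMSE, MAE, MAPE (%), and R2 computed per Eqs. (3), (4), (6), (7)
    m = len(y_true)
    err = [a - b for a, b in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in err) / m)
    mae = sum(abs(e) for e in err) / m
    mape = 100.0 / m * sum(abs(e / a) for a, e in zip(y_true, err))
    y_bar = sum(y_true) / m
    r2 = 1.0 - sum(e * e for e in err) / sum((a - y_bar) ** 2 for a in y_true)
    return rmse, mae, mape, r2
```

For example, true RSSI values of [-50, -60] dBm predicted as [-51, -59] dBm give RMSE = 1 dBm, MAE = 1 dBm, and R2 = 0.96.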
Overall, these metrics address various dimensions of prediction accuracy. By analyzing them collectively, we obtain a comprehensive understanding of the model’s performance and robustness, which is critical for optimal wireless communications and precise REM generation.
1) Implementation Platform
We utilized the NVIDIA GeForce RTX 3060 Graphics Processing Unit (GPU) for our training processes. This GPU enables us to undertake sophisticated deep-learning tasks with remarkable efficiency and speed. The NVIDIA GeForce RTX 3060 is equipped with a state-of-the-art Ampere architecture boasting 3584 CUDA cores capable of delivering impressive computational throughput. Alongside this, the card is complemented with 12 GB of GDDR6 memory, ensuring swift data handling during intensive operations. For our development environment, we utilized Python, and structured our deep learning architectures using the TensorFlow framework.
Numerical Results
In this section, we evaluate the performance of the proposed FL model using various testing scenarios. We established multiple experimental setups to rigorously analyze the behavior of the FedLSTM network under different conditions. Initially, the REM was constructed using a centralized LSTM network with a consistent architecture. This served as a baseline to compare against the federated configuration of the REM employing the same architecture. We conducted evaluations of FedLSTM based on different numbers of clients in the training process, enabling an assessment of performance fluctuations. The effect of varying local epochs on client devices was also studied to evaluate the adaptability of less computationally robust devices. Additionally, FedLSTM’s performance was examined across different client configurations and batch sizes, elucidating their respective impacts on model efficiency. Detailed numerical results of these evaluations are provided in the subsequent subsections.
A. Comparison with a Centralized LSTM Network
To evaluate our federated model’s performance, it is essential to compare it against a centralized model. In our experiment, we employed a federated setup consisting of 10 clients. Each client moved along a different path to collect data from different parts of the area of interest. The model was created on the main server and then distributed to the participating clients. After each client trains the model on its local data for the specified number of epochs, the updates are sent to the central server. The server integrates each update with those from the other clients, and then relays the consolidated model to another randomly chosen client. For this experiment, data processing was conducted in batch sizes of 32. Referring to the results in Figure 4, our federated LSTM model’s performance metrics (RMSE, MAE, and R2) were close to those of the centralized baseline.
The federated model’s RMSE was marginally elevated, by 0.1 dBm, in comparison to its centralized counterpart. The MAE metric indicated a prediction difference of approximately 0.2 dBm. When evaluating the R2 score, the federated model likewise remained close to the centralized baseline.
B. Impact of the Number of Clients on Global Convergence
In this assessment, we evaluate the impact of the number of clients participating in convergence of the federated model. The model was employed in different federated settings; each time, the number of participating clients was changed to observe the behavior of the model. We evaluated the regression model with five error metrics, namely, RMSE, MAE, relative error, MAPE, and the R2 score.
Therefore, with a large number of clients using FedLSTM, prediction accuracy can quickly achieve high performance, and thus, the accuracy will be more stable through communication rounds. Figure 5 shows that for C = 16, the R2 score converged faster and remained higher across communication rounds than in settings with fewer clients.
Similarly, Figure 6 shows that the RMSE of the federated model was lowest, indicating the best performance, when the number of clients in model training was high (C = 16). The proposed federated model showed a lower RMSE in all communication rounds when C was 16, whereas this error metric was high when C was 4. In the early communication rounds, the model with four clients showed fast convergence; however, convergence slowed after five communication rounds. This means that with fewer clients, the model may start with sharp convergence but stall in later communication updates.
In addition, MAE calculates the average deviation between predicted and true values, and in our setting, MAE followed the trends shown in Figure 7. The graph depicts high MAE in every communication round when C = 4: even after 10 rounds, MAE remained at 3 dBm, compared with just above 2 dBm when clients numbered 8 and 12. With C = 16, MAE decreased continuously, reaching 2 dBm after eight communication rounds and falling below 2 dBm after 10 rounds.
Furthermore, the model was evaluated with relative error to show the deviations from actual values and the proportional differences between predicted and actual values across the data. The relative error graph in Figure 8 illustrates that an increase in clients had a strong impact. With few clients, relative error was above 0.06 before five communication rounds had been completed, and then dropped to about 0.05 after 10 rounds. This metric followed the same trend as the metrics discussed above, decreasing with each client added to train and update the model. With C = 16, relative error was near 0.03 after 10 rounds of updating the model, and it also dropped very quickly in the early rounds.
Finally, to characterize deviations in predictions in terms of percentage errors from actual values, the mean absolute percentage error was evaluated based on variations in the number of clients updating the model.
Figure 9 shows MAPE was below 4% when C = 16 after 10 rounds. In contrast, this error increased to around 5% when C = 4. MAPE was around 4% with eight and 12 clients, but converged to these values in the final updates, compared with 16 clients updating model weights.
Overall, the 16-client system achieved higher accuracy and precision, with notably fewer errors, than the four-client system. Generally, for a predetermined number of training rounds, the number of clients affects both the accuracy and the convergence speed of the FL process. As more clients participated in each round, the model’s precision and training speed improved. Nevertheless, once C increased beyond a certain level, the gains in system performance became less noteworthy and sometimes even degraded. In practice, as C increases, more clients upload local parameters to the server, which significantly amplifies the communication and computation costs of the FL model. The saturation of gains is therefore encouraging: in real-world applications with large-scale client populations, we only need to select a subset of clients from the network to execute the FL process in each communication round, saving considerable communication cost.
C. Contribution of Local Model Epochs to Global Convergence
In ML, an epoch is one complete cycle through the full training dataset. Specifically, each epoch involves both forward propagation (estimation of output given the input) and backward propagation (adjustment of model weight based on error). The number of epochs is a hyperparameter that determines how many times the learning algorithm will work through the entire training dataset. In the context of FL, where data remain distributed across multiple devices or nodes and do not converge centrally, the local number of epochs becomes a pivotal factor. It dictates how much training is conducted on each local dataset before aggregating model updates into the global model. This subsubsection delves into the nuanced influence that varying the local number of epochs has on convergence speed and quality in FL setups.
From Figure 10, it is evident that as the number of epochs for training on each selected client per communication round increased, the RMSE of the federated model decreased. To elucidate, in the federated setting where the number of epochs was set to two (E = 2), the RMSE remained higher across communication rounds than in settings with more local epochs.
Similarly, to assess the accuracy with which our federated model represents the data, we evaluated the R2 score across the different local-epoch settings.
Furthermore, the model was assessed using relative error, which normalizes deviations by the actual values and thus displays the proportional differences between the predicted and actual values across the data. Figure 12 illustrates that as the number of local epochs increased, the relative error decreased accordingly.
Figure 13 illustrates how MAPE, expressed as a percentage, quantified prediction deviations from the actual values as the number of clients participating in model updates varied.
Lastly, mean absolute error (MAE) quantifies the average absolute deviation between predicted and actual values. In our experiments, the trends in MAE were consistent with those observed for the other error metrics. Figure 14 illustrates that MAE was notably high during the initial communication rounds and decreased steadily as training progressed.
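The error metrics discussed above (RMSE, MAE, MAPE, and R²) can be computed as follows. The RSSI values in the example are illustrative, not measurements from our experiments; note that MAPE uses the absolute value of the ratio, since measured RSSI values in dBm are negative:

```python
import math

def metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE (%), and R^2 for RSSI predictions (dBm)."""
    n = len(y_true)
    err = [p - t for p, t in zip(y_pred, y_true)]
    rmse = math.sqrt(sum(e * e for e in err) / n)
    mae = sum(abs(e) for e in err) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(err, y_true)) / n
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in err)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# Toy RSSI values (dBm); a real evaluation would use the held-out test set.
actual = [-45.0, -60.0, -72.0, -81.0]
pred = [-47.0, -58.0, -75.0, -80.0]
print(metrics(actual, pred))
```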
Graphical Results
In the domain of wireless networking, the received signal strength indicator (RSSI) is a metric quantifying the signal power a device receives from an access point or router. Expressed in decibel milliwatts (dBm), RSSI is a pivotal criterion in evaluating the quality of a Wi-Fi connection. Under optimal wireless communications conditions, an RSSI value of −30 dBm represents peak signal reception. However, achieving such an ideal benchmark is infrequent in practical environments due to various interference and propagation factors. Signal strengths ranging from −50 to −30 dBm denote a very stable connection, facilitating seamless streaming and downloading. Between −60 and −50 dBm, the signal is still sufficiently strong for most standard applications, including uninterrupted streaming. However, as signal strength falls to the range of −70 to −60 dBm, performance becomes notably average. Connections in this range may struggle during intensive tasks, particularly if the network sees simultaneous activity from multiple devices or if the connected device is significantly mobile. Progressing further down the scale, a range of −80 to −70 dBm indicates an inconsistent connection prone to disruptions, especially during data-intensive operations like streaming. An even weaker range, −90 to −80 dBm, offers a highly unstable connection that is not only susceptible to frequent dropouts but also sluggish speeds. Any signal weaker than −90 dBm is practically non-functional, rendering a reliable connection almost impossible.
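The signal-quality bands above can be encoded as a simple lookup. The category labels are our own shorthand for the qualitative descriptions, not a standardized scale:

```python
def classify_rssi(dbm):
    """Map an RSSI reading (dBm) to the quality bands described above."""
    if dbm >= -50:
        return "excellent"      # -50 dBm and above: very stable link
    if dbm >= -60:
        return "good"           # sufficient for most standard applications
    if dbm >= -70:
        return "average"        # may struggle during intensive tasks
    if dbm >= -80:
        return "inconsistent"   # prone to disruptions
    if dbm >= -90:
        return "unstable"       # frequent dropouts, sluggish speeds
    return "unusable"           # reliable connection nearly impossible

print(classify_rssi(-45), classify_rssi(-75), classify_rssi(-95))
```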
The dispersion of wireless signal strength is graphically represented as a 2D color map using the predicted RSSI values from the trained FL model. Figure 15(a) illustrates a LiDAR map of the area of interest constructed through the Emesent Hovermap, while Figure 15(b) displays the coverage map generated using the centralized LSTM model. In Figure 15(c), the REM constructed through the proposed FedLSTM model is presented. REMs constructed through the LSTM centralized model and FedLSTM are comparable, showing similar signal dispersion. Specifically, the region near the access point exhibits RSSI values above −50 dBm, indicating excellent signal strength. However, as one moves farther away from the access point, the RSSI values gradually decrease and eventually drop below −80 dBm at the far end of the area.
(a) Corridor layout, (b) REM from the centralized LSTM model, and (c) REM from FedLSTM.
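As a simplified stand-in for rendering a REM from model output, the sketch below interpolates sparse RSSI samples onto query points with inverse-distance weighting. This is a generic illustration of filling a 2D coverage grid, not the FedLSTM prediction pipeline, and the sample values are invented:

```python
def idw(x, y, samples, power=2.0, eps=1e-9):
    """Inverse-distance-weighted RSSI estimate at (x, y) from sparse samples,
    each given as (sx, sy, rssi_dbm)."""
    num = den = 0.0
    for sx, sy, rssi in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 < eps:
            return rssi           # query point coincides with a measurement
        w = 1.0 / d2 ** (power / 2)
        num += w * rssi
        den += w
    return num / den

# Illustrative measurements: strong near the access point at (0, 0),
# weakening with distance along a corridor.
samples = [(0, 0, -40.0), (5, 0, -65.0), (10, 0, -85.0)]
print([round(idw(x, 0.0, samples), 1) for x in (2.5, 7.5)])
```

Evaluating the estimator over a regular grid of (x, y) points and coloring each cell by its predicted dBm value yields the 2D color maps shown in Figures 15 and 16.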
To further evaluate and test our proposed federated model under different settings, we conducted experiments inside a room. For this purpose, we utilized Room 7–602 at the University of Ulsan in South Korea. In this federated setting, we engaged five clients, each of which updated the model over 10 iterations for 10 communication rounds. Figure 16(a) provides an overview of the room’s layout, including an example path (path_1) followed by Client-1. In our scenario, we considered 10 paths taken by five clients. The REM from the centralized model is shown in Figure 16(b), and the REM from FedLSTM in Figure 16(c); both highlight the excellent coverage near the access point and the gradual reduction in signal strength with distance from it. These graphical results demonstrate that in both room and corridor settings, FedLSTM achieved performance comparable to centralized approaches while concurrently ensuring data privacy, minimizing communications overhead, and reducing the server load.
(a) Room layout, (b) REM from the centralized LSTM model, and (c) REM from FedLSTM.
Conclusion
In this study, we presented FedLSTM, an innovative FL-based approach to REM construction. In particular, the proposed FedLSTM framework is designed to predict indoor network coverage while addressing the pressing issue of data privacy. The method is decentralized, allowing clients to participate in the training process without disclosing their data to a central server. For this approach, we utilized real measurements from various clients navigating different paths and updating their models with data from each path. Compared to its centralized counterpart, our model showed a slight rise in RMSE, from 2.4 dBm to 2.5 dBm, and in MAE, from 1.7 dBm to 1.9 dBm. Subsequently, we evaluated the FedLSTM model under variations in the number of participating clients. Numerical results revealed that increasing the number of clients enhanced performance in terms of error metrics and convergence speed.
Our future work includes adapting the framework for mapping outdoor environments and integrating geographic data for more comprehensive REM construction. We also plan to develop optimized communication protocols between servers and clients, and to create an advanced client selection algorithm. These enhancements aim to improve data diversity and client convenience, further extending the applicability and efficiency of federated learning in diverse settings.