Introduction
The rapid progress of Beyond 5G and 6G technologies in wireless communication systems is paving the way for fresh opportunities in optimization [1]. These upcoming systems introduce demanding requirements such as higher data rates, improved bandwidth, lower latency, expanded coverage, enhanced spectral efficiency, and reduced power and energy consumption. Innovative methodologies have been introduced to meet these requirements, including NOMA, massive MIMO, and low-power wide area networks (LPWAN). Nevertheless, implementing these advancements in practice faces significant obstacles, primarily due to heightened complexity and substantial hardware costs, particularly at millimeter-wave (mmWave) frequencies.
Smart wireless communication environments offer superior adaptability and mobility compared to terrestrial systems. However, maintaining line-of-sight communication in dynamic and dense urban areas poses challenges for mobile and IoT devices connecting to base stations. Reconfigurable Intelligent Surfaces (RIS) provide a promising solution by manipulating wireless signal propagation through reflection or refraction [2]. Recent advancements in intelligent radio systems have sparked research into RIS applications in Beyond-5G and 6G wireless networks.
The security of wireless communications during public events is vital for critical services like emergency response, security surveillance, and public health management. Adversarial jamming poses significant threats, undermining link quality and necessitating anti-jamming measures like RIS mounted on UAVs (Aerial RIS). This strategy is crucial for robust communication, especially in remote areas or densely populated urban hubs. Disruptions affect sectors like industrial automation, healthcare, military, emergency response, and connected vehicles, all reliant on wireless networks. In healthcare, wireless communication supports services like remote surgery and telehealth but is vulnerable to attacks and disruptions. Emergency communications are susceptible to jamming, hindering rescue efforts. Smart grid networks face threats from cyber attackers and environmental factors, leading to power outages and equipment failures.
Different methodologies have been explored for leveraging RIS affixed to buildings or UAVs deployed as relays to optimize communication metrics in non-hostile environments [3], [4], [5], [6], [7]; however, limited attention has been paid to hostile scenarios. Recent studies have integrated RIS with UAVs to fine-tune communication parameters, considering challenges posed by single jammers in dynamic environments [8], [9], [10], [11], [12], [13], [14], [15]. Among the UAV-borne RIS anti-jamming solutions, the work by Hou et al. [8] stands out for utilizing a Reinforcement Learning (RL) technique known as the Dueling Double Deep Q-Network (D3QN), while the two other studies presented in [13] and [14] employ conventional Alternating Optimization (AO) techniques. The authors of [8] and [13] add a further layer of complexity by considering the scalability and mobility of mobile devices. One of the existing works, [7], presents a multi-clustered IoT scenario served by multiple fixed RIS without considering device-to-RIS association, clustering, involvement of UAVs, or presence of jammers. Similarly, [12] employs multiple UAV-mounted RIS relays to serve multiple device clusters, but it carries out fixed association without any swarm optimization or consideration of jammers.
An alternative approach to anti-jamming in multi-user scenarios is outlined in [15], which employs the Win or Learn Fast-Policy Hill-Climbing (WoLF-PHC) algorithm, utilizing an RIS fixed on a building to enhance the system rate and protect transmission against a single jammer. Similarly, the study in [16] addresses multiple jammers by employing a single UAV transmitter with a fixed RIS installed on a building to assist communication with a ground user. Some anti-jamming works utilize multiple UAV relays without RIS, such as [17], which focuses on securing a single ground-based target from aerial jammers, and [18], which targets multiple jammers affecting ground-based IoT devices. The solution by [13] stands out by deploying multiple UAV-borne RIS relays to counter a single jammer interrupting a single mobile device.
We observe that existing swarm UAV-borne RIS-based solutions either fail to adapt to the dynamic nature of dense multi-user wireless scenarios or do not consider this aspect at all, particularly regarding real-time changes in the distribution and number of devices and jammers, as well as fluctuations in UAV resources. Current approaches that employ fixed strategies do not handle the dynamics and scalability of the system effectively: a low number of UAVs may not offer sufficient coverage in some settings, while a high number can increase energy consumption in others. They also fail to ensure energy efficiency and optimal coverage, as continuous operations are interrupted by battery outages or failures. Moreover, none of these solutions address the threats of multiple jammers to multiple devices.
Our work addresses these gaps by proposing an RL-based adaptive swarm formation and clustering mechanism for anti-jamming to secure 6G wireless communications in densely crowded urban environments, such as public events, stadiums, and shopping malls, which are vulnerable to unknown jammers threatening critical public services like crowd management, emergency response, and security. This approach creates a flexible UAV-borne RIS swarm that dynamically adjusts the number of UAVs and the UAV-to-device clustering to real-time environmental changes, ensuring optimal coverage and energy efficiency. By utilizing real-time data to adapt the UAV swarm, this solution mitigates the impact of multiple unknown jammers through a multi-objective optimization strategy that maximizes the sum rate and minimizes energy consumption. It does so by optimizing the UAV-to-device association, the trajectories of the UAV-borne RIS, the RIS passive beamforming through phase shifts, and the base station transmit power, accommodating variations in the distribution and number of jammers and mobile devices, while ensuring continuous operations with a UAV recharging and swapping facility.
Therefore, the first significant unique aspect is scalable and dynamic swarm formation and clustering under changing conditions in a dense multi-user environment, such as variation in the distribution or quantity of jammers and the number of devices, where a single UAV or a fixed swarm of UAVs is insufficient to provide effective coverage. The RL-driven dynamic and adaptive UAV swarm formation and device clustering ensures that a sufficient number of UAVs are deployed to mitigate the effects of jammers on the devices while conserving energy. As the negative effect of jammers increases, more UAVs are dynamically added to the service while maintaining energy efficiency.
The second point of distinction is incorporating the Multi-Objective Optimization approach. Existing RL-based anti-jamming solutions, assisted by RIS, in single-user or multi-user scenarios have predominantly focused on a single objective, such as achievable rate maximization [8], transmission rate optimization [14], SINR improvement [13], or minimization of energy consumption [16]. Although a few solutions have tried to achieve multiple objectives concurrently, such as optimizing both the sum rate and the system protection level [15], they do so by combining conventional techniques with RL. Another anti-jamming solution, presented in [19], also solves a multi-objective optimization, but only to counter the jamming effect on a single device using a single UAV-mounted RIS relay.
The third aspect that sets our solution apart from the existing ones is the concept of UAVs recharging at docking stations, with swapping when their batteries expire or fail, which is one of the possible options presented by [20]. The only work in the literature that provides the UAV recharging concept is presented by [21]; however, that solution applies RL to a UAV-assisted Intrusion Detection scenario, not to the wireless anti-jamming scenario.
In our proposed 6G scenario, we address the challenges of maximizing the sum rate and minimizing the energy consumption of UAV-borne RIS relays by using a centralized RL agent at the base station with Proximal Policy Optimization (PPO) to optimize device-to-UAV associations, UAV trajectories, RIS beamforming, and base station transmit power. Adaptive UAV swarm formation and dynamic clustering mitigate jammer impact, balancing coverage and energy conservation, while the UAV recharging and swapping facility ensures continuous and stable operations.
The contributions of our work are described as follows:
We introduce a novel anti-jamming solution to protect critical 5G Advanced and/or 6G communications at densely crowded venues like stadiums, airports, and sporting arenas, with numerous vulnerable mobile devices targeted by multiple independent and unknown jammers that operate autonomously, while the system lacks complete knowledge of these disruptive sources. Our approach employs multiple UAV-mounted RIS relay platforms for adaptive swarm formation and device clustering.
Our goal is to maximize the sum rate while minimizing the UAV usage for energy efficiency by dynamically optimizing the swarm formation of UAVs and device clustering by association with adaptive in-flight UAV-borne RIS relays, trajectories of UAVs, passive beamforming via RIS phase shifting, and adjusting base station transmission power to counter multiple jammers.
The optimization problem poses significant computational complexity due to the highly dynamic and scalable nature of the system, in which the distribution and number of jammers and devices may change. This requires a strategy that adapts to variations and guarantees self-optimization of the different parameters. Therefore, we propose a Reinforcement Learning (RL) technique called Proximal Policy Optimization (PPO) for multi-objective optimization.
The proposed system model incorporates UAV swapping with a charging dock facility for recharging Swarm UAV batteries when they expire. This novel addition to the existing scenario guarantees practical, continuous, and seamless operations for UAVs, ensuring uninterrupted service and robust communications for mobile devices within a 6G wireless cellular region.
To demonstrate the superiority of our RL-based approach, we conduct comprehensive simulations against configurations from related works and baselines, which include fixed RIS installations, fixed device-to-UAV associations, random device-to-UAV association, single-objective sum rate optimization, and UAV-mounted RIS with random phase shifting. These simulations cover diverse system configurations with variations in the distribution areas and numbers of jammers and mobile devices.
The remaining paper is structured as follows. In Section II, we provide a concise review of the applications of RL techniques to address jamming threats in wireless communication systems. Additionally, we explore recent studies that have employed single and swarm UAV-mounted RIS platforms to enhance the communication efficiency in dense and dynamic adversarial environments. Section III presents the system model with problem formulation, providing detailed insights into the structure of Reinforcement Learning-based solution. In Section IV, we present the proposed implementation, covering the simulation setup, procedures, and results. Lastly, Section V offers a conclusion of this work.
Related Works
The challenge of mitigating the impact of jammers in wireless communication networks has been a focal point of research for several years. Traditional approaches to combat jamming, including adaptive rate/power control, cognitive radio, and spread spectrum, have effectively reduced jamming threats [22]. However, against highly adaptive jamming threats, their resilience remains limited in the face of the increasing complexity of 5G Advanced and 6G networks. The classical Spread Spectrum technique faces challenges due to the strict spectral efficiency requirements of 5G cellular networks. A more recent MIMO-based anti-jamming approach effectively addresses jamming effects; however, it necessitates channel state information about the jammers [23].
Reinforcement Learning (RL) operates within a Markov Decision Process (MDP), allowing agents to adapt their policies based on environmental feedback and thereby overcome the constraints of static schemes [24]. RL-based anti-jamming methods excel in dynamic environments. RL techniques have also been used to optimize wireless communications with Reconfigurable Intelligent Surfaces (RIS), but they often neglect jamming threats. In [3] and [25], Deep Reinforcement Learning (DRL) optimizes UAV-borne RIS-assisted systems using techniques like Proximal Policy Optimization (PPO) and Decaying Deep Q-Network (DQN). Similarly, [5] applies DRL to NOMA communications, yet fails to address jamming threats.
Recent research explores leveraging UAV-borne RIS for wireless communication optimization. For example, Xu et al. [26] optimize the UAV-borne RIS trajectory and transmit power with DDPG to maximize total throughput in UAV-powered IoT networks. Samir et al. [9] minimize the expected sum Age-of-Information (AoI) using PPO by optimizing the RIS phase shift, UAV altitude, and scheduling. Khalili et al. [10] employ DDQN in UAV-assisted RIS HetNets to minimize the total transmit power. However, all of these solutions overlook jamming threats.
There are several techniques that aim to counter the impacts of jammers and eavesdroppers in wireless networks. Liu et al. [27] introduced HIF-DRL for frequency channel selection. In [28], ADRLA with RCNN addressed multiple jammer scenarios. Xiao et al. [29] proposed Hot-booting Q-learning for power allocation in MIMO NOMA systems. Additionally, Xiao et al. [30] presented a 2-D frequency-space anti-jamming scheme for multiple jammers.
Anti-Jamming challenges also extend to emerging vehicular communication systems like the UAV-assisted VANET [31]. Li et al. [32] introduced a DRL-based anti-jamming technique, while Yao and Jia [33] proposed the Collaborative Multi-agent Anti-jamming Algorithm (CMAA). Slimeni et al. [34] suggested On-Policy Synchronous Q-learning (OPSQ-learning) for real-time avoidance of jammed channels. Peng et al. [35] employed Multi-Dimensional Anti-Jamming Reinforcement Learning (MDAJRL) for UAV communication systems. Abuzainab et al. [36] introduced a QoS-aware routing protocol based on Actor-Critic DQN to navigate around communication holes due to jamming. Ye et al. [37] presented an anti-jamming solution based on PDDQN. However, none of these utilize RIS devices for anti-jamming assistance.
Reconfigurable Intelligent Surface (RIS) uses software-controlled planar surfaces with passive elements to alter wireless propagation, enhancing communication performance with low energy consumption and enabling scalable deployment [2] in smart wireless environments.
In a study by Yu et al. [38], a fixed RIS is integrated into a NOMA framework using LMIDDPG, enhancing energy efficiency in MEC systems [6]. The latter involves a UAV-borne base station communicating with a mobile device through an RIS, optimizing data rate and energy efficiency using DDPG and DQN [15]. H. Zhao et al. [16] address multiple jammers by deploying a fixed RIS on stationary objects, aiding UAV-mounted transmitters in communication while mitigating jammer effects. However, these solutions lack flexibility in RIS movement and focus solely on maximizing energy efficiency, which limits them in catering to the dynamic and scalable nature of the environment, such as varying numbers of mobile devices or jammers.
Therefore, UAV-mounted RIS deployment has been explored for anti-jamming in dynamic environments, particularly against single jammers [8], [13]. Since dynamic RIS installation on UAVs has been shown to outperform fixed deployment [15], the work in [14] deploys a UAV-borne RIS relay between ground base stations (GBSs) and ground users (GUs), employing Alternating Optimization (AO) and Manifold Optimization (MO) to optimize parameters against a single jammer. Similarly, the authors in [8] and [19] also optimize parameters to maximize communication rates against a single jammer, using the Dueling Double Deep Q-Network (D3QN) and Deep Deterministic Policy Gradient (DDPG) algorithms, respectively. However, these techniques lack the dynamism and scalability to handle variations in devices, jammers, and mobility in dense and dynamic wireless environments.
Recent works also explore the use of Swarm UAVs for performance optimization, including anti-jamming in dense, dynamic scenarios. Studies in [17] and [18] deploy Swarm UAVs without RIS for performance optimization, using the conventional Alternating Optimization (AO) technique. The authors in [13] introduce a system wherein multiple UAV-mounted RIS relays are employed to counter the effects of a single jammer, using an AO-based Relax-and-Retract algorithm to maximize the SINR at a mobile device. Meanwhile, the works in [11] and [12] employ Swarm UAV-borne RIS platforms to improve communication parameters without considering jammers, using conventional optimization techniques; the latter [12] also involves static clustering through device association with the UAV-RIS platforms. Nevertheless, these Swarm UAV-mounted RIS-based solutions overlook dynamic swarm formation and clustering, hindering system dynamism and scalability. Moreover, these solutions do not leverage Reinforcement Learning (RL) or consider multiple objectives in optimization. A summary of closely related works is presented in Table 1.
In contrast to existing approaches, our proposed solution utilizes RL-based swarm UAV-borne RIS relays with dynamic swarm formation and clustering through device-to-UAV association, UAV swapping with a battery recharge facility, which addresses the dynamism and scalability challenges in wireless systems due to variations in distribution or quantity of jammers or devices. Leveraging Proximal Policy Optimization (PPO), our solution enables joint optimization of multiple parameters of Swarm UAV-mounted RIS, achieving multiple objectives across multiple mobile devices amidst multiple jammers. This includes maximizing the sum rate and minimizing energy consumption by clustering optimization through dynamic device-to-UAV association, UAV swapping, and battery recharging. This ensures optimal performance while maintaining uninterrupted operations and conserving energy by minimal deployment of UAVs.
Adaptive Swarm Formation and Clustering through Aerial-RIS in Multi-Jammers Environments
In a dynamic and dense urban environment, we consider a 6G wireless communication system incorporating MIMO technology (Figure 1). This system is a crucial communication infrastructure for public events in densely populated smart cities. It facilitates connectivity for multiple clusters of mobile devices, including those used by logistics, emergency response, law enforcement, security, health safety, crowd management, and citizens. The base station B operates on 6G technology, ensuring efficient communication and data services. Despite these advantages, the system faces disruptions from multiple jammers $J_{k}$ ($k \in [1,K]$).
System Model: Swarm UAV-borne RIS-assisted 6G Wireless Communications in a Dynamic and Dense Environment with Multiple Jammers.
A. System Model
We define a rectangular coordinate system representing a portion of a 6G micro-cell with the Base Station B installed at the origin, establishing maximum limits for the axes to define the mobility constraints for the Swarm Aerial RIS platforms.
Mobile devices have the freedom of movement within predefined boundaries of the cell. The Swarm UAVs serve I mobile devices, forming clusters to ensure Quality of Service (QoS). The RL control unit of the Swarm is stationed at Base Station B. This anti-jamming communications system operates in a dynamic and unknown environment, which implies continuous and uncertain changes in the environmental conditions and variables. The agent, or observer, does not possess full information regarding the conditions of the environment or its future evolution.
The base station B, multiple mobile devices $D_{i}$ ($i \in [1,I]$), Swarm UAV-borne RIS platforms $U_{c}$ ($c \in [1,C]$), and multiple jammers $J_{k}$ ($k \in [1,K]$) constitute the entities of the considered system.
We consider frequency-selective fading models for all communication channels in our scenario. Ground-based communications are modeled using the Rayleigh fading model, incorporating non-line-of-sight components arising from multi-path effects caused by obstacles [39]. Moreover, we also assume that the communication channels between the Base Station B and the Swarm UAV-mounted RIS platforms, between the platforms and the mobile devices, and between the jammers and the platforms follow the Rician fading model with dominant line-of-sight (LoS) components.
1) Distance and Channel Model
The distance model is first defined as follows. We define the distances between the ground base station B and mobile device $D_{i}$, between B and UAV-borne RIS platform $U_{c}$, between $U_{c}$ and $D_{i}$, between jammer $J_{k}$ and $D_{i}$, and between $J_{k}$ and $U_{c}$ as \begin{align*} d_{BD_{i}} & = \sqrt {(x_{B} - x_{D_{i}})^{2} + (y_{B} - y_{D_{i}})^{2} + (0)^{2}} \tag {1a}\\ d_{BU_{c}} & = \sqrt {(x_{B} - x_{c})^{2} + (y_{B} - y_{c})^{2} + (z_{c})^{2}} \tag {1b}\\ d_{U_{c}D_{i}}& = \sqrt {(x_{c} - x_{D_{i}})^{2} + (y_{c} - y_{D_{i}})^{2} +(z_{c})^{2}} \tag {1c}\\ d_{J_{k}D_{i}}& =\sqrt {(x_{D_{i}} - x_{J_{k}})^{2} + (y_{D_{i}} - y_{J_{k}})^{2} + (0)^{2}} \tag {1d}\\ d_{J_{k}U_{c}}& =\sqrt {(x_{c} - x_{J_{k}})^{2} + (y_{c} - y_{J_{k}})^{2} + (z_{c})^{2}} \\ & \quad \forall c \in [1,C]; \forall k\in [1,K]; \forall i\in [1,I] \tag {1e}\end{align*}
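As an illustration, the distances of equations (1a)-(1e) reduce to Euclidean norms over 3-D coordinates (ground nodes at altitude zero). The sketch below uses hypothetical node positions, not the paper's simulation values:

```python
import numpy as np

def distance(p: np.ndarray, q: np.ndarray) -> float:
    """Euclidean distance between two 3-D points (ground nodes have z = 0)."""
    return float(np.linalg.norm(p - q))

# Hypothetical coordinates (metres): base station at the origin,
# one mobile device D_i, one UAV U_c at altitude z_c, one jammer J_k.
B   = np.array([0.0,   0.0,  0.0])
D_i = np.array([120.0, 80.0, 0.0])
U_c = np.array([60.0,  40.0, 50.0])
J_k = np.array([150.0, 60.0, 0.0])

d_BD = distance(B, D_i)    # Eq. (1a)
d_BU = distance(B, U_c)    # Eq. (1b)
d_UD = distance(U_c, D_i)  # Eq. (1c)
d_JD = distance(J_k, D_i)  # Eq. (1d)
d_JU = distance(J_k, U_c)  # Eq. (1e)
```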
We use various parameters to denote channel gains within the communication system. The channel gain of each link at time slot n is the product of its large-scale fading component $\Delta$ and its small-scale fading component $\hat {h}$:\begin{align*} h_{BD_{i}}[n]& = \Delta _{BD_{i}}[n] \hat {h}_{BD_{i}}[n] \tag {2a}\\ h_{BU_{c}}[n]& = \Delta _{BU_{c}}[n] \hat {h}_{BU_{c}} [n] \tag {2b}\\ h_{U_{c}D_{i}}[n]& = \Delta _{U_{c}D_{i}}[n] \hat {h}_{U_{c}D_{i}}[n] \tag {2c}\\ h_{J_{k}D_{i}}[n]& = \Delta _{J_{k}D_{i}}[n] \hat {h}_{J_{k}D_{i}}[n] \tag {2d}\\ h_{J_{k}U_{c}}[n]& = \Delta _{J_{k}U_{c}}[n] \hat {h}_{J_{k}U_{c}}[n] \tag {2e}\end{align*}
\begin{align*} \Delta _{BD_{i}}[n]& = \sqrt {L_{0}d_{BD_{i}}^{-\eta }[n]} \tag {3a}\\ \Delta _{BU_{c}}[n] & = \sqrt {L_{0}d_{BU_{c}}^{-\alpha }[n]} \tag {3b}\\ \Delta _{U_{c}D_{i}}[n] & = \sqrt {L_{0}d_{U_{c}D_{i}}^{-\alpha }[n]} \tag {3c}\\ \Delta _{J_{k}D_{i}}[n]& = \sqrt {L_{0}d_{J_{k}D_{i}}^{-\eta }[n]}; \tag {3d}\\ \Delta _{J_{k}U_{c}}[n]& = \sqrt {L_{0}d_{J_{k}U_{c}}^{-\alpha } [n]} \tag {3e}\end{align*}
In equations (3a - 3e), $L_{0}$ denotes the path loss at the reference distance, while $\eta$ and $\alpha$ denote the path-loss exponents of the ground and air-ground links, respectively.
The small-scale fading vectors with Line-of-Sight (LoS) components, modeled as Rician fading with Rician factor $K_{1}$, are given as:\begin{align*} \hat {h}_{BU_{c}}[n] =\sqrt {\frac {K_{1}}{K_{1}+1}}\bar {h}_{BU_{c}}[n]; \tag {4a}\\ \hat {h}_{U_{c}D_{i}}[n] = \sqrt {\frac {K_{1}}{K_{1}+1}}\bar {h}_{U_{c}D_{i}}[n]; \tag {4b}\\ \hat {h}_{J_{k}U_{c}}[n] = \sqrt {\frac {K_{1}}{K_{1}+1}}\bar {h}_{J_{k}U_{c}}[n]; \tag {4c}\end{align*}
\begin{align*} \bar {h}_{BU_{c}}[n] = [e^{j{\psi _{1}^{c}}}, e^{j{\psi _{2}^{c}}}, e^{j{\psi _{3}^{c}}}, e^{j{\psi _{4}^{c}}},\ldots,e^{j{\psi _{M}^{c}}}] \tag {5a}\\ \bar {h}_{U_{c}D_{i}}[n] = [e^{j{\omega _{1,i}^{c}}}, e^{j{\omega _{2,i}^{c}}}, e^{j{\omega _{3,i}^{c}}},\ldots,e^{j{\omega _{M,i}^{c}}}] \tag {5b}\\ \bar {h}_{J_{k}U_{c}}[n] = [e^{j{\phi _{k,1}^{c}}}, e^{j{\phi _{k,2}^{c}}}, e^{j{\phi _{k,3}^{c}}},\ldots,e^{j{\phi _{k,M}^{c}}}] \tag {5c}\end{align*}
Rayleigh fading, i.e., small-scale fading with non-LoS components, applies to the ground links and is given as:\begin{align*} \hat {h}_{BD_{i}}[n] = \sqrt {\frac {1}{K_{1}+1}}\bar {h}_{BD_{i}}[n]; \tag {6a}\\ \hat {h}_{J_{k}D_{i}}[n] = \sqrt {\frac {1}{K_{1}+1}}\bar {h}_{J_{k}D_{i}}[n] \tag {6b}\end{align*}
Combining the large-scale and small-scale components, the channel gains can be expressed as follows:\begin{align*} h_{BD_{i}}[n]& =\sqrt {L_{0}d_{BD_{i}}^{-\eta }[n]} \sqrt {\frac {1}{K_{1}+1}} CN(0,1); \tag {7a}\\ h_{BU_{c}}[n]& =\sqrt {L_{0}d_{BU_{c}}^{-\alpha }[n]} \sqrt {\frac {K_{1}}{K_{1}+1}} \bar {h}_{BU_{c}}[n]; \tag {7b}\\ h_{U_{c}D_{i}}[n]& =\sqrt {L_{0}d_{U_{c}D_{i}}^{-\alpha }[n]} \sqrt {\frac {K_{1}}{K_{1}+1}}\bar {h}_{U_{c}D_{i}}[n]; \tag {7c}\\ h_{J_{k}D_{i}}[n]& =\sqrt {L_{0}d_{J_{k}D_{i}}^{-\eta }[n]} \sqrt {\frac {1}{K_{1}+1}} CN(0,1) \tag {7d}\\ h_{J_{k}U_{c}}[n]& =\sqrt {L_{0}d_{J_{k}U_{c}}^{-\alpha }[n]} \sqrt {\frac {K_{1}}{K_{1}+1}} \bar {h}_{J_{k}U_{c}}[n] \tag {7e}\end{align*}
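The composite gains of equations (7a)-(7e) can be sampled as the product of the large-scale term and the Rician (LoS) or Rayleigh (NLoS) small-scale term. The sketch below is a minimal illustration; the values of $L_{0}$, the path-loss exponents, the Rician factor $K_{1}$, and the distances are assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

L0, eta, alpha, K1 = 1e-3, 3.0, 2.2, 10.0  # illustrative values

def large_scale(d: float, exponent: float) -> float:
    """Large-scale fading Delta = sqrt(L0 * d^-exponent), Eqs. (3a)-(3e)."""
    return np.sqrt(L0 * d ** (-exponent))

def rician_los(M: int) -> np.ndarray:
    """LoS small-scale component: unit-modulus phase vector scaled by the
    Rician factor, Eqs. (4a)-(5c)."""
    psi = rng.uniform(0, 2 * np.pi, M)
    return np.sqrt(K1 / (K1 + 1)) * np.exp(1j * psi)

def rayleigh_nlos() -> complex:
    """NLoS small-scale component: scaled CN(0,1) sample, Eqs. (6a)-(6b)."""
    z = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    return np.sqrt(1 / (K1 + 1)) * z

M = 16                                               # RIS reflecting elements
h_BU = large_scale(78.1, alpha) * rician_los(M)      # Eq. (7b): vector of length M
h_BD = large_scale(144.2, eta) * rayleigh_nlos()     # Eq. (7a): scalar gain
```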
2) Sum Rate
In the given channel model, the Sum Rate represents the total rate across all mobile devices within the cluster c associated with UAV-borne RIS $U_{c}$ at time slot n:\begin{equation*} R^{c}_{sum}[n] = B\sum _{i=1}^{I}\log _{2}(1+SINR^{c}_{i}[n]) \tag {8}\end{equation*}
The SINR of mobile device $D_{i}$ in cluster c is given as \begin{align*} SINR^{c}_{i} & = \left ({{\frac {P_{B} \left |{{h_{BD_{i}} + h_{BU_{c}} \Phi _{c} h_{U_{c}D_{i}} }}\right |^{2}} {\sum _{k=1}^{K} P_{J_{k}} |h_{J_{k}D_{i}} + h_{J_{k}U_{c}} \Phi _{c} h_{U_{c}D_{i}}|^{2} + \sigma ^{2}} }}\right) \\ & \quad \forall i \in [1,I], \forall c \in [1,C], \forall k \in [1,K] \tag {9}\end{align*}
Substituting the SINR of equation (9) into equation (8) over the communication bandwidth B, the sum rate of cluster c becomes \begin{align*} R^{c}_{sum}[n] & = B \sum _{i=1}^{I}\log _{2} \left ({{1 + }}\right. \\ & \left.{{\frac {P_{B} \left |{{h_{BD_{i}} + h_{BU_{c}} \Phi _{c} h_{U_{c}D_{i}} }}\right |^{2}} {\sum _{k=1}^{K} P_{J_{k}} |h_{J_{k}D_{i}} + h_{J_{k}U_{c}} \Phi _{c} h_{U_{c}D_{i}}|^{2} + \sigma ^{2}} }}\right) \tag {10}\end{align*}
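A minimal numerical sketch of equations (9) and (10), with randomly drawn channel vectors and hypothetical power, noise, and bandwidth values (none taken from the paper's setup); the jamming terms enter the interference part of the SINR:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions and powers (illustrative only)
M, I, K = 16, 4, 2                 # RIS elements, devices in cluster c, jammers
B_Hz    = 1e6                      # bandwidth (Hz)
P_B     = 1.0                      # base station transmit power (W)
P_J     = np.full(K, 0.5)          # jammer powers (W)
sigma2  = 1e-9                     # noise power

Phi  = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, M)))  # RIS phases, Eq. (16)
h_BU = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) * 1e-3
h_UD = (rng.standard_normal((I, M)) + 1j * rng.standard_normal((I, M))) * 1e-3
h_BD = (rng.standard_normal(I) + 1j * rng.standard_normal(I)) * 1e-4
h_JD = (rng.standard_normal((K, I)) + 1j * rng.standard_normal((K, I))) * 1e-4
h_JU = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) * 1e-3

def sum_rate() -> float:
    """Eq. (10): Shannon sum rate over the cluster, jammers in the denominator."""
    rate = 0.0
    for i in range(I):
        sig = P_B * abs(h_BD[i] + h_BU @ Phi @ h_UD[i]) ** 2              # Eq. (9) numerator
        jam = sum(P_J[k] * abs(h_JD[k, i] + h_JU[k] @ Phi @ h_UD[i]) ** 2
                  for k in range(K))                                      # Eq. (9) denominator
        rate += B_Hz * np.log2(1 + sig / (jam + sigma2))
    return rate

R = sum_rate()
```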
3) Energy Consumption Model
The total energy consumption $E_{U}$ of the Swarm UAVs is calculated as the sum of the RIS communication energy and the UAV propulsion energy:\begin{equation*} E_{U}= \sum _{c=1}^{C}(E^{c}_{RIS}+ E^{c}_{U}); \forall c \in [1,C] \tag {11}\end{equation*}
Communication energy of RIS phase shifting is given as:\begin{equation*} E^{c}_{RIS}= \sum _{n=1}^{N}\sum _{k=1}^{K} S_{k} [n] MP^{RIS_{c}} \tag {12}\end{equation*}
Given the negligible energy consumption of RIS phase shifting compared to UAV propulsion, the total energy consumption simplifies to:\begin{equation*} E_{U}= \sum _{c=1}^{C}E^{c}_{U}; \forall c \in [1,C] \tag {13}\end{equation*}
The energy consumed for propulsion by a single rotary wing UAV $U_{c}$ is given as \begin{align*} E^{c}_{U} & = \sum _{n=1}^{N}\delta _{t} \left ({{ \underset {Blade \: Profile}{\underbrace {P_{0} \left ({{1 + \frac {3v_{c}[n]^{2}}{U_{tip}^{2}}}}\right)}} + \underset {Parasite}{\underbrace {c_{0} v_{c}[n]^{3}}} }}\right) \\ & \quad + \sum _{n=1}^{N}\delta _{t} \underset {Induced \: Power} {\underbrace {P_{1} \left ({{ \sqrt {\sqrt {1 + \frac {v_{c}[n]^{4}}{4v_{0}^{2}}} - \frac {v_{c}[n]^{2}}{2v_{0}^{2}}} }}\right) }} \tag {14}\end{align*}
In equation (14), $P_{0}$ and $P_{1}$ denote the blade profile power and the induced power in hovering, $U_{tip}$ is the rotor blade tip speed, $v_{0}$ is the mean rotor induced velocity in hover, $c_{0}$ is the parasite power coefficient, and $\delta _{t}$ is the duration of a time slot.
The UAV velocity $v_{c}[n]$ in time slot n is computed from the UAV displacements as \begin{equation*} v_{c}[n] = \sqrt {\frac {(dx_{c}[n])^{2} + (dy_{c}[n])^{2} + (dz_{c}[n])^{2}}{\delta _{t}}} \tag {15}\end{equation*}
The speed of each UAV is bounded by its maximum achievable velocity.
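The propulsion model of equations (14) and (15) can be sketched as follows; the parameter values follow commonly cited rotary-wing settings in the UAV literature and are illustrative only, not the paper's simulation parameters:

```python
import numpy as np

# Illustrative rotary-wing parameters (assumed, not from the paper)
P0, P1  = 79.86, 88.63   # blade profile / induced power in hover (W)
U_tip   = 120.0          # rotor blade tip speed (m/s)
v0      = 4.03           # mean rotor induced velocity in hover (m/s)
c0      = 0.0118         # parasite power coefficient
delta_t = 1.0            # time-slot duration (s)

def propulsion_power(v: float) -> float:
    """Instantaneous propulsion power at speed v: blade profile, parasite,
    and induced terms, mirroring the three terms of Eq. (14)."""
    blade    = P0 * (1 + 3 * v**2 / U_tip**2)
    parasite = c0 * v**3
    induced  = P1 * np.sqrt(np.sqrt(1 + v**4 / (4 * v0**2)) - v**2 / (2 * v0**2))
    return blade + parasite + induced

def slot_velocity(dx: float, dy: float, dz: float) -> float:
    """Eq. (15): per-slot velocity from the UAV displacements."""
    return np.sqrt((dx**2 + dy**2 + dz**2) / delta_t)

# Energy over 10 slots at a constant displacement of (5, 3, 1) m per slot, Eq. (14)
v = slot_velocity(5.0, 3.0, 1.0)
E = sum(delta_t * propulsion_power(v) for _ in range(10))
```

At hover (v = 0) the parasite term vanishes and the power reduces to P0 + P1, a quick sanity check on the implementation.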
4) RIS Passive Beamforming
A phase shift matrix represents the reflection coefficients of all M reflecting units in the RIS mounted on UAV $U_{c}$ at time slot n:\begin{equation*} \Phi _{c}[n]=diag\{e^{j\theta _{1}^{c,n}}, e^{j\theta _{2}^{c,n}}, e^{j\theta _{3}^{c,n}}, e^{j\theta _{4}^{c,n}},\ldots, e^{j\theta _{M}^{c,n}}\} \tag {16}\end{equation*}
To address the computational complexity of phase shift design in large-scale RIS, we propose using a physical model that maps the RIS reflection coefficient to the beamforming direction, thus reducing passive beamforming calculations and synchronizing with real-time channel state changes. The phase shift $\theta _{m}$ of the m-th reflecting element is computed as \begin{align*} & \theta _{m} = \\ & \text {mod} \left ({{-\frac {2\pi }{\lambda _{w}} \left ({{ \left ({{\sin {\theta _{t}}\cos {\varphi _{t}} + \sin {\theta _{r}} \cos {\varphi _{r}} }}\right)\left ({{m - \frac {1}{2}}}\right) d_{x} }}\right. }}\right. \\ & \quad + \left.{{ \left.{{ \left ({{\sin {\theta _{t}}\sin {\varphi _{t}} + \sin {\theta _{r}}\sin {\varphi _{r}}}}\right)\left ({{m - \frac {1}{2}}}\right) d_{y} }}\right), 2\pi }}\right) \tag {17}\end{align*}
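A minimal sketch of the geometric mapping in equations (16) and (17); the wavelength, element spacings, and incident/reflection angles below are assumed illustrative values:

```python
import numpy as np

# Assumed geometry (illustrative): wavelength lam, half-wavelength spacing,
# incident angles (theta_t, phi_t) and desired reflection angles (theta_r, phi_r)
lam      = 0.01            # wavelength (m), e.g. around 30 GHz
d_x = d_y = lam / 2        # element spacings
theta_t, phi_t = np.deg2rad(30), np.deg2rad(0)    # incident elevation/azimuth
theta_r, phi_r = np.deg2rad(45), np.deg2rad(90)   # reflection elevation/azimuth

def element_phase(m: int) -> float:
    """Phase shift theta_m in [0, 2*pi) of reflecting element m, per Eq. (17)."""
    u = (np.sin(theta_t) * np.cos(phi_t) + np.sin(theta_r) * np.cos(phi_r)) * (m - 0.5) * d_x
    v = (np.sin(theta_t) * np.sin(phi_t) + np.sin(theta_r) * np.sin(phi_r)) * (m - 0.5) * d_y
    return float(np.mod(-2 * np.pi / lam * (u + v), 2 * np.pi))

# Assemble the diagonal phase shift matrix of Eq. (16)
M = 16
Phi = np.diag(np.exp(1j * np.array([element_phase(m) for m in range(1, M + 1)])))
```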
B. Problem Formulation
Our primary goal is to maximize the sum rate for mobile devices and improve energy efficiency for swarm UAVs to counter multiple jammers. This involves strategically moving UAV-borne RIS platforms, dynamically adjusting RIS phase shifts, base station transmit power, and device-to-UAV associations, while docking UAVs that are inactive for a specified duration.
The second objective of energy consumption is achieved in two ways: first, by optimizing the trajectory and movements of the Swarm UAVs, and second, by minimizing the number of UAVs in flight.
The trajectories of the UAVs are adapted dynamically in all three dimensions during each time slot n, denoted as $\omega _{c}[n] = \{x_{c}[n], y_{c}[n], z_{c}[n]\}$.
Therefore, the problem is formulated as\begin{align*} & \pmb {P1:} \quad \max _{\omega _{c}, \Phi _{c}, \lambda ^{c}_{i}, P_{B} } \sum _{n=1}^{N} \sum _{c=1}^{C} \frac {\sum _{i=1}^{I}\lambda ^{c}_{i}[n] R_{i}^{c}[n]}{E_{U}^{c}[n]} \\ & \textrm {s.t.} \quad C_{1}:\omega _{c}[n] = \\ & \qquad \{x_{c}[n], y_{c}[n], z_{c}[n]\} \cdot 1_{\left ({{\sum _{i=1}^{I} \lambda ^{c}_{i}[n]\gt 0 \& \sum _{n=1}^{h+1} E_{U}^{c} \leq B_{L}}}\right)} \\ & \qquad \quad + \{x_{d}, y_{d}, 0\} \cdot 1_{\left ({{\sum _{i=1}^{I} \lambda ^{c}_{i}[n]=0 \parallel \sum _{n=1}^{h+1} E_{U}^{c} \gt B_{L} }}\right)}; \\ & \qquad \qquad \qquad \quad \forall c \in [1,C], \{n, h\} \in [1,N] \\ & \quad C_{2}: X_{min} \leq x_{c}[n] \leq X_{max}; \forall c \in [1,C] \\ & \quad C_{3}: Y_{min} \leq y_{c}[n] \leq Y_{max}; \forall c \in [1,C] \\ & \quad C_{4}: Z_{min} \leq z_{c}[n] \leq Z_{max}; \forall c \in [1,C] \\ & \quad C_{5}: \theta _{m}^{c}[n] \in [0,2\pi ], \forall m \in [1,M], c \in [1,C] \\ & \quad C_{6}: \sum _{c=1}^{C} \lambda ^{c}_{i}[n] = 1;\forall c \in [1,C] \\ & \quad C_{7}: \lambda ^{c}_{i}[n] \in \{0,1\}; \forall c \in [1,C] \\ & \quad C_{8}: \sum _{n=j+1}^{j+p} \underset {c \neq c'}{\sum _{c=1}^{C}} \sum _{i=1}^{I} \lambda ^{c}_{i}[n] = I \cdot p - \\ & \qquad \quad \left ({{\sum _{n=j+1}^{j+p} \sum _{i=1}^{I} \lambda ^{c'}_{i}[n]}}\right)\cdot 1_{\left ({{\sum _{i=1}^{I} \lambda ^{c'}_{i}\gt 0 \parallel \sum _{n=h+1}^{j} E_{U}^{c'} \leq B_{L}}}\right)} \\ & \qquad \qquad \qquad \forall \{c, c'\} \in [1,C], \{n, j, h, p\} \in [1,N] \\ & \quad C_{9}: \parallel \omega _{c}[n] - \omega _{c'}[n]\parallel ^{2} \geq d_{min}^{2} \cdot 1_{(\{ \omega _{c}, \omega _{c'} \} \neq \{x_{d}, y_{d}, 0\})} \\ & \qquad \qquad \qquad \forall \{c, c'\} \in [1,C] \\ & \quad C_{10}: 0 \leq P_{B}[n] \leq P^{max}_{B} \tag {18}\end{align*}
Constraint $C_{1}$ governs the UAV position: a UAV remains in flight while it serves at least one device and its accumulated energy consumption stays within the battery limit $B_{L}$; otherwise it returns to the docking station at $\{x_{d}, y_{d}, 0\}$. Constraints $C_{2}$-$C_{4}$ bound the UAV movement along each axis, $C_{5}$ limits the RIS phase shifts to $[0, 2\pi]$, and $C_{6}$ and $C_{7}$ ensure that each device is associated with exactly one UAV through the binary variables $\lambda ^{c}_{i}$. Constraint $C_{8}$ reassigns the devices of a docked UAV to the remaining active UAVs, $C_{9}$ enforces a minimum inter-UAV safety distance $d_{min}$ between airborne UAVs, and $C_{10}$ bounds the base station transmit power by $P^{max}_{B}$.
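The per-slot objective of P1 and the association constraints $C_{6}$-$C_{7}$ can be illustrated with a small sketch; `objective_step` and `valid_association` are hypothetical helper names, not the paper's implementation:

```python
def objective_step(rates, energy, assoc):
    """Per-slot, per-cluster contribution to P1 in Eq. (18): the sum of the
    rates of associated devices divided by the UAV's energy consumption."""
    return sum(a * r for a, r in zip(assoc, rates)) / energy

def valid_association(assoc_matrix):
    """Constraints C6 and C7: entries are binary, and each device (column)
    is served by exactly one UAV (row)."""
    return all(
        sum(col) == 1 and all(a in (0, 1) for a in col)
        for col in zip(*assoc_matrix)
    )

# Two UAVs serving three devices: devices 0 and 2 -> UAV 0, device 1 -> UAV 1
assoc = [[1, 0, 1],
         [0, 1, 0]]
ok = valid_association(assoc)                                    # satisfies C6, C7
contribution = objective_step([2.0e6, 3.0e6, 5.0e6], 4.0e3, assoc[0])
```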
C. RL for Adaptive Swarm Aerial-RIS Anti-Jamming
Our problem involves a complex mixed-integer non-convex multi-objective optimization challenge, with continuous variables such as the UAV trajectories $\omega _{c}$, the RIS phase shifts $\Phi _{c}$, and the base station transmit power $P_{B}$, together with the binary device-to-UAV association variables $\lambda ^{c}_{i}$.
Solving a non-convex optimization problem is challenging with incomplete information about jammer locations, device trajectories, and varying numbers of devices and jammers. To address these uncertainties, we propose an RL-equipped control unit at Base Station B to manage UAV movements, RIS phase shifts, device-to-UAV associations, and transmit power, allowing swift adaptation to unforeseen challenges without restarting the learning process.
The problem is formulated as a Markov Decision Process (MDP), detailed in subsection III-C3. To discover an optimal control policy to maximize the Sum Rate for mobile users and enhance UAV energy efficiency, we employ a model-free Deep Reinforcement Learning (DRL) technique called Proximal Policy Optimization (PPO). The PPO algorithm operates without requiring prior information about the UAV-borne-RIS locations and RIS phase shift coefficients.
Compared to Deep Q-Network (DQN) [42], Proximal Policy Optimization (PPO) is better suited for continuous control stochastic problems due to its policy-based approach [43]. While DQN excels with small, discrete action spaces, our problem involves a multi-parameter continuous action space, making PPO a more appropriate choice. Additionally, PPO has higher sample efficiency, requiring fewer training samples to converge, which is crucial when data collection is resource-intensive or time-consuming.
Considering policy-based algorithms, Deep Deterministic Policy Gradient (DDPG) [43], an earlier Actor-Critic variant, is computationally inefficient and slow compared with Proximal Policy Optimization (PPO), which is better suited for high-dimensional action spaces and offers smoother learning and greater stability. While Trust Region Policy Optimization (TRPO) [44] is effective in non-stationary environments, PPO is preferred for its lower computational cost, faster convergence, and performance in high-dimensional settings.
We disregard the computational energy of the RL algorithm, as it is deemed insignificant. The computational energy associated with RL algorithms like PPO is linked to the complexity of the DNN model used. Given the low complexity of the model, the impact of computational energy on overall energy consumption is negligible.
1) Proximal Policy Optimization (PPO)
PPO can relax intricate constraints by substituting them with softer, adaptable ones. The primary objective of the PPO algorithm can be expressed as:\begin{align*} J^{PPO} = \mathbb {E}_{s,a} \left [{\min \left ({\eta A^{\pi _{old}},\ \mathrm {clip}(\eta, 1-\epsilon, 1+\epsilon)A^{\pi _{old}}}\right)}\right ] \tag {19}\end{align*}
\begin{equation*} \eta = \frac {\pi (a,s)}{\pi _{old}(a,s)} \tag {20}\end{equation*}
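For concreteness, the clipped surrogate of Equations 19 and 20 can be sketched numerically as follows; this is a minimal illustration over a batch of samples, and the function name and interface are ours, not the paper's implementation:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, epsilon=0.2):
    """Clipped surrogate of Eq. 19 for a batch of (state, action) samples.

    eta = pi(a,s) / pi_old(a,s) is computed from log-probabilities (Eq. 20);
    taking min(...) keeps each update inside the ratio band [1-eps, 1+eps].
    """
    eta = np.exp(logp_new - logp_old)                       # probability ratio
    unclipped = eta * advantages
    clipped = np.clip(eta, 1.0 - epsilon, 1.0 + epsilon) * advantages
    return np.mean(np.minimum(unclipped, clipped))          # E_{s,a}[min(...)]
```

When the new and old policies coincide the ratio is 1 and the objective reduces to the mean advantage; when the ratio drifts outside the band, the clipped term caps the incentive to move further.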
The advantage estimation function quantifies the superiority of a specific action in a given state compared to others [45] and is given as follows:\begin{equation*} A^{\pi } = Q(s_{t}, a_{t}) - \upsilon ^{\pi }(s_{t}) \tag {21}\end{equation*}
Here, $Q(s_{t}, a_{t})$ is the action-value function and $\upsilon ^{\pi }(s_{t})$ is the state-value function under policy $\pi$, while $\eta$ in Equation 20 is the probability ratio between the updated policy $\pi$ and the old policy $\pi _{old}$.
The PPO clip range $\epsilon$ restricts the ratio $\eta$ to the interval $[1-\epsilon, 1+\epsilon]$, preventing excessively large policy updates and stabilizing training.
Algorithm 1 PPO Algorithm Deployed at Base Station B
Initialization:
Initialize the locations of the C Swarm Aerial-RIS platforms;
Randomly initialize the locations of the K jammers spread around in the vicinity of the I mobile devices;
Initialize the phase shift matrices of the RIS elements for each Aerial-RIS;
PPO Training:
Initialize random parameters of the actor and critic networks;
for each episode n from 1 to N do
    for each iteration (step) i from 1 to I do
        Obtain current state;
        Select action from Equation 22;
        Calculate the UAV-borne RIS locations from the displacements of the UAVs;
        Calculate the RIS phase shift matrices from Equation 16, based on the angles of the RIS mounted on each UAV;
        Calculate the Sum Rate of the devices associated with each UAV;
        Calculate the Energy Consumption from Equation 14;
        Normalize the values of the Sum Rate and Energy Consumption;
        Calculate the Penalty from Eq. 26;
        Observe the Reward from Eq. 25;
        Observe the next state;
        Compute the Advantage Estimates from Eq. 21;
        Update the actor network;
        Update the critic network;
        Capture the UAV locations and Energy Consumption for the iteration;
    end for
    Update the Aerial-RIS locations;
    Update the device locations;
    Update the RIS phase shift matrices;
    Update the UAV battery levels;
end for
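The inner loop of Algorithm 1 can be mirrored by an environment class whose step function applies the action, accrues penalties, and emits the scalarized reward. The sketch below uses illustrative placeholder dynamics (class, variable names, and constants are ours); in the paper, the sum rate and energy come from Equations 24a and 14, and such a class would be wrapped as a Gym environment for PPO:

```python
import numpy as np


class AerialRISEnvSketch:
    """Minimal sketch of the environment driving Algorithm 1.

    All dynamics are illustrative stand-ins for the paper's models: the real
    sum rate follows Eq. 24a and the real energy model follows Eq. 14.
    """

    def __init__(self, num_uavs=2, num_devices=4, w=0.5, cell_half_width=200.0):
        self.C, self.I, self.w = num_uavs, num_devices, w
        self.bound = cell_half_width
        self.reset()

    def reset(self):
        self.uav_pos = np.zeros((self.C, 3))      # Swarm Aerial-RIS locations
        self.battery = np.ones(self.C)            # normalized battery levels
        self.prev_rate = 0.0
        rng = np.random.default_rng(0)
        self.devices = rng.uniform(-100, 100, (self.I, 2))
        return self._state()

    def _state(self):
        # device xy, previous sum rate, UAV xyz, battery levels (Sec. III-C3)
        return np.concatenate([self.devices.ravel(), [self.prev_rate],
                               self.uav_pos.ravel(), self.battery])

    def step(self, action, rate, energy):
        """action[1:4] = UAV displacement; `rate` and `energy` stand in for
        the values Algorithm 1 computes from Eqs. 24a and 14."""
        self.uav_pos[0] += np.asarray(action)[1:4]
        self.battery -= 0.01 * energy
        penalty = float(np.any(np.abs(self.uav_pos) > self.bound))  # p_loc-style
        reward = self.w * rate - (1 - self.w) * energy - penalty    # Eq. 25
        self.prev_rate = rate
        done = bool(np.any(self.battery <= 0.0))
        return self._state(), reward, done
```

The state vector here matches the four components listed in the MDP formulation; the penalty shown covers only a boundary-style term, whereas the paper combines five penalty terms (Eq. 26).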
2) Multi-Objective Optimization Problem
The challenge involves optimizing two objectives simultaneously, i.e., maximizing the Sum Rate of the mobile devices while minimizing the energy consumption of the Swarm Aerial-RIS platforms.
Given the single-objective optimization focus of PPO, we address our multi-objective problem using Scalarization, commonly linked with Pareto Optimization. This method expresses the multiple objectives as a weighted sum to find non-dominated, Pareto-optimal solutions, balancing the trade-offs between objectives. Our approach involves multi-objective optimization, aiming to maximize the sum rate and minimize the energy consumption of the Swarm Aerial-RIS platforms, employing PPO to uncover non-dominated solutions. In the subsequent section, we frame the multi-objective optimization within the Markov Decision Process (MDP) framework.
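As a minimal illustration of the Pareto notion used here, candidate solutions can be screened for non-dominance by treating each as a pair of "higher is better" objectives, e.g. (sum rate, negative energy). The helper below is illustrative, not the paper's code:

```python
def pareto_front(points):
    """Return the non-dominated points among `points`.

    A point q dominates p if q is at least as good as p in every objective
    and strictly better in at least one; all objectives are maximized here,
    so energy should be passed in negated.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(q[k] >= p[k] for k in range(len(p))) and
            any(q[k] > p[k] for k in range(len(p)))
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(p)
    return front
```

Sweeping the scalarization weight w and filtering the resulting (rate, −energy) pairs with such a function yields the trade-off curve from which a balanced operating point can be chosen.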
3) General MDP Formulation
The Markov Decision Process (MDP) formulation consists of a state space, an action space, and a reward function aligned with the mixed-integer continuous decision space of the PPO algorithm, as shown in Figure 3. The objective involves balancing two goals: maximizing the sum rate while minimizing the UAV energy consumption.
System Model: Swarm UAV-RIS assisted 6G communications for a dynamic and dense wireless environment with multiple jammers.
The environment comprises a base station B, multiple jammers, multiple mobile devices, and a swarm of C Aerial-RIS platforms.
The state or observation space comprises the locations of the mobile devices, the sum rate of the previous time slot, the locations of the Swarm Aerial-RIS platforms, and the UAV battery levels.
The reward function encompasses the sum rate, the energy consumption of the Swarm Aerial-RIS platforms, and penalties for constraint violations.
The state S, action A, and reward R are defined as follows:
State
The state $s \in S$ is composed of:
- Selected mobile device $D_{i}$ location: $\omega _{D_{i}}[n] = \{x_{D_{i}}[n], y_{D_{i}}[n], 0 \}\ \forall i \in [1,I]$;
- Sum Rate $R^{c}_{sum}[n-1]$ of the mobile devices of the clusters associated with $U_{c}$ at the previous time slot $(n-1)$;
- Locations of the Swarm Aerial-RIS platforms $\{\omega _{1}[n], \omega _{2}[n],\ldots, \omega _{C}[n]\}$;
- Battery levels of the Swarm UAVs $\{ BL_{1}[n], BL_{2}[n],\ldots, BL_{C}[n] \}$.
Action
The action space $a \in A$ for each iteration or time step $i \in [1,I]$ comprises four sub-actions: the Device-to-UAV association parameter $a^{n}_{\lambda _{i}}$, the UAV-RIS trajectory $a^{n}_{U_{i}}$, the RIS phase shift angles $a^{n}_{R_{i}}$, and the base station transmit power $a^{n}_{P_{B}}$ at time slot n, denoted as $a[n] = \{a^{n}_{\lambda _{i}}, a^{n}_{U_{i}}, a^{n}_{R_{i}}, a^{n}_{P_{B}} \}$ and expressed in Equation 22.\begin{align*} a_{\lambda _{i}}^{n} & = \{ \lambda _{i}[n] \} \\ a_{U_{i}}^{n} & = \{D^{i}_{x}[n], D^{i}_{y}[n], D^{i}_{z}[n]\} \\ a_{R_{i}}^{n} & = \{\theta ^{i}_{t}[n], \varphi ^{i}_{t}[n], \theta ^{i}_{r}[n], \varphi ^{i}_{r}[n] \} \\ a_{P_{B}}^{n} & = \{ P_{B}[n]\} \tag {22}\end{align*}
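Since each sub-action in Equation 22 lives in a different physical range, a raw policy output in $[-1, 1]$ must be rescaled before being applied. One possible mapping is sketched below; the ranges are illustrative assumptions, apart from the 46 m displacement limit taken from the simulation setup:

```python
import numpy as np

def scale_action(a, max_disp=46.0, p_max=1.0):
    """Map a raw policy output a in [-1, 1]^9 to the physical sub-actions
    of Eq. 22 (association, displacement, RIS angles, BS transmit power).
    Ranges here are illustrative, not the paper's exact bounds."""
    a = np.asarray(a, dtype=float)
    assoc  = (a[0] + 1.0) / 2.0                # lambda_i: association in [0, 1]
    disp   = a[1:4] * max_disp                 # D_x, D_y, D_z in metres
    angles = (a[4:8] + 1.0) * np.pi / 2.0      # theta_t, phi_t, theta_r, phi_r
    power  = (a[8] + 1.0) / 2.0 * p_max        # P_B in [0, P_max]
    return assoc, disp, angles, power
```

This kind of affine rescaling is the usual companion to a tanh output layer, which is what the actor network described later in the paper produces.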
The trajectories of the Swarm Aerial-RIS platforms $U_{i}$ are given as three continuous parameters, $D^{i}_{x}$, $D^{i}_{y}$, and $D^{i}_{z}$, representing the UAV displacements along the x, y, and z axes within the rectangular coordinate system centered at the Base Station B. The beamforming direction of the RIS elements is adjusted through the phase shifts $\theta _{m}$. Nevertheless, as the number of RIS reflecting units M increases, the action space for the RIS phase shift values $\theta _{m}^{i}$ also becomes extensive. To address this challenge, the beamforming directions of the RIS elements are calculated indirectly through $\theta _{t}^{i}$, $\varphi _{t}^{i}$, $\theta _{r}^{i}$, and $\varphi _{r}^{i}$, representing the angles between the incident signal and the x axis, the incident signal and the z axis, the reflected signal and the x axis, and the reflected signal and the z axis, respectively, as illustrated in Figure 1. As per [41], the corresponding value $\theta _{m}^{i}$ can be readily derived from Equation 23 by determining the beamforming direction, where the value for element m is extracted from its respective row and column.\begin{equation*} \theta _{col,row}^{i} = \mathrm {mod}\left ({-\frac {2\pi }{\lambda _{w}} \left ({\left ({\sin {\theta _{t}^{i}}\cos {\varphi _{t}^{i}} + \sin {\theta _{r}^{i}} \cos {\varphi _{r}^{i}} }\right) \left ({row - \frac {1}{2}}\right) d_{x}^{i} + \left ({\sin {\theta _{t}^{i}}\sin {\varphi _{t}^{i}} + \sin {\theta _{r}^{i}}\sin {\varphi _{r}^{i}}}\right) \left ({col - \frac {1}{2}}\right) d_{y}^{i} }\right), 2\pi }\right) \tag {23}\end{equation*}
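Equation 23 maps directly to code; the sketch below (function name ours) computes the phase of element (row, col) given the four beam angles, the unit-cell dimensions, and the signal wavelength:

```python
import numpy as np

def ris_phase(row, col, theta_t, phi_t, theta_r, phi_r, dx, dy, wavelength):
    """Per-element RIS phase shift from Eq. 23, wrapped to [0, 2*pi)."""
    sx = np.sin(theta_t) * np.cos(phi_t) + np.sin(theta_r) * np.cos(phi_r)
    sy = np.sin(theta_t) * np.sin(phi_t) + np.sin(theta_r) * np.sin(phi_r)
    phase = -2.0 * np.pi / wavelength * (sx * (row - 0.5) * dx +
                                         sy * (col - 0.5) * dy)
    return np.mod(phase, 2.0 * np.pi)
```

Evaluating this over all rows and columns yields the full phase shift matrix from only four scalar angles, which is exactly how the action space stays compact as M grows.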
Here, $d_{x}^{i}$ and $d_{y}^{i}$ represent the length and width of each unit cell of the RIS. These dimensions range between $\frac {\lambda _{w}}{10}$ and $\frac {\lambda _{w}}{2}$, within the sub-wavelength scale, where $\lambda _{w}$ denotes the wavelength of the signal.
Reward
The evaluation of the action is given by the reward $r \in R_{w}$. It is a function of the sum rate at the mobile devices $D_{i}\ \forall i \in [1,I]$ and the energy consumed by the UAV-borne RIS platforms $E_{U_{c}}\ \forall c \in [1,C]$, as follows:\begin{align*} r[n] = \sum _{c=1}^{C} \frac {R_{sum}^{c}[n]}{E_{U_{c}}[n]} = \sum _{c=1}^{C} \frac {\log _{2}\left ({1 + SINR_{i}^{c}[n] }\right)}{E_{U_{c}}[n]} \tag {24a}\end{align*}
The reward r encapsulates multiple objectives, namely the sum rate R and the energy consumption $E_{U_{c}}$ of the Swarm Aerial-RIS platforms, thus fitting into Multi-Objective Reinforcement Learning. Scalarization is required, involving normalizing the objectives and representing them as an additive reward function akin to a weighted sum of normalized objective values. We draw guidance from a similar problem addressed in [46] to shape our reward. The scalarized reward function is formulated as follows:\begin{equation*} r = w R_{sum}^{norm} - (1 - w) E_{U_{c}}^{norm} - p \tag {25}\end{equation*}
We iteratively tune the weight w in equation 25 to find the optimal balance between sum rate and energy consumption of Swarm Aerial RIS platforms, yielding a non-dominant solution. Constraints on UAV movements, battery levels, inter-UAV collisions, and device-UAV associations at the charging dock introduce a penalty p in the reward function, represented as a linear combination of penalties for violating these constraints in equation 26:
\begin{equation*} p = p_{loc} + p_{batt} + p_{coll} + p_{R_{min}} + p_{dock} \tag {26}\end{equation*}
These penalties are assessed for breaching the constraints in the problem formulation. The location boundary penalty ($p_{loc}$) is imposed when a UAV tries to cross the wireless cell boundary. The battery expiration penalty ($p_{batt}$) occurs when a UAV's battery usage exceeds its limit. The inter-UAV collision penalty ($p_{coll}$) applies when two UAVs collide during flight, but not at rest on the docking station. The minimum data rate penalty ($p_{R_{min}}$) is imposed when any device receives a data rate below the threshold of 5 Mbps. Lastly, the docking penalty ($p_{dock}$) is charged when the RL algorithm assigns a device to a recharging UAV.
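Equations 25 and 26 combine into a single scalar reward; a minimal sketch follows, where the boolean-flag interface and the unit penalty weight are illustrative choices rather than the paper's exact parameterization:

```python
def scalarized_reward(rate_norm, energy_norm, w,
                      out_of_bounds, battery_dead, collision,
                      below_min_rate, docked_assignment,
                      penalty_weight=1.0):
    """Reward of Eq. 25 with the penalty of Eq. 26.

    rate_norm and energy_norm are the normalized objectives; each flag is
    True when the corresponding constraint is violated.
    """
    p = penalty_weight * sum(map(float, (out_of_bounds, battery_dead,
                                         collision, below_min_rate,
                                         docked_assignment)))   # Eq. 26
    return w * rate_norm - (1.0 - w) * energy_norm - p          # Eq. 25
```

A single constraint violation subtracts a full penalty unit, which under normalized objectives dominates the reward and steers the agent away from infeasible actions.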
The pseudo-code of the proposed PPO-based algorithm is given in Algorithm 1. The algorithm is designed to run for N episodes, and each episode comprises I iterations (steps).
System Evaluation
A. Simulation Setup
Utilizing the PPO implementation from the Stable Baselines package, built on the OpenAI Baselines libraries, we configure the simulation environment to train the PPO model within a customized wireless communications context. This section elaborates on the environment, encompassing its parameters and hyperparameters. Given the non-convex and computationally complex nature of our problem, the PPO algorithm emerges as a valuable tool for its resolution. PPO is an actor-critic-based, lightweight method using a Deep Neural Network (DNN). Both the actor and the critic use fully connected neural networks with two hidden layers of 32 neurons each. ReLU activation functions are used in the two hidden layers, while a linear activation function is used for the output layer of the critic network. The output layer of the actor network uses tanh as an activation function to scale the output to the range $[-1, 1]$.
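Under the Stable-Baselines3 API (the maintained successor of the Stable Baselines package mentioned above), the described network configuration might be set up roughly as follows; the environment name and all hyperparameter values other than the network shape are illustrative assumptions:

```python
import torch
from stable_baselines3 import PPO

# Custom wireless-communications Gym environment, assumed defined elsewhere.
env = AerialRISWirelessEnv()

policy_kwargs = dict(
    activation_fn=torch.nn.ReLU,               # ReLU in both hidden layers
    net_arch=dict(pi=[32, 32], vf=[32, 32]),   # two 32-neuron layers each
)

model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=policy_kwargs,
    clip_range=0.2,            # PPO clip range epsilon (illustrative value)
    verbose=1,
)
model.learn(total_timesteps=200_000)           # illustrative training budget
```

Note that `net_arch` only shapes the shared MLP; the tanh scaling of the actor output described above would be applied when mapping the policy output to the physical action ranges.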
The Base Station B, the Swarm Aerial-RIS platforms, the jammers, and the mobile devices are initialized at their respective locations within the simulated 6G cell. The mobility of the Swarm Aerial-RIS platforms and of the mobile devices is confined to the cell region, as described below.
B. Simulation Experiments
The simulations aim to find an optimal solution through multi-objective optimization.
Figure 5 shows the blue box as the boundary constraint for the mobility of mobile devices. In contrast, the red box illustrates the spatial limitations for the movements of Swarm Aerial-RIS platforms, guaranteeing their confinement within the designated 6G cellular region.
Swarm Aerial-RIS platforms are limited to a maximum displacement of 46 meters, while mobile devices can move up to 20 meters per episode. Jammers are distributed within a 120 meter radius around the initial mobile device locations. The system restricts spatial movements, UAV-to-Device associations during UAV recharging, UAV battery levels, minimum data rates for mobile devices, and inter-UAV collisions. The simulation covers approximately
During PPO agent training, numerous simulations showed convergence for cumulative reward, sum rate, and energy consumption. The optimal scalarization weight w, identified as the cliff point, is set to 0.5. This point represents a balanced, non-dominant solution that maximizes the sum rate while minimizing energy consumption.
We initialize the PPO algorithm with specific hyperparameters: a reward discount factor of
In the following subsections, we conduct simulations to evaluate the efficiency of the proposed system model, comparing it with related works and baseline scenarios while examining the effects of different parameters and configurations.
Fixed Multiple RIS Installations: Comparison to a scenario presented in a related work [7]. The authors present a multi-cluster wireless-powered Internet of Things (WP-IoT) network assisted by multiple RIS installed on stationary objects like buildings to maximize the sum throughput under fewer constraints.
Fixed Device-to-UAV Association: Comparison of the system model proposed with another related work [12] where device clusters association with the UAV-mounted RIS platforms is fixed.
Random Device-to-UAV Association: We conduct a comparative evaluation of the proposed system model with a baseline scenario where the device-to-UAV association is randomized instead of being controlled by the RL agent.
Single Objective Optimization for Sum Rate Maximization ($w=1$): We evaluate the system performance for sum rate optimization only, with the scalarization weight w set to 1. The scenario is similar to those presented in related works [11] and [12], where the authors utilize multiple UAV-mounted RIS platforms to serve multiple mobile devices toward a single objective.
Random RIS Phase Shifting: Comparing the performance of the proposed system model with a baseline employing random RIS phase shifting.
Simulations of the aforementioned scenarios are carried out to compare and verify the performance of our system model.
1) Impact of Variation in the Jammers Distribution
We begin by investigating the impact of varying the distribution areas of jammers experimentally, comparing the performance and adaptability of the swarm UAV-borne RIS system with different configurations in related works and the baselines to gain useful insights. The number of devices is set to
We observe the correlation between jammer proximity to mobile devices and the sum rate, particularly within 120 to 720 meters, where jamming negatively impacts received signals. As the jammer distribution increases with expanding cell size, the sum rate also decreases due to greater distances from the base station despite an even jammer distribution. However, Figure 6 suggests that effective optimization is still achievable even in scenarios with jammers nearby.
Results for evaluating the impact of varying the jammers distribution on the proposed Swarm UAV-borne RIS scenario (a) Cumulative Reward (b) Sum Rate (c) Energy Consumption (d) Base Station Transmit Power. Adaptive UAV Swarm Formation in (e) Proposed solution and (f) Alternate Scenarios. Comparative bar plots for (g) Average Sum Rate and (h) Average Energy Consumption.
The proposed system model achieves convergence for parameters like a cumulative reward, sum rate, energy consumption, and base station transmission power as shown in subplots 6(a) to 6(d).
However, the energy consumption trend shown in subplot 6(c) shows an increase due to the higher mobility of UAVs to counter the greater challenge of jammers spread in the cellular region. Moreover, the base station power increases with the increase in the jammer distribution region shown in subplot 6(d) to overcome the increased coverage challenge for the UAVs. This shows that UAV-mounted RIS platforms can successfully overcome the increased challenge of jammer distribution.
The subplot 6(e) shows dynamic swarm formation with the increase in UAV participation activity as jammer distribution increases along with the cell size. This is triggered by the slight decline in the sum rate due to the increased distance of devices from the Base Station and UAV-mounted RIS relays.
As depicted in subplot 6(f), the adaptive UAV activity trends show significant increases with more jammers in the 6G micro-cell. Both our proposed solution and the Random RIS Phase Shifts display this trend, unlike the baseline scenarios (Random and Fixed Device-to-UAV Associations), which show higher UAV activity with smaller jammer distributions but lower sum rates and higher energy consumption due to sub-optimal UAV-device associations. In Single Objective Optimization, UAV activity increases similarly, though it results in slightly higher energy consumption and lower data rates in scenarios with fewer jammers, emphasizing the need for balanced objective considerations.
The performance of our proposed swarm Aerial-RIS solution and other baselines remains consistent across different jammer distribution radii ranging from 120 to 720 meters, as illustrated in Figures 6(g) and 6(h). Although the Fixed Multi-RIS Installation sometimes gives a better sum rate with nearly zero energy consumption, this comes randomly without adapting to the dynamic nature of the wireless environment. Our solution outperforms the Fixed and Random Device-to-UAV Association baselines introduced in related works [12], along with other configurations, in terms of average sum rate and energy consumption. This is due to RL agent optimizing the swarm UAV formation and dynamic clustering through device-to-UAV association, ensuring that a minimum number of UAVs are deployed to achieve the maximum sum rate, while the other cases do not optimize clustering. While configurations like Single Objective Optimization achieve higher data rates, they do so at the expense of slightly increased energy consumption, especially at larger jammer distributions, as they do not optimize this aspect. The Random RIS Phase Shifts configuration shows similar trends, with very low sum rates and higher energy consumption compared to our proposed solution.
We also demonstrate in Figure 7 that our proposed system model maximizes the sum rate and minimizes energy consumption while keeping the combined satisfaction level across all system constraints above 75% in most cases. This shows that the proposed RL-based solution adapts well to varying spatial circumstances in wireless cellular environments, providing flexibility and adaptability while respecting all system constraints. Constraint satisfaction is likewise maintained in all the subsequent cases discussed in this section.
Comparative evaluation of the impact of varying the jammers distribution areas on the respect for multiple constraints in Swarm UAV-borne RIS scenario.
2) Impact of Variation in the Number of Jammers
Another set of evaluations examines the costs of jamming within our proposed system model. Figure 8 shows the impact of varying the number of jammers
Results for evaluating the impact of varying number of jammers on the proposed Swarm UAV-borne RIS scenario (a) Cumulative Reward (b) Sum Rate (c) Energy Consumption (d) Base Station Transmit Power. Adaptive UAV Swarm Formation in (e) Proposed solution and (f) Alternate Scenarios. Comparative bar plots for (g) Average Sum Rate (h) Average Energy Consumption.
Conversely, the energy consumption shows slight fluctuations across different quantities of jammers from
The adaptability of the RL agent is a key highlight of our proposed system model, demonstrated when compared to the Fixed Device-to-UAV Association and Random Device-to-UAV Association baselines, which show maximum engagement of at least four UAVs, as evident in subplot 8(f). All related works and baseline configurations show increased adaptive UAV swarm formation as the number of jammers grows. However, these configurations are inefficient, achieving lower sum rates with higher energy consumption due to the lack of optimization of swarm formation and clustering.
The comparative bar plots of two scenarios from the related works [7], [12], and different baselines are shown in Figures 8(g) and 8(h). RL-based optimization with swarm UAV-mounted RIS platforms leads to superior performance compared to the baselines and related works, except for one specific scenario such as Single Objective Optimization. The proposed solution demonstrates partial superiority with the higher sum rate with lower energy consumption at jammer quantity from
However, beyond
3) Impact of Variation in the Number of Mobile Devices
Finally, we study the impact of varying the number of mobile devices through simulations comparing our approach to related works and baselines while keeping the number of UAVs in the swarm as
Figure 9 illustrates the convergence of cumulative reward, sum rate, energy consumption, and base station transmit power. As the number of devices increases from
Results for evaluating the impact of varying the number of mobile devices on the proposed Swarm UAV-borne RIS scenario (a) Cumulative Reward (b) Sum Rate (c) Energy Consumption (d) Base Station Transmit Power. Adaptive UAV Swarm Formation in (e) Proposed solution and (f) Alternate Scenarios. Comparative bar plots for (g) Average Sum Rate and (h) Average Energy Consumption.
In our simulations, we compare UAV activity participation in the swarm, as depicted in subplot 9(e), against a baseline scenario of Random Device-to-UAV Association and related work on Fixed Device-to-UAV Association [12], along with other configurations in subplot 9(f). The proposed approach shows increased UAV activity with more devices, indicating a higher sum rate and coverage needs. In contrast, Fixed and Random Device-to-UAV Association scenarios exhibit greater UAV involvement even with fewer devices. At the same time, other configurations show a gradual increase in UAV activity as device numbers rise.
Our swarm UAV-borne RIS system model also performs better than other baselines and configurations, as shown in subplots 9(g) and 9(h). The proposed solution adequately covers more devices, such as
Our proposed solution also outperforms the scenarios with Random and Fixed Device-to-UAV Association [12] due to increased mobility from UAV-mounted RIS platforms, enabling effective spatial exploration and adaptability to environmental changes for maximizing sum rates while minimizing energy consumption. However, in Random and Fixed Device-to-UAV association scenarios, despite higher UAV activity, we observe lower sum rates with very high energy consumption as shown in subplot 9(h).
We observed that the sum rate for the Single Objective Optimization configuration slightly outperforms the proposed model at higher device quantities
4) Real-Time Adaptive UAV Swarm Formation
We also demonstrate the proposed real-time adaptation solution to rapidly changing wireless environment parameters during the operations using UAV swarm formation and clustering. We observe that the RL agent adapts very well when the jammer distribution radius is switched from
Plots showing UAV Swarm Formation adapting to the rapid changes in jammer distribution area switching from
Conclusion
This study proposes a robust strategy against jamming attacks in dense urban wireless networks using Deep Reinforcement Learning (DRL). The Proximal Policy Optimization (PPO) algorithm optimizes UAV trajectories, swarm formation, clustering, RIS phase shifts, and base station transmission power to maximize sum rates while minimizing the UAV energy consumption in the presence of unknown jammers. This approach effectively addresses the complexities of dense urban environments, ensuring optimal outcomes across various scenarios.
Future research aims to delve into multi-agent DRL methodologies within complex wireless communication frameworks. This may encompass elements such as mobile jamming entities and establishing collaborative Swarm UAVs comprising RIS-supported airborne platforms, thus introducing further complexities to the optimization problem.
ACKNOWLEDGMENT
Open Access funding is provided by Qatar National Library.