
Transformer-Based Reinforcement Learning for Scalable Multi-UAV Area Coverage


Abstract:

Compared with terrestrial networks, unmanned aerial vehicles (UAVs) offer flexible deployment and strong adaptability, making them an important supplement to intelligent transportation systems (ITS). In this paper, we focus on the multi-UAV network area coverage problem (ACP), which requires UAVs to make intelligent long-term trajectory decisions in a complex and scalable network environment. Multi-agent deep reinforcement learning (DRL) has recently emerged as an effective tool for solving long-term decision problems. However, because the input dimension of a multi-layer perceptron (MLP)-based deep neural network (DNN) is fixed, a standard DNN has difficulty adapting to a variable number of UAVs and network users. We therefore combine the Transformer with DRL to meet the scalability requirements of the network and propose a Transformer-based deep multi-agent reinforcement learning (T-MARL) algorithm. The Transformer can adapt to variable input dimensions and, through its attention module, extract important information from complex network states. In our research, we find that random initialization of the Transformer may cause DRL training to fail, so we propose a baseline-assisted pre-training scheme. This scheme uses imitation learning to quickly provide an initial policy model for the UAVs and uses the temporal-difference (TD(1)) algorithm to initialize the policy evaluation network. Finally, by using parameter sharing, T-MARL is applicable to any standard DRL algorithm and supports expansion to networks of different sizes. Experimental results show that T-MARL enables UAVs to exhibit cooperative behaviors and to perform outstandingly on the ACP.
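The scalability argument above rests on attention accepting a variable number of inputs, whereas an MLP's input layer fixes the input size. The following is a minimal sketch under stated assumptions, not the paper's T-MARL architecture, which uses a full Transformer: a single-head scaled dot-product attention encoder in pure Python. Here the UAV's own state forms the query, and the other UAVs and users form the keys and values. The class `SharedAttentionEncoder`, the weight names `W_q`, `W_k`, `W_v`, the feature sizes and the random initialization are all hypothetical. Because every UAV would call the same encoder instance, the sketch also illustrates parameter sharing in its simplest form.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedAttentionEncoder:
    """Hypothetical single-head attention encoder shared by all UAVs.

    The weight shapes depend only on the per-entity feature size and the
    embedding size d, never on how many UAVs or users are present.
    """

    def __init__(self, feat_dim, d, seed=0):
        rng = random.Random(seed)

        def init():
            # Illustrative random init: a d x feat_dim matrix.
            scale = 1.0 / math.sqrt(feat_dim)
            return [[rng.gauss(0.0, scale) for _ in range(feat_dim)] for _ in range(d)]

        self.W_q, self.W_k, self.W_v = init(), init(), init()
        self.d = d

    def encode(self, own_feat, entity_feats):
        """Attend from one UAV's own state over a variable-length entity list."""
        if not entity_feats:
            raise ValueError("need at least one entity to attend over")
        q = matvec(self.W_q, own_feat)
        keys = [matvec(self.W_k, e) for e in entity_feats]
        values = [matvec(self.W_v, e) for e in entity_feats]
        # Scaled dot-product scores, one per entity.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(self.d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors: always d-dimensional.
        return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(self.d)]


# Usage: one shared encoder, scenes with different numbers of entities.
encoder = SharedAttentionEncoder(feat_dim=4, d=8)
rng = random.Random(1)


def random_entities(n):
    """Hypothetical 4-dimensional features for n UAVs/users."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(n)]


own = [0.1, 0.2, 0.0, 1.0]
small = encoder.encode(own, random_entities(3))   # 3 entities
large = encoder.encode(own, random_entities(12))  # 12 entities
print(len(small), len(large))  # 8 8
```

Because the weights are sized by the per-entity feature length and `d` rather than by the entity count, the same parameters serve 3 or 12 entities, and the pooled output does not depend on the order in which entities are listed.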
Published in: IEEE Transactions on Intelligent Transportation Systems (Volume: 25, Issue: 8, August 2024)
Page(s): 10062 - 10077
Date of Publication: 07 February 2024

I. Introduction

Unmanned aerial vehicles (UAVs) are recognized as an important part of 6G networks and intelligent transportation systems (ITS), supporting a wider range of more diverse services [1], [2], [3], [4], [5]. UAVs can be flexibly deployed at high altitudes to reduce terrain interference, which makes them suitable for missions in areas lacking infrastructure and as a supplement to existing networks. However, because the environment is complex, terrestrial network infrastructure is lacking, and each UAV has limited coverage capability for large-scale monitoring, optimizing UAV flight trajectories has become an urgent problem.
