I. Introduction
In recent years, low cost, wide coverage, and flexible deployment [1] have made UAV base stations (UAV-BSs) pivotal components in emergency rescue scenarios. Accordingly, research has increasingly focused on deploying multiple UAV-BSs to provide dynamic coverage of the target area and to increase the data rate of the communication system. For instance, cooperative trajectory design for multiple UAVs has been used to maximize throughput while maintaining service fairness. In [2], [3], and [4], UAV trajectory and resource association are jointly optimized. To find the optimal cooperative policy for multiple UAVs, some studies have employed Multi-Agent Deep Reinforcement Learning (MADRL) algorithms [5], [6].