I. Introduction
Unmanned Aerial Vehicle (UAV)-assisted communication is expected to play an important role in future wireless networks. However, many challenges must be addressed before UAVs can be used effectively for communication [1]. Existing contributions on UAV-assisted communication systems can be divided into two categories: 1) UAVs are statically deployed in the air to enhance wireless communication coverage; and 2) UAVs are dynamically deployed, for example to serve as relays for the Base Station (BS) [2] or to collect Internet of Things (IoT) data. Compared with a traditional ground BS, a UAV-assisted BS offers advantages such as enhanced mobility and an increased likelihood of Line-of-Sight (LoS) communication with users. Existing contributions on the dynamic deployment of UAVs are normally based on either a) convex optimization algorithms [3] or b) deep reinforcement learning (DRL) algorithms [4]. Compared with traditional convex optimization, DRL-based algorithms may achieve higher performance with lower time complexity [5].