I. Introduction
In UAV-aided wireless communication scenarios, UAVs must continuously adjust their deployment locations and plan their flight trajectories according to mission requirements; this is known as the UAV trajectory planning problem. In recent years, the rapid development of machine learning theory has produced artificial intelligence (AI) algorithms that do not need to solve such complex problems directly. Instead, an agent interacts with the environment, continuously adjusts its behavior, and gradually approaches the optimal policy until it obtains a solution that meets the performance requirements. This approach provides both a theoretical basis and a practical means for solving complex communication problems. Reinforcement learning (RL) is an important branch of machine learning for solving sequential decision-making tasks. In 2015, Mnih et al. [1] formally introduced the deep Q-network (DQN), which combines RL with deep neural networks, and deep reinforcement learning (DRL) has attracted widespread attention ever since.
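To make the interaction-driven learning described above concrete, the following is a minimal sketch of tabular Q-learning on a toy trajectory planning task. It is not the method of this paper or of [1]. DQN replaces the Q-table below with a neural network, whereas this sketch keeps the table for clarity. The grid size, start and goal cells, reward values, and hyperparameters (`GRID`, `START`, `GOAL`, `alpha`, `gamma`, `epsilon`) are all hypothetical choices made for illustration. A UAV moves between cells of a 5x5 grid, pays a cost of -1 per move, and receives +10 when it reaches the destination. Through repeated epsilon-greedy interaction it learns which move to make from each cell.

```python
import random

GRID = 5                      # hypothetical 5x5 grid of UAV positions
START, GOAL = (0, 0), (4, 4)  # hypothetical start and destination cells
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # east, south, west, north


def step(state, action):
    """Move the UAV one cell; moves off the grid leave it in place."""
    r = min(max(state[0] + action[0], 0), GRID - 1)
    c = min(max(state[1] + action[1], 0), GRID - 1)
    nxt = (r, c)
    # Assumed reward: -1 per move, +10 on reaching the destination.
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table purely by interacting with the toy environment."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(GRID) for c in range(GRID)
         for a in range(len(ACTIONS))}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
            nxt, reward, done = step(state, ACTIONS[a])
            # Terminal transitions have no future value to bootstrap from.
            best_next = 0.0 if done else max(
                q[(nxt, i)] for i in range(len(ACTIONS)))
            # Q-learning update toward the one-step bootstrapped target.
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned policy greedily from START and record the cells."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

After training, `greedy_path(train())` should trace a route from `START` to `GOAL` in 8 moves, which is the Manhattan distance across the grid. The agent is never told this shortest path explicitly. It only receives rewards from the environment. This mirrors the idea, on a much smaller scale, that RL approaches a good policy through interaction rather than by solving the planning problem in closed form.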