I. Introduction
With the rapid development of intelligent and networked technologies in the military field, unmanned aerial vehicles (UAVs) are playing an increasingly important role in reconnaissance, surveillance, security patrols, and target monitoring [1], [2], [3]. In the commercial field, UAVs are also widely used in traffic analysis, logistics and transportation, resource exploration, and many other applications. In recent years, as research on the autonomous capabilities of UAVs has progressed rapidly, researchers have begun to consider broadening the scope of UAV autonomy [4], [5], [6], which has opened up many new application areas, including surveillance, reconnaissance, tracking flight, convoy protection, and precision guidance. These applications often depend on one essential technology: UAV target tracking control. In a UAV standoff tracking control mission, a controller is designed to steer the vehicle toward a moving or static target and then keep the UAV at a specified distance from that target. UAV target tracking control technology can give the vehicle a degree of intelligent awareness. This greatly enhances the autonomous capability of the UAV, enabling it to complete a wider range of tasks and adapt to more complex working conditions. This paper focuses on a tracking control system for vertical take-off and landing (VTOL) aircraft based on reinforcement learning methods.