I. Introduction
Reinforcement learning (RL) [39] has emerged as an effective approach to the development of autonomous systems, offering promising avenues for enhancing control mechanisms in domains such as autonomous driving [41], robotic manipulation [19], and quadrotor flight control [21]. The integration of RL into the control of fixed-wing aircraft likewise represents a significant stride toward adaptive and autonomous flight operations. Historically, fixed-wing aircraft control systems have relied heavily on pre-programmed algorithms, remote control, and traditional control methods such as Proportional-Integral-Derivative (PID) control [2]. However, the landscape is rapidly evolving with the advent of artificial intelligence and machine learning. As a subset of machine learning, RL offers a distinct advantage in this sphere by enabling systems to learn from their experiences and interactions within dynamic environments.

Autonomous flight control demands continuous decision-making in response to real-time changes in flight conditions, such as task reallocation and wind disturbances. The application of RL in this context aims to endow aircraft control systems with the ability to adapt, learn, and optimize their decision-making processes from environmental feedback. The fundamental principle of RL is the interaction of an agent, in this case the aircraft control system, with an environment: through this interaction, the agent learns to perform sequences of actions that maximize a cumulative reward. In the context of fixed-wing aircraft control, this learning process could involve attitude control [3], autopilot control [11], managing control surfaces [27], or even responding to unexpected situations without direct human intervention [34]. The potential benefits of applying RL to aircraft control are manifold: the adaptive nature of RL allows systems to learn and adjust their behavior in response to novel situations or changes in the environment.
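The agent-environment interaction described above can be sketched in a few lines of code. The following is a minimal illustration only, not a flight-dynamics model: `ToyAttitudeEnv`, `proportional_policy`, and `run_episode` are hypothetical names introduced here for exposition. The "environment" reduces attitude control to driving a scalar pitch error toward zero under a small random disturbance standing in for wind, and a fixed proportional controller stands in for a learned policy; the loop structure (observe state, choose action, receive reward, accumulate return) is the part that mirrors the RL principle in the text.

```python
import random

class ToyAttitudeEnv:
    """Hypothetical, drastically simplified 'environment': the agent must
    drive a scalar pitch error toward zero. Illustrative only."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.pitch_error = 0.0

    def reset(self):
        # Start each episode with a random attitude error.
        self.pitch_error = self.rng.uniform(-10.0, 10.0)
        return self.pitch_error

    def step(self, action):
        # The action nudges the pitch error; the small random term
        # mimics an unmodeled wind disturbance.
        self.pitch_error += action + self.rng.uniform(-0.1, 0.1)
        reward = -abs(self.pitch_error)   # closer to zero error is better
        done = abs(self.pitch_error) < 0.5
        return self.pitch_error, reward, done

def proportional_policy(state, gain=0.5):
    # Stand-in for a learned policy: act against the observed error.
    return -gain * state

def run_episode(env, policy, max_steps=100):
    """The canonical RL interaction loop: the agent observes the state,
    selects an action, and accumulates reward until the episode ends."""
    state = env.reset()
    cumulative_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        cumulative_reward += reward
        if done:
            break
    return cumulative_reward

env = ToyAttitudeEnv(seed=42)
print(run_episode(env, proportional_policy))
```

An RL algorithm would replace `proportional_policy` with a parameterized policy and adjust its parameters so that the cumulative reward returned by such episodes increases over training.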
For example, the environment in which the aircraft operates is often uncertain, subject to changes in air density, wind direction, and wind intensity; moreover, the parameters of the aircraft itself may drift over time or be affected by external factors. As another example, in a tense flight scenario such as pursuit and evasion, flight commands are switched frequently. Adaptability is therefore crucial for swift and accurate decision-making under such unforeseen conditions. However, the integration of RL into aircraft control is challenging: safety, reliability, the interpretability of learned models, and adherence to stringent aviation regulations pose significant hurdles. Addressing these challenges is imperative for the successful adoption of RL in ensuring the safety and efficiency of fixed-wing aircraft. In light of these challenges and the remarkable potential, the implementation of RL for fixed-wing aircraft control remains a rich area of investigation, and ongoing research in this field holds the promise of transforming how aircraft control systems operate, leading to safer, more adaptive, and more efficient flight operations.