I. Introduction
In recent decades, rapid urbanization and the sharp growth in the number of vehicles have made traffic congestion an increasingly prominent problem and a common challenge for cities worldwide. Traffic signal control is an effective means of improving road efficiency, alleviating congestion, and reducing environmental pollution. Early adaptive traffic signal control (ATSC) systems such as SCOOT [1] and SCATS [2] relied on manually designed plans. However, because these methods are built on empirical observation, they have become less suitable for increasingly complex and dynamic traffic environments. In recent years, rapid advances in artificial intelligence have significantly propelled the development of ATSC, which has incorporated various interdisciplinary techniques such as fuzzy logic [3], genetic algorithms [4], and, most widely, reinforcement learning (RL) [5]. RL is a trial-and-error learning approach that models the ATSC problem as a Markov decision process (MDP): an agent selects signal actions based on state information collected from the environment so as to maximize cumulative reward.
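To make the MDP formulation concrete, the following is a minimal sketch under stated assumptions, not the method proposed in this paper: tabular Q-learning on a hypothetical single intersection with two phases. The state is the pair of queue lengths (north-south, east-west), the action chooses which approach receives green, and the reward is the negative total queue after the step. The queue cap, discharge rate, arrival probability, and all function names (`step`, `train_q_learning`, `greedy_policy`, `average_queue`) are illustrative assumptions.

```python
import random

# Hypothetical toy intersection parameters (assumptions, not from the paper).
MAX_Q = 5          # queue cap per approach (vehicles)
DISCHARGE = 2      # vehicles served per step on the green approach
ARRIVAL_P = 0.4    # per-step arrival probability on each approach
ACTIONS = (0, 1)   # 0 = north-south green, 1 = east-west green


def step(state, action, rng):
    """Toy MDP transition: serve the green approach, then add random arrivals."""
    queues = list(state)
    queues[action] = max(0, queues[action] - DISCHARGE)
    for i in range(2):
        if rng.random() < ARRIVAL_P:
            queues[i] = min(MAX_Q, queues[i] + 1)
    next_state = tuple(queues)
    reward = -sum(next_state)  # fewer queued vehicles -> higher reward
    return next_state, reward


def train_q_learning(episodes=5000, horizon=20, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(a, b): [0.0, 0.0] for a in range(MAX_Q + 1) for b in range(MAX_Q + 1)}
    for _ in range(episodes):
        # Exploring starts: begin each episode in a random queue configuration
        # so that rarely reached congested states still get updated.
        state = (rng.randint(0, MAX_Q), rng.randint(0, MAX_Q))
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward = step(state, action, rng)
            # Move Q(s, a) toward the one-step bootstrapped target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Map each state to the action with the highest learned Q-value."""
    return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in q}


def average_queue(policy, steps=5000, seed=1):
    """Average total queue length when following `policy` from empty queues."""
    rng = random.Random(seed)
    state, total = (0, 0), 0
    for _ in range(steps):
        state, reward = step(state, policy(state), rng)
        total += -reward
    return total / steps


if __name__ == "__main__":
    q = train_q_learning()
    policy = greedy_policy(q)
    print("learned policy:", average_queue(lambda s: policy[s]))
    print("always north-south green:", average_queue(lambda s: 0))
```

With these settings, the learned greedy policy would be expected to give green to the longer queue in lopsided states such as `(5, 0)` and `(0, 5)`, and its average queue should be lower than that of a fixed policy that never serves the east-west approach. Real ATSC work replaces the lookup table with function approximation and the toy queue model with a traffic simulator.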