I. Introduction
Road traffic congestion is a global issue that causes environmental degradation, economic losses, and a diminished quality of life for commuters. According to the INRIX Global Traffic Scorecard, based on 2019 data, traffic congestion cost the average American around 99 hours per year and around per year in lost economic activity [1], not to mention the environmental impact of excess emissions. One of the most effective ways to reduce traffic congestion is optimal traffic signal control (TSC) [2], [3], [4], [5], [6]. By regulating traffic light timing, TSC can ease congestion and improve traffic flow. Various approaches to optimal traffic signal control have been proposed, such as the Webster model [7] and the HCM delay model [8]. However, these methods often rely heavily on simplified traffic models or on unrealistic assumptions, such as a uniform vehicle arrival rate. Existing TSC systems such as SCOOT [9] and SCATS [10] also struggle to manage complex traffic scenarios.
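To illustrate the kind of simplified model these classical methods rely on, the sketch below implements Webster's well-known fixed-time cycle length formula, C0 = (1.5L + 5) / (1 - Y), where L is the total lost time per cycle (in seconds) and Y is the sum of the critical flow ratios y_i = q_i / s_i (arrival flow over saturation flow) for each phase. The effective green time is then split in proportion to each y_i. This is a minimal sketch, not the formulation used in [7] verbatim; the function names are hypothetical. Note that every input is a single static average flow ratio, which is exactly the simplification criticized above: the plan cannot react to fluctuating or non-uniform arrivals.

```python
def webster_cycle_length(lost_time_s: float, flow_ratios: list[float]) -> float:
    """Webster's optimal cycle length C0 = (1.5 * L + 5) / (1 - Y).

    lost_time_s: total lost time per cycle L, in seconds.
    flow_ratios: critical flow ratio y_i = q_i / s_i for each phase.
    """
    total_y = sum(flow_ratios)
    # The formula is only meaningful for an undersaturated intersection.
    if total_y >= 1.0:
        raise ValueError("intersection is oversaturated (Y >= 1)")
    return (1.5 * lost_time_s + 5.0) / (1.0 - total_y)


def webster_green_splits(cycle_s: float, lost_time_s: float,
                         flow_ratios: list[float]) -> list[float]:
    """Split the effective green time (C0 - L) in proportion to each y_i."""
    total_y = sum(flow_ratios)
    effective_green = cycle_s - lost_time_s
    return [effective_green * y / total_y for y in flow_ratios]


if __name__ == "__main__":
    # Hypothetical two-phase intersection: 10 s lost time, y = 0.3 and 0.2.
    ratios = [0.3, 0.2]
    cycle = webster_cycle_length(10.0, ratios)
    greens = webster_green_splits(cycle, 10.0, ratios)
    print(f"cycle = {cycle:.1f} s, greens = {[round(g, 1) for g in greens]}")
```

For this hypothetical intersection, Y = 0.5, so C0 = (15 + 5) / 0.5 = 40 s, and the 30 s of effective green is split into 18 s and 12 s. The resulting plan stays fixed until the flow ratios are re-estimated.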