I. Introduction
Traffic congestion and exhaust emissions have become major factors restricting the development of smart cities. Numerous adaptive traffic signal control (ATSC) algorithms have been proposed to improve traffic efficiency and safety by adjusting signal timing schemes [1], [2]. Multi-agent deep reinforcement learning (MADRL) [3], [4] is a promising method for solving the complicated multi-intersection ATSC problem, because it combines the decision-making capacity of reinforcement learning, the perception capacity of deep learning, and the coordination capacity of multi-agent systems. Even so, most existing MADRL approaches to ATSC focus on improving traffic efficiency while neglecting the serious effect of carbon emissions on the environment [5], [6], [7]. Abrupt acceleration and deceleration by drivers and irrational signal timing strategies are the main causes of increased energy consumption and exhaust emissions. Some researchers have focused on optimizing ecological driving behavior, attempting to reduce fuel consumption and emissions by smoothing the speed profiles of vehicles approaching intersections [8], [9]. However, these studies overlook the impact of signal timing on vehicle energy consumption and carbon emissions. Unlike previous studies, this study uses MADRL to minimize carbon emissions and traffic congestion through cooperative control of timing strategies at multiple distributed signalized intersections.