Multi-Agent Constrained Policy Optimization for Conflict-Free Management of Connected Autonomous Vehicles at Unsignalized Intersections


Abstract:

Autonomous Intersection Management (AIM) systems present a new paradigm for conflict-free cooperation of connected autonomous vehicles (CAVs) at road intersections, aiming to eliminate collisions and improve traffic efficiency and ride comfort. Because current centralized coordination methods struggle to balance high computational efficiency with robust safety assurance, this paper proposes a conflict-free management scheme for CAVs at unsignalized intersections based on safe multi-agent deep reinforcement learning (MADRL). First, we formulate the safe MADRL problem as a constrained Markov game (CMG) and then cast the AIM problem as a CMG by carefully designing the state, action, reward, and cost functions. Next, we propose Multi-Agent Constrained Policy Optimization (MACPO), tailored specifically to solving the CMG problem. MACPO incorporates safety constraints that further restrict the trust region defined by the Kullback-Leibler (KL) divergence, so that reinforcement learning policy updates maximize performance while keeping constraint costs within their limits. This leads to the MACPO-based AIM algorithm. Finally, we train an AIM policy and compare its computation time, ride comfort, traffic efficiency, and safety with management schemes based on Model Predictive Control (MPC), Mixed Integer Programming (MIP), and non-safety-aware reinforcement learning. Compared with the MPC and MIP methods, our method increases computational efficiency by 65.22 times and 731.52 times, respectively, and improves traffic efficiency by 2.41 times and 1.80 times, respectively. In contrast to the non-safety-aware RL methods, our method achieves a zero collision rate for the first time while also enhancing ride comfort, highlighting the advantages of using MACPO.
Published in: IEEE Transactions on Intelligent Transportation Systems (Volume: 25, Issue: 6, June 2024)
Page(s): 5374 - 5388
Date of Publication: 20 November 2023


I. Introduction

With the significant advances in autonomous driving and Internet of Vehicles technology, vehicle-infrastructure collaboration has become a promising traffic management solution for providing a safe, effective, and comfortable transportation experience [1], [2]. In recent years, a variety of vehicle-road collaborative applications have emerged [3], [4], [5]. As particularly risky areas in urban environments, road intersections have drawn extensive attention because of serious traffic accidents and severe congestion. Autonomous Intersection Management (AIM) systems aim to efficiently manage multiple connected autonomous vehicles (CAVs) at intersections, eliminate collisions, and optimize overall traffic efficiency and ride comfort [6]. Traditionally, AIM systems prevent anticipated conflicts using rule-based, optimization-based, or machine learning-based control strategies [6]–[35].
