
Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control


Abstract:

Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power. However, centralized RL is infeasible for large-scale ATSC because of the extremely high dimension of the joint action space. Multi-agent RL (MARL) overcomes the scalability issue by distributing global control across local RL agents, but it introduces a new challenge: the environment becomes partially observable from the viewpoint of each local agent because of limited communication among agents. Most existing studies in MARL focus on designing efficient communication and coordination among traditional Q-learning agents. This paper presents, for the first time, a fully scalable and decentralized MARL algorithm for the state-of-the-art deep RL agent, advantage actor-critic (A2C), within the context of ATSC. In particular, two methods are proposed to stabilize learning by improving the observability and reducing the learning difficulty of each local agent. The proposed multi-agent A2C is compared against independent A2C and independent Q-learning algorithms, in both a large synthetic traffic grid and a large real-world traffic network of Monaco city, under simulated peak-hour traffic dynamics. The results demonstrate its optimality, robustness, and sample efficiency relative to other state-of-the-art decentralized MARL algorithms.
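The abstract names advantage actor-critic (A2C) [15] as the base learner for each signal-controlling agent. For orientation, the sketch below illustrates a standard single-agent A2C update (a policy-gradient term weighted by the advantage, a value-regression loss, and an entropy bonus). It is not the paper's multi-agent algorithm; the PyTorch usage, network sizes, and coefficients are illustrative assumptions.

```python
# Minimal single-agent A2C update sketch (standard algorithm, cf. [15]);
# NOT the paper's multi-agent variant. All names and sizes are illustrative.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)            # critic: state value V(s)

    def forward(self, obs):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def a2c_update(model, optimizer, obs, actions, returns,
               value_coef=0.5, entropy_coef=0.01):
    """One A2C gradient step on a batch of transitions.
    obs: (B, obs_dim), actions: (B,), returns: (B,) discounted n-step returns."""
    logits, values = model(obs)
    dist = Categorical(logits=logits)
    # Advantage = empirical return - value baseline; detach the baseline so
    # the critic is trained only through the value loss.
    advantages = returns - values.detach()
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    entropy = dist.entropy().mean()
    loss = policy_loss + value_coef * value_loss - entropy_coef * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with dummy data (e.g., 8-dimensional local observation, 4 signal phases):
model = ActorCritic(obs_dim=8, n_actions=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs = torch.randn(32, 8)
actions = torch.randint(0, 4, (32,))
returns = torch.randn(32)
a2c_update(model, opt, obs, actions, returns)
```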
Published in: IEEE Transactions on Intelligent Transportation Systems (Volume: 21, Issue: 3, March 2020)
Pages: 1086-1095
Date of Publication: 15 March 2019


I. Introduction

As a consequence of population growth and urbanization, transportation demand is steadily rising in metropolises worldwide. Heavy routine traffic volumes put pressure on existing urban traffic infrastructure, resulting in everyday congestion. Adaptive traffic signal control (ATSC) aims to reduce potential congestion in saturated road networks by adjusting signal timing according to real-time traffic dynamics. Early-stage ATSC methods solve optimization problems to find efficient coordination and control policies. Successful products, such as SCOOT [1] and SCATS [2], have been installed in hundreds of cities across the world. OPAC [3] and PRODYN [4] are similar methods, but their relatively complex computation makes implementation less popular. Since the 1990s, various interdisciplinary techniques have been applied to ATSC, such as fuzzy logic [5], genetic algorithms [6], and immune network algorithms [7].

References
1. P. B. Hunt, D. I. Robertson, R. D. Bretherton, and M. C. Royle, "The SCOOT on-line traffic signal optimisation technique," Traffic Eng. Control, vol. 23, no. 4, pp. 190-192, Apr. 1982.
2. J. Y. K. Luk, "Two traffic-responsive area traffic control methods: SCAT and SCOOT," Traffic Eng. Control, vol. 25, no. 1, p. 14, 1984.
3. N. H. Gartner, "Demand-responsive decentralized urban traffic control, Part I: Single intersection policies," 1982.
4. J. J. Henry, J. L. Farges, and J. Tuffal, "The PRODYN real time traffic algorithm," in Proc. 4th IFAC/IFIP/IFORS Conf., Jan. 1984, pp. 305-310.
5. B. P. Gokulan and D. Srinivasan, "Distributed geometric fuzzy multiagent urban traffic signal control," IEEE Trans. Intell. Transp. Syst., vol. 11, no. 3, pp. 714-727, Sep. 2010.
6. H. Ceylan and M. G. Bell, "Traffic signal timing optimisation based on genetic algorithm approach including drivers' routing," Transp. Res. B, Methodol., vol. 38, no. 4, pp. 329-342, May 2004.
7. S. Darmoul, S. Elkosantini, A. Louati, and L. B. Said, "Multi-agent immune networks to control interrupted flow at signalized intersections," Transp. Res. C, Emerg. Technol., vol. 82, pp. 290-313, Sep. 2017.
8. R. S. Sutton and A. G. Barto, "Reinforcement learning: An introduction," IEEE Trans. Neural Netw., vol. 9, no. 5, p. 1054, Sep. 1998.
9. C. Szepesvári, "Algorithms for reinforcement learning," Synthesis Lectures Artif. Intell. Mach. Learn., vol. 4, no. 1, pp. 1-98, Jul. 2010.
10. V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, Feb. 2015.
11. C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, no. 3, pp. 279-292, 1992.
12. R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Reinforcement Learn., vol. 173, pp. 5-32, May 1992.
13. V. R. Konda and J. N. Tsitsiklis, "Actor-critic algorithms," in Proc. 12th Int. Conf. Neural Inf. Process. Syst., Nov. 1999, vol. 13, pp. 1008-1014.
14. M. Aslani, M. S. Mesgari, and M. Wiering, "Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events," Transp. Res. C, Emerg. Technol., vol. 85, pp. 732-752, Dec. 2017.
15. V. Mnih et al., "Asynchronous methods for deep reinforcement learning," in Proc. Int. Conf. Mach. Learn., Jun. 2016, pp. 1928-1937.
16. C. Guestrin, M. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proc. ICML, Jul. 2002, pp. 227-234.
17. J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," J. Mach. Learn. Res., vol. 7, pp. 1789-1828, Sep. 2006.
18. M. Tan, "Multi-agent reinforcement learning: Independent vs. cooperative agents," in Proc. 10th Int. Conf. Mach. Learn., Jun. 1993, pp. 330-337.
19. J. Foerster et al., "Stabilising experience replay for deep multi-agent reinforcement learning," Feb. 2017. [Online]. Available: https://arxiv.org/abs/1702.08887
20. R. Bellman, "A Markovian decision process," J. Math. Mech., vol. 6, no. 5, pp. 679-684, 1957.
21. R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), Dec. 1999, pp. 1057-1063.
22. C. Guestrin, D. Koller, and R. Parr, "Multiagent planning with factored MDPs," in Proc. 14th Int. Conf. Neural Inf. Process. Syst.: Natural and Synthetic, Jan. 2001, vol. 1, pp. 1523-1530.
23. G. Tesauro, "Extending Q-learning to general adaptive multi-agent systems," in Proc. Adv. Neural Inf. Process. Syst., 2004, pp. 871-878.
24. M. A. Wiering, J. Van Veenen, J. Vreeken, and A. Koopman, "Intelligent traffic light control," 2004.
25. C. Cai, C. K. Wong, and B. G. Heydecker, "Adaptive traffic signal control using approximate dynamic programming," Transp. Res. C, Emerg. Technol., vol. 17, no. 5, pp. 456-474, Oct. 2009.
26. P. La and S. Bhatnagar, "Reinforcement learning with function approximation for traffic signal control," IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 412-421, Jun. 2011.
27. T. Chu and J. Wang, "Traffic signal control with macroscopic fundamental diagrams," in Proc. Amer. Control Conf. (ACC), Jul. 2015, pp. 4380-4385.
28. T. Chu, J. Wang, and J. Cao, "Kernel-based reinforcement learning for traffic signal control with adaptive feature selection," in Proc. IEEE Conf. Decision Control, Dec. 2014, pp. 1277-1282.
29. S. Richter, D. Aberdeen, and J. Yu, "Natural actor-critic for road traffic optimisation," in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 1169-1176.
30. T. Chu, S. Qu, and J. Wang, "Large-scale multi-agent reinforcement learning using image-based state representation," in Proc. IEEE 55th Conf. Decision Control, Dec. 2016, pp. 7592-7597.