
Centralized Training with Decentralized Execution Reinforcement Learning for Cooperative Multi-agent Systems with Communication Delay



Abstract:

In cooperative multi-agent systems, efficient coordination among agents is essential for accomplishing tasks. VFFAC is a method that learns both the communication between agents and their interactions with the environment in order to obtain high-performing policies. However, the performance of its policies degrades in environments where communication is delayed. Furthermore, the control problem of a cooperative multi-agent system with communication delays in an unknown environment had not previously been formulated. In this study, we formulate the decision-making problem of a cooperative multi-agent system with an unknown environment model and a fixed-length communication delay. We also propose a method that handles communication delays by using the history of information obtained through communication. Simulation experiments in an environment with communication delay demonstrate that the proposed method learns policies that achieve high rewards.
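As a rough illustration of the history-based idea mentioned above (a minimal sketch of our own, not the authors' implementation; the class name DelayedCommBuffer and its parameters are hypothetical), an agent operating under a fixed d-step communication delay can buffer outgoing messages and keep a history of the messages that have actually arrived:

from collections import deque

class DelayedCommBuffer:
    """Hypothetical helper: delivers each agent's broadcast message
    `delay` steps after it was sent, and records the history of
    messages each agent has actually received so far."""

    def __init__(self, n_agents, delay, msg_dim):
        self.delay = delay
        # One FIFO queue per agent, pre-filled with zero messages so
        # the first `delay` steps return a well-defined placeholder.
        self.queues = [
            deque([0.0] * msg_dim for _ in range(delay))
            for _ in range(n_agents)
        ]
        self.history = [[] for _ in range(n_agents)]

    def step(self, sent_messages):
        """sent_messages[i]: message agent i broadcasts this step.
        Returns the messages arriving now (sent `delay` steps ago)."""
        received = []
        for i, msg in enumerate(sent_messages):
            self.queues[i].append(msg)
            delivered = self.queues[i].popleft()  # sent `delay` steps ago
            self.history[i].append(delivered)
            received.append(delivered)
        return received

# Example: 2 agents, messages arrive 3 steps after being sent.
buffer = DelayedCommBuffer(n_agents=2, delay=3, msg_dim=4)
received = buffer.step([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
# For the first 3 steps, `received` is still the zero placeholder;
# a policy conditioned on buffer.history sees only delayed information.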
Date of Conference: 06-09 September 2022
Conference Location: Kumamoto, Japan

1. INTRODUCTION

Reinforcement learning (RL) [1] is a machine learning framework in which an agent learns an effective policy for accomplishing a task through trial and error in an unknown environment. By interacting with the environment, the agent receives rewards and aims to identify a policy that maximizes the discounted cumulative reward.
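Concretely, writing r_{t+1} for the reward received at step t+1 and \gamma \in [0, 1) for the discount factor, this objective is the standard discounted return (a textbook formulation following [1]; the notation is ours, not taken from this paper):

% Discounted cumulative reward from step t, and the optimal policy.
G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1},
\qquad
\pi^{*} = \arg\max_{\pi} \, \mathbb{E}_{\pi}\!\left[ G_t \right]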

REFERENCES
[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, 2018.
[2] L. Busoniu, R. Babuska and B. De Schutter, "A Comprehensive Survey of Multiagent Reinforcement Learning", IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156-172, 2008.
[3] M. Hüttenrauch, A. Šošić and G. Neumann, "Guided Deep Reinforcement Learning for Swarm Systems", AAMAS 2017 Autonomous Robots and Multirobot Systems (ARMS) Workshop, 2017.
[4] S. Shalev-Shwartz, S. Shammah and A. Shashua, "Safe Multi-Agent Reinforcement Learning for Autonomous Driving", 2016.
[5] S. Grigorescu, B. Trasnea, T. Cocias and G. Macesanu, "A survey of deep learning techniques for autonomous driving", Journal of Field Robotics, vol. 37, no. 3, pp. 362-386, 2020.
[6] J. Foerster, I. A. Assael, N. de Freitas and S. Whiteson, "Learning to Communicate with Deep Multi-Agent Reinforcement Learning", in Advances in Neural Information Processing Systems, vol. 29, Curran Associates, Inc., 2016.
[7] T. Wang, J. Wang, C. Zheng and C. Zhang, "Learning Nearly Decomposable Value Functions Via Communication Minimization", 2020.
[8] B. Wu, X. Yang, C. Sun, R. Wang, X. Hu and Y. Hu, "Learning Effective Value Function Factorization via Attentional Communication", 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 629-634, 2020.
[9] F. A. Oliehoek, M. T. Spaan and N. Vlassis, "Dec-POMDPs with delayed communication", The 2nd Workshop on Multi-agent Sequential Decision-Making in Uncertain Domains, 2007.
[10] Y. Cao, W. Yu, W. Ren and G. Chen, "An Overview of Recent Progress in the Study of Distributed Multi-Agent Coordination", IEEE Transactions on Industrial Informatics, vol. 9, no. 1, pp. 427-438, 2013.
[11] F. A. Oliehoek, "Value-Based Planning for Teams of Agents in Stochastic Partially Observable Environments", PhD thesis, 2010.
[12] F. A. Oliehoek and C. Amato, A Concise Introduction to Decentralized POMDPs, Springer International Publishing, 2016.
[13] J. Foerster, G. Farquhar, T. Afouras, N. Nardelli and S. Whiteson, "Counterfactual Multi-Agent Policy Gradients", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
[14] T. Rashid, M. Samvelyan, C. Schroeder, G. Farquhar, J. Foerster and S. Whiteson, "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning", International Conference on Machine Learning, pp. 4295-4304, 2018.
[15] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, et al., "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation", 2014.
[16] D. Ha, A. Dai and Q. V. Le, "HyperNetworks", 2016.