I. Introduction
Reinforcement learning (RL) has achieved remarkable success in many areas [1], [2], [3], [4], yet safety remains a central concern when deploying RL algorithms in real-world applications [2], [5]. For example, a real-world robot should avoid crashing into unsafe areas or harming humans while exploring its environment, and a recommender system should not present false information to customers. Although safe RL has received substantial attention in recent years [6] and numerous safe RL methods have been proposed to ensure safety [7], safety in multiagent RL (MARL) remains an open problem [8]. Owing to the instability of multiagent systems, safe MARL is more complicated and challenging than safe single-agent RL [2]. In safe MARL settings, each agent must optimize its own reward and safety while accounting for the rewards and safety of the other agents.

To date, only a few safe MARL methods have been proposed, e.g., MACPO [9], MAPPO-L [9], CMIX [10], and safe MARL via shielding [11], and their stability and convergence analyses are still lacking. More importantly, most safe RL methods that rely on hard constrained policy optimization, e.g., CPO [12], FOCOPS [13], PCPO [14], MACPO [9], and MAPPO-L [9], require manually fine-tuned safety bounds. This fine-tuning imposes an additional research burden, and to date there is no principle guiding the choice of safety bounds beyond empirical experimentation. Moreover, the policy updates of these algorithms must satisfy the safety bounds at every iteration, which can cause oscillation during policy exploration and degrade both reward and safety performance.

In this study, we propose a safe learning framework for MARL based on soft constrained policy optimization, which serves as a plug-and-play module for existing RL algorithms without fine-tuning safety bounds. From this framework, we derive two algorithms: safe multiagent trust region policy optimization (SM-TRPO) and safe multiagent proximal policy optimization (SM-PPO). Experimental results demonstrate that, without fine-tuning safety bounds, our algorithms achieve performance comparable to that of state-of-the-art baselines that do not consider safety, while ensuring agent safety.