
Safety Filtering While Training: Improving the Performance and Sample Efficiency of Reinforcement Learning Agents



Abstract:

Reinforcement learning (RL) controllers are flexible and performant but rarely guarantee safety. Safety filters impart hard safety guarantees to RL controllers while maintaining flexibility. However, the separation between the controller and the safety filter can cause undesired behaviours, often degrading performance and robustness. In this letter, we analyze several modifications to how the safety filter is incorporated into the training of RL controllers, rather than applying it solely during evaluation. These modifications allow the RL controller to learn to account for the safety filter. This letter presents a comprehensive analysis of training RL controllers with safety filters, featuring simulated and real-world experiments with a Crazyflie 2.0 drone. We examine how various training modifications and hyperparameters impact performance, sample efficiency, safety, and chattering. Our findings serve as a guide for practitioners and researchers focused on safety filters and safe RL.
Published in: IEEE Robotics and Automation Letters ( Volume: 10, Issue: 1, January 2025)
Page(s): 788 - 795
Date of Publication: 05 December 2024


I. Introduction

Robots are increasingly used for safety-critical applications such as autonomous driving [1] and surgery [2]. These tasks, characterized by complex cost functions and (possibly unknown) dynamics, are challenging for classical controllers [3]. This motivates learning-based controllers, especially reinforcement learning (RL) algorithms, whose ability to adapt to complex reward signals and unknown dynamics has led to superior performance in various domains [4]. However, a significant limitation of RL is its lack of safety guarantees [3], which, despite promising results, makes it undesirable to deploy in safety-critical scenarios.
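The letter studies incorporating the safety filter during training rather than only at evaluation time. The sketch below illustrates one such scheme in minimal Python: every action proposed by the policy is passed through the filter during rollouts, the executed (filtered) action is stored for learning, and filter interventions are penalized. The `env`, `agent`, and `safety_filter` interfaces, the clipping-based filter, and the penalty weight are illustrative assumptions, not the letter's implementation.

import numpy as np

def safety_filter(state, proposed_action):
    """Hypothetical safety filter: project the proposed action onto a
    certified-safe set (in practice, e.g., a CBF-QP or predictive filter).
    Here we only clip to placeholder bounds for illustration."""
    safe_low, safe_high = -1.0, 1.0  # placeholder safe-action bounds
    return np.clip(proposed_action, safe_low, safe_high)

def train_with_filter(env, agent, episodes=100):
    """Sketch of safety-filtered training: the filter is applied to every
    action during training, and the stored transition uses the filtered
    (executed) action so the policy can learn to account for the filter."""
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            proposed = agent.select_action(state)       # unfiltered policy output
            executed = safety_filter(state, proposed)   # certified-safe action
            next_state, reward, done, _ = env.step(executed)
            # One possible modification: penalize filter interventions so the
            # policy is discouraged from relying on corrections (weight assumed).
            reward -= 0.1 * np.linalg.norm(executed - proposed)
            agent.store(state, executed, reward, next_state, done)
            agent.update()
            state = next_state

Storing the executed action rather than the proposed one, and penalizing the discrepancy between them, are two of the kinds of training modifications the letter evaluates; the specific choices above are placeholders for illustration.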

