I. INTRODUCTION
Deep reinforcement learning (RL), equipped with high-capacity deep neural networks, has been applied to solve various complex decision-making tasks, including chess games [1], [2] and video games [3], [4]. In robotics, the ultimate goal of RL is to endow robots with the ability to learn, improve, adapt, and reproduce tasks, e.g., robotic manipulation [5], [6], [7], robot navigation [8], [9], robot competition [10], and other robotic control tasks [11], [12], [13], [14]. However, RL applications in robotics suffer from the poor sample efficiency of RL [15]: even for a simple task, RL still requires substantial interaction data to improve the policy. Poor sample efficiency not only slows down policy improvement but also brings about other deleterious problems for deep RL, such as memorization and sensitivity to out-of-manifold samples [16], [17].

Generally, an RL agent gathers data for policy improvement over the course of learning, which means that at the early stage of learning the amount of training data is small and the deep neural network is prone to memorizing (instead of generalizing from) the training data [18]. Unfortunately, such memorization causes the bootstrapped value function to generalize poorly to unvisited (out-of-manifold) state-action combinations, which hampers policy improvement via Bellman backups [19]. Moreover, it also degrades the agent's performance in the environment, because the agent prefers to explore unvisited states.
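To make concrete why generalization matters for the Bellman backup, the following minimal sketch (ours, not taken from the cited works) shows a one-step temporal-difference update for a bootstrapped Q-network; the network architecture, dimensions, and discount factor are illustrative assumptions. The target is computed at next state-action pairs that the network may never have been trained on, so if the network has merely memorized the visited tuples, the bootstrapped targets carry its out-of-manifold errors into the update.

```python
# Minimal sketch (illustrative assumptions throughout): a one-step Bellman
# backup with a bootstrapped Q-network, highlighting where out-of-manifold
# generalization enters the update.
import torch
import torch.nn as nn

state_dim, action_dim, gamma = 8, 2, 0.99

q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                      nn.Linear(64, 1))
target_q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                             nn.Linear(64, 1))
target_q_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)

# A small replay batch (s, a, r, s', a'); early in training this batch is
# tiny, so the network can simply memorize these exact tuples.
batch = 32
s  = torch.randn(batch, state_dim)
a  = torch.randn(batch, action_dim)
r  = torch.randn(batch, 1)
s2 = torch.randn(batch, state_dim)
a2 = torch.randn(batch, action_dim)  # next actions proposed by the policy

# Bellman backup: the target queries Q at (s', a') pairs that may be
# unvisited; poor generalization there makes the bootstrapped target
# unreliable, and the error propagates through subsequent updates.
with torch.no_grad():
    target = r + gamma * target_q_net(torch.cat([s2, a2], dim=-1))

q_pred = q_net(torch.cat([s, a], dim=-1))
loss = nn.functional.mse_loss(q_pred, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"TD loss: {loss.item():.4f}")
```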