Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp


Abstract:

Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it is still challenging to apply it to real-world tasks, due to poor sample efficiency. To overcome this shortcoming, several works focus on reusing the collected trajectory data during training by decomposing trajectories into sets of policy-irrelevant discrete transitions. However, their improvements are somewhat marginal because i) the number of transitions is usually small, and ii) value assignment only happens at the joint states. To address these issues, this paper introduces a concise yet powerful method to construct Continuous Transitions, which exploits trajectory information by leveraging the potential transitions along the trajectory. Specifically, we propose to synthesize new transitions for training by linearly interpolating consecutive transitions. To keep the constructed transitions authentic, we also develop a discriminator to guide the construction process automatically. Extensive experiments demonstrate that our proposed method achieves a significant improvement in sample efficiency on various complex continuous robotic control problems in MuJoCo and outperforms advanced model-based and model-free RL methods. The source code is available.
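
To make the interpolation concrete, below is a minimal NumPy sketch of how two consecutive transitions can be mixed into a synthetic one. The function name, the fixed Beta parameter alpha, and the toy usage are illustrative assumptions; the paper's discriminator, which adapts the interpolation strength automatically, is omitted here.

import numpy as np

def continuous_transition(t0, t1, alpha=0.8):
    """Mix two consecutive transitions into a synthetic one (mixup-style).

    t0 = (s_t,     a_t,     r_t,     s_{t+1})
    t1 = (s_{t+1}, a_{t+1}, r_{t+1}, s_{t+2})
    alpha is an illustrative Beta parameter; in the paper the
    interpolation strength is tuned automatically by a discriminator.
    """
    lam = np.random.beta(alpha, alpha)
    s      = lam * t0[0] + (1 - lam) * t1[0]
    a      = lam * t0[1] + (1 - lam) * t1[1]
    r      = lam * t0[2] + (1 - lam) * t1[2]
    s_next = lam * t0[3] + (1 - lam) * t1[3]
    return s, a, r, s_next

# Toy usage with a 3-D state and 1-D action (hypothetical shapes):
s0, s1, s2 = np.random.randn(3), np.random.randn(3), np.random.randn(3)
a0, a1 = np.random.randn(1), np.random.randn(1)
mixed = continuous_transition((s0, a0, 1.0, s1), (s1, a1, 0.5, s2))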
Date of Conference: 30 May 2021 - 05 June 2021
Date Added to IEEE Xplore: 18 October 2021
Conference Location: Xi'an, China


I. INTRODUCTION

Deep reinforcement learning (RL), with high-capacity deep neural networks, has been applied to solve various complex decision-making tasks, including chess games [1], [2], video games [3], [4], etc. In robotics, the ultimate goal of RL is to endow robots with the ability to learn, improve, adapt, and reproduce tasks (e.g., robotic manipulation [5], [6], [7], robot navigation [8], [9], robot competition [10], and other robotic control tasks [11], [12], [13], [14]). In practice, however, RL applications in robotics suffer from poor sample efficiency [15]. For instance, even when solving a simple task, RL still needs substantial interaction data for policy improvement. Poor sample efficiency not only slows down policy improvement but also brings about other deleterious problems for deep RL, such as memorization and sensitivity to out-of-manifold samples [16], [17]. Generally, an RL agent gathers data for policy improvement along the learning process, which means that at the early stage of learning the amount of training data is small, and the deep neural network is prone to memorizing (instead of generalizing from) the training data [18]. Unfortunately, such memorization causes the bootstrapped value function to fail to generalize to unvisited (out-of-manifold) state-action combinations, which can hamper policy improvement via Bellman backup [19]. Moreover, it also degrades the agent's performance in the environment, due to its preference for exploring unvisited states.
