
Solving Robotic Manipulation With Sparse Reward Reinforcement Learning Via Graph-Based Diversity and Proximity


Abstract:

In multigoal reinforcement learning (RL), algorithms usually suffer from inefficient collection of successful experiences in tasks with sparse rewards. By utilizing the ideas of hindsight experience relabeling and curriculum learning, prior works such as hindsight experience replay (HER), hindsight goal generation (HGG), graph-based HGG (G-HGG), and curriculum-guided HER (CHER) have greatly improved sample efficiency in robotic manipulation tasks. However, none of these methods learns efficiently on challenging manipulation tasks with distant goals and obstacles, since they rely on either heuristic or simple distance-guided exploration. In this article, we introduce graph-curriculum-guided HGG (GC-HGG), an extension of CHER and G-HGG that selects hindsight goals on the basis of graph-based proximity and diversity. We evaluated GC-HGG on four challenging manipulation tasks involving obstacles, in both simulation and real-world experiments, demonstrating significant improvements in both sample efficiency and overall success rate over prior works. Videos and code are available at: https://videoviewsite.wixsite.com/gc-hgg.
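To make the hindsight-relabeling idea underlying HER and its descendants concrete, the following is a minimal sketch of the standard "future"-strategy relabeling with a sparse reward. It is a generic illustration, not the paper's method: GC-HGG additionally selects hindsight goals by graph-based proximity and diversity, which is not shown here. All function and variable names are illustrative assumptions.

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    # Typical sparse scheme: 0 on success, -1 otherwise.
    return 0.0 if np.linalg.norm(achieved_goal - desired_goal) <= threshold else -1.0

def relabel_with_hindsight(episode, k=4, rng=None):
    """Augment an episode's transitions by replaying each one against goals
    that were actually achieved later in the same episode ("future" strategy).

    episode: list of (obs, action, achieved_goal, desired_goal) tuples.
    Returns a list of (obs, action, goal, reward) tuples.
    """
    rng = rng or np.random.default_rng(0)
    relabeled = []
    T = len(episode)
    for t, (obs, action, achieved, desired) in enumerate(episode):
        # Keep the original transition with the true desired goal.
        relabeled.append((obs, action, desired, sparse_reward(achieved, desired)))
        # Add k hindsight copies, each relabeled with a goal achieved at t or later.
        for _ in range(k):
            future_t = rng.integers(t, T)
            new_goal = episode[future_t][2]  # achieved goal at future_t
            relabeled.append((obs, action, new_goal, sparse_reward(achieved, new_goal)))
    return relabeled
```

Relabeling turns failed episodes into successful ones for the substituted goals, which is what makes learning feasible under sparse rewards; curriculum-style variants such as CHER and GC-HGG then choose *which* relabeled goals to train on.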
Published in: IEEE Transactions on Industrial Electronics ( Volume: 70, Issue: 3, March 2023)
Page(s): 2759 - 2769
Date of Publication: 11 May 2022



I. Introduction

Deep reinforcement learning has revolutionized decision-making in many areas, ranging from robotics, for example, solving a Rubik's cube or enabling autonomous driving, to games such as Go (AlphaGo), Atari, and StarCraft [1]–[6]. A recurrent problem of RL, however, is that it requires handcrafted reward functions tailored to individual tasks, which in most real-world applications involve complex and as yet unknown behaviors. The design of a proper reward is therefore challenging and a major impediment to the widespread adoption of RL in real-world applications.

