
Solving Challenging Control Problems via Learning-based Motion Planning and Imitation



Abstract:

We present a deep reinforcement learning (deep RL) algorithm that consists of learning-based motion planning and imitation to tackle challenging control problems. Deep RL has been an effective tool for solving many high-dimensional continuous control problems, but it cannot effectively solve challenging problems with certain properties, such as sparse reward functions or sensitive dynamics. In this work, we propose an approach that decomposes the given problem into two deep RL stages: motion planning and motion imitation. The motion planning stage seeks to compute a feasible motion plan by leveraging the powerful planning capability of deep RL. Subsequently, the motion imitation stage learns a control policy that can imitate the given motion plan with realistic sensor and actuation models. This new formulation imposes only a nominal additional cost on the user because both stages require minimal changes to the original problem. We demonstrate that our approach can solve two challenging control problems, rocket navigation and quadrupedal locomotion, which cannot be solved by a monolithic deep RL formulation or by a variant using a Probabilistic Roadmap planner.
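The following is a toy sketch of that two-stage structure, not the paper's implementation: a 1-D point mass stands in for the rocket and quadruped tasks, the cross-entropy method stands in for the deep RL planner, and random search over PD gains stands in for the learned imitation policy. All function names and constants below are illustrative assumptions; only NumPy is required.

```python
# Toy illustration of the planning-then-imitation decomposition (assumed stand-ins,
# not the paper's method): CEM replaces the deep RL planner, random search over
# PD gains replaces the deep RL imitation policy.
import numpy as np

HORIZON, DT, TARGET = 40, 0.1, 5.0
rng = np.random.default_rng(0)

def plan_rollout(velocities):
    """Stage-1 dynamics: direct velocity control (a deliberately simplified model)."""
    pos, traj = 0.0, []
    for v in velocities:
        pos += v * DT
        traj.append(pos)
    return np.array(traj)

def plan_with_cem(iters=50, pop=64, n_elite=8):
    """Stage 1 (motion planning): search for a plan whose final state reaches the target.
    The task only rewards reaching the target; candidates are ranked by distance to it."""
    mean, std = np.zeros(HORIZON), np.full(HORIZON, 2.0)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, HORIZON))
        final = np.array([plan_rollout(s)[-1] for s in samples])
        elites = samples[np.argsort(np.abs(final - TARGET))[:n_elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return plan_rollout(mean)   # reference trajectory handed to stage 2

def track_rollout(gains, reference):
    """Stage-2 dynamics: force control with inertia (the 'realistic' model here)."""
    kp, kd = gains
    pos, vel, reward = 0.0, 0.0, 0.0
    for ref in reference:
        force = kp * (ref - pos) - kd * vel
        vel += force * DT
        pos += vel * DT
        reward -= (ref - pos) ** 2    # dense reward: track the reference plan
    return reward

def imitate_with_random_search(reference, iters=200):
    """Stage 2 (motion imitation): learn controller gains that track the plan."""
    best, best_r = np.array([1.0, 0.1]), -np.inf
    for _ in range(iters):
        cand = best + rng.normal(0.0, 0.2, size=2)
        r = track_rollout(cand, reference)
        if r > best_r:
            best, best_r = cand, r
    return best

reference = plan_with_cem()
gains = imitate_with_random_search(reference)
print("planned final position:", round(float(reference[-1]), 3))
print("tracking reward of learned controller:", round(track_rollout(gains, reference), 3))
```

The structural point mirrored by this sketch is that only the planning stage faces the original goal-reaching objective, while the imitation stage optimizes a dense reward for tracking the resulting plan under the more realistic dynamics.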
Date of Conference: 25-28 June 2023
Date Added to IEEE Xplore: 08 August 2023
Conference Location: Honolulu, HI, USA

I. Introduction

Deep reinforcement learning (deep RL) has demonstrated impressive performance on a wide range of sequential decision problems, ranging from the game of Go [1], [2] to robot control [3]–[5]. However, in practice deep RL still requires significant engineering effort to tune algorithms and reward functions before an effective control policy can be found. Conditions such as sparse reward functions and sensitive dynamics further amplify the difficulty of a problem. These properties degrade the accuracy of policy gradient methods by obscuring informative learning signals and producing inaccurate gradient estimates. Engineers often try to mitigate these issues by designing denser and smoother rewards or by incorporating hand-designed controllers based on their prior knowledge.
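A small numeric illustration, not from the paper, of the sparse-reward issue mentioned above: with a score-function (REINFORCE-style) gradient estimator, most samples under a sparse reward contribute exactly zero, so the estimate is far noisier relative to its magnitude than under a dense reward. The one-step Gaussian policy and all constants below are assumptions chosen for brevity.

```python
# Compare per-batch REINFORCE gradient estimates under dense vs. sparse rewards
# for a one-step Gaussian policy (illustrative setup, not the paper's tasks).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, target, batch = 0.0, 1.0, 3.0, 256

def per_sample_grads(reward_fn):
    """One batch of per-sample estimates of d E[R] / d mu."""
    actions = rng.normal(mu, sigma, size=batch)
    rewards = reward_fn(actions)
    score = (actions - mu) / sigma**2      # d log pi(a | mu) / d mu for a Gaussian
    return rewards * score

dense = lambda a: -(a - target) ** 2                          # informative everywhere
sparse = lambda a: (np.abs(a - target) < 0.1).astype(float)   # nonzero only near target

for name, fn in [("dense", dense), ("sparse", sparse)]:
    batches = np.stack([per_sample_grads(fn) for _ in range(100)])
    batch_means = batches.mean(axis=1)
    print(f"{name:6s} nonzero samples: {100 * np.mean(batches != 0):5.1f}%  "
          f"gradient estimate: {batch_means.mean():+.3f} ± {batch_means.std():.3f}")
```

Under the dense reward nearly every sample carries gradient information, while under the sparse reward only a tiny fraction of samples is nonzero and the batch-to-batch spread dwarfs the estimate itself.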


