I. Introduction
Reinforcement learning (RL) [1] is one of the most important learning methods for constructing robot controllers. In many tasks, whether simple or complex, human engineers possess only scattered pieces of knowledge about how to explicitly define an optimal policy, which is not enough to design a fixed, hand-programmed controller. Reinforcement learning can tie these pieces of knowledge together by representing them with two elementary concepts: a Markov Decision Process (MDP) and a reward function. Classical reinforcement learning models the process of task execution as an MDP and iteratively approximates the optimal policy defined by the Bellman optimality equation, in which the reward function appears as a component. Reinforcement learning covers a wide range of algorithms, such as SARSA [2], Q-learning [3], and their variants. In their early forms, these algorithms use a table to record the estimated value of every action in every state encountered during the MDP. Such table-based algorithms greatly limit RL's applicability to tasks with a large discrete state set or a continuous state space. Therefore, neural network approximators have been introduced into reinforcement learning to generalize beyond the table, and many successful applications have been reported in fields such as robotics, computer games, and unmanned vehicles.
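For reference, the Bellman optimality equation mentioned above can be stated in its standard action-value form, where r(s,a) denotes the reward function, P(s'|s,a) the transition probability of the MDP, and gamma in [0,1) the discount factor; the tabular Q-learning update [3] with step size alpha then approximates its fixed point from sampled transitions (s, a, r, s'):

Q^*(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a) \, \max_{a'} Q^*(s',a'),

Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big].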
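As a minimal sketch of the table-based approach, and of why it does not scale beyond small discrete state sets, the following code runs tabular Q-learning on a small Gym-style environment. The environment (FrozenLake-v1 from the gymnasium package) and the hyper-parameters are illustrative assumptions, not settings used in this work.

import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1")
n_states, n_actions = env.observation_space.n, env.action_space.n

# The entire value estimate is one table with an entry per (state, action)
# pair; this explicit enumeration is what breaks down for large discrete
# state sets or continuous state spaces.
Q = np.zeros((n_states, n_actions))

alpha, gamma, epsilon = 0.1, 0.99, 0.1  # step size, discount, exploration rate

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection from the table
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning update toward the Bellman optimality target
        target = reward + gamma * np.max(Q[next_state]) * (not terminated)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

Replacing the table Q with a neural network that maps states (or state features) to action values is the generalization step referred to above.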