Task-specific pre-learning to improve the convergence of reinforcement learning based on a deep neural network


Abstract:

The poor convergence of reinforcement learning based on neural networks is a great challenge for autonomous robots. Inspired by human reinforcement learning, we propose a two-phase reinforcement learning model based on deep neural networks to address this convergence problem. In phase 1, task-specific pre-learning based on supervised learning is used to train a nonlinear neural network that approximates a strong reward function. In phase 2, the learned neural network is modified and expanded into a deep neural network to realize reinforcement learning. We test this two-phase approach in simulation on the well-known mountain-car problem and on an autonomous movement controller for a wheeled robot. The experimental results show that task-specific pre-learning can significantly improve the convergence of traditional reinforcement learning based on neural networks.
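As a rough illustration of the two phases described above, the sketch below pre-trains a small nonlinear network with supervised targets and then reuses its weights inside a deeper Q-network trained by temporal-difference updates. The PyTorch framework, layer sizes, and all function names are assumptions made for this illustration, not the authors' implementation.

```python
# Hypothetical sketch of the two-phase idea (names, shapes and framework are
# assumptions, not the paper's code): phase 1 pre-trains a shallow nonlinear
# approximator with supervised targets; phase 2 expands it into a deeper
# Q-network and continues with reinforcement learning.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 2, 3          # e.g. mountain-car: (position, velocity), 3 actions

# Phase 1: task-specific pre-learning by supervised regression.
pre_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(), nn.Linear(32, N_ACTIONS))
optim1 = torch.optim.Adam(pre_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def pretrain_step(states, targets):
    """states: (B, STATE_DIM); targets: (B, N_ACTIONS) task-specific reward-like labels."""
    optim1.zero_grad()
    loss = loss_fn(pre_net(states), targets)
    loss.backward()
    optim1.step()
    return loss.item()

# Phase 2: expand the pre-trained network into a deeper Q-network, keeping the
# learned input layer, then continue with standard Q-learning updates.
q_net = nn.Sequential(
    pre_net[0], nn.Tanh(),            # reuse the pre-learned first layer
    nn.Linear(32, 64), nn.ReLU(),     # newly added hidden layer
    nn.Linear(64, N_ACTIONS),
)
optim2 = torch.optim.Adam(q_net.parameters(), lr=1e-3)
GAMMA = 0.99

def q_learning_step(s, a, r, s_next, done):
    """One Q-learning update on a batch of transitions (a: long tensor, done: float tensor)."""
    with torch.no_grad():
        target = r + GAMMA * (1 - done) * q_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    optim2.zero_grad()
    loss = loss_fn(q_sa, target)
    loss.backward()
    optim2.step()
    return loss.item()
```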
Date of Conference: 12-15 June 2016
Date Added to IEEE Xplore: 29 September 2016
Conference Location: Guilin, China

I. Introduction

Reinforcement learning (RL) [1] is one of the most important learning methods for constructing robot controllers. In many tasks, whether simple or complex, human engineers have only scattered pieces of knowledge about how to explicitly define an optimal policy, which is not enough to design a fixed-program controller; reinforcement learning helps to stitch these pieces of knowledge together by representing them with two elementary concepts, the Markov decision process (MDP) and the reward function. Original reinforcement learning treats the process of task execution as an MDP and iteratively approximates the optimal policy defined by the Bellman optimality equation, which is built on a reward function. Reinforcement learning covers a large range of algorithms, such as SARSA [2], Q-learning [3], and their variants. In their early form, these algorithms usually use a table to record the evaluated value of every decision at every state of the MDP. Such table-based algorithms greatly limit RL's application to tasks with a large discrete state set or a continuous state space. Therefore, neural network approximators were introduced into reinforcement learning to generalize beyond the table, and many successful attempts have been made in fields such as robotics, computer games, and unmanned vehicles.
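For concreteness, a minimal tabular Q-learning sketch is given below (the hyperparameters, state encoding, and helper names are assumptions for illustration only). It makes explicit why a table with one entry per state-action pair is feasible only for small discrete state sets, which is what motivates the neural-network approximators discussed above.

```python
# Minimal tabular Q-learning sketch (an assumed illustration, not the paper's code):
# the table stores one value per (state, action) pair, so it cannot cover a large
# discrete state set or a continuous state space.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # learning rate, discount factor, exploration rate
N_ACTIONS = 3                            # e.g. mountain-car: push left, no push, push right

# One row per visited (hashable) state, one column per action.
Q = defaultdict(lambda: [0.0] * N_ACTIONS)

def choose_action(state):
    """Epsilon-greedy action selection from the table."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def q_update(state, action, reward, next_state):
    """Standard Q-learning backup on a single transition."""
    td_target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (td_target - Q[state][action])
```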
