
Deep Reinforcement Learning for Autonomous Driving: A Survey


Abstract:

With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework, now capable of learning complex policies in high-dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in the real-world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, and inverse reinforcement learning, which are related to but are not classical RL algorithms. The role of simulators in training agents, and methods to validate, test and robustify existing solutions in RL, are also discussed.
Published in: IEEE Transactions on Intelligent Transportation Systems ( Volume: 23, Issue: 6, June 2022)
Page(s): 4909 - 4926
Date of Publication: 09 February 2021


I. Introduction

Autonomous driving (AD) systems comprise multiple perception-level tasks that now achieve high precision thanks to deep learning architectures. Beyond perception, however, autonomous driving involves several tasks where classical supervised learning methods are no longer applicable. First, the agent’s actions change the future sensor observations received from the environment in which the autonomous driving agent operates, as in the task of selecting an optimal driving speed in an urban area. Second, supervisory signals such as time to collision (TTC) or the lateral error with respect to the agent’s optimal trajectory capture the dynamics of the agent as well as the uncertainty in the environment; such problems require defining a stochastic cost function to be maximized. Third, the agent is required to learn new configurations of the environment while predicting an optimal decision at each instant of driving. This constitutes a high-dimensional space: the number of unique configurations under which the agent and environment can be observed is combinatorially large. In all such scenarios the aim is to solve a sequential decision process, which is formalized under the classical setting of Reinforcement Learning (RL), where the agent is required to learn a representation of its environment and to act optimally at each instant [1]. The mapping from states to optimal actions is referred to as the policy.
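The sequential decision process described above can be made concrete with a minimal sketch. The following toy example (an illustration, not code from the survey) casts the "optimal driving speed" task as a tiny tabular RL problem: states are discretized speeds, actions brake/hold/accelerate, and the reward peaks at a hypothetical target urban speed. The agent-environment loop and the Q-learning update are the classical RL machinery the paragraph refers to; all constants here are assumptions chosen for illustration.

```python
import random

N_SPEEDS = 5          # discrete speed states 0..4 (assumption)
TARGET = 2            # hypothetical optimal urban speed
ACTIONS = (-1, 0, 1)  # brake, hold, accelerate

def step(speed, action):
    """Environment dynamics: return (next state, reward).

    The agent's action changes the next observation it receives,
    which is what rules out plain supervised learning here.
    """
    next_speed = min(max(speed + action, 0), N_SPEEDS - 1)
    reward = -abs(next_speed - TARGET)  # closer to target speed is better
    return next_speed, reward

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning over the agent-environment loop:
    observe state, act (epsilon-greedily), receive reward, update Q."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_SPEEDS) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(N_SPEEDS)
        for _ in range(20):  # finite horizon per episode
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            # Q-learning temporal-difference update
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                                  - Q[(s, a)])
            s = s2
    return Q

def policy(Q, s):
    """The learned policy: the mapping from each state to its greedy action."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])
```

After training, following the greedy policy from any initial speed drives the agent to the target speed and holds it there, which is exactly the "act optimally at each instant" behavior the RL formalism prescribes.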

For easy reference, the main acronyms used in this article are listed in Appendix (Table IV).

