I. Introduction
Optimal control is generally an offline design technique that requires full knowledge of the system dynamics [1]; in the linear case, for example, one must solve the Riccati equation. In the nonlinear case, approximation methods are used to solve Hamilton–Jacobi–Bellman (HJB) equations [2]–[4]. Adaptive/approximate dynamic programming (ADP) is one of the most effective intelligent approximate control methods for solving nonlinear HJB equations [5]–[10]. For example, in [11], a novel numerically adaptive learning control scheme based on ADP was developed to solve the HJB equation numerically; this was the first result applying numerical ADP to optimal control problems for nonlinear systems. In [12], a finite-horizon iterative ADP algorithm was developed to obtain the optimal solution of the HJB equation for a class of discrete-time nonlinear systems with an unfixed initial state; this was the first result on obtaining an ε-optimal control law in finite time for an arbitrary initial state in the initial state set. However, owing to their large scale and complex manufacturing processes, the dynamics of many industrial systems are difficult to estimate and cannot be obtained accurately [13]–[16]. Optimal adaptive controllers have therefore been designed using indirect techniques, whereby the unknown plant is first identified and the HJB equation is then solved [17]–[19].
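As a concrete instance of the linear case mentioned above, the infinite-horizon optimal (LQR) gain follows from solving the continuous-time algebraic Riccati equation. A minimal sketch using SciPy's solver; the double-integrator plant and the weights Q, R below are illustrative choices, not taken from this paper:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative double-integrator plant (not from the paper).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # state cost weight
R = np.array([[1.0]])  # control cost weight

# Solve the algebraic Riccati equation
#   A'P + PA - P B R^{-1} B' P + Q = 0
# for P, then form the optimal state-feedback gain K = R^{-1} B' P.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

# The optimal control is u = -Kx; the closed loop A - BK is Hurwitz.
print(np.all(np.linalg.eigvals(A - B @ K).real < 0))  # True
```

For nonlinear dynamics no such closed-form route exists, which is what motivates the approximate HJB solutions discussed next.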