Discrete-time nonlinear HJB solution using Approximate dynamic programming: Convergence Proof

Abstract:

In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely heuristic dynamic programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman (HJB) equation that appears in discrete-time (DT) nonlinear optimal control. Two neural networks are used: one to approximate the value function and one to approximate the optimal control action. The importance of ADP is that it allows one to solve the HJB equation for general nonlinear discrete-time systems by using a neural network to approximate the value function. The contribution of this paper is a rigorous proof of convergence of the HDP iteration scheme for general discrete-time nonlinear systems with continuous state and action spaces. Two examples are provided. The first is a linear system, for which ADP is found to converge to the correct solution of the algebraic Riccati equation (ARE). The second is a nonlinear control system.

Published in: 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Date of Conference: 01-05 April 2007

Date Added to IEEE Xplore: 04 June 2007

Print ISBN:1-4244-0706-0

DOI: 10.1109/ADPRL.2007.368167

Conference Location: Honolulu, HI, USA

I. Introduction

This paper is concerned with the application of approximate dynamic programming (ADP) techniques to find the value function of the DT HJB equation that appears in optimal control problems. ADP is an approach to solving dynamic programming problems using function approximation. ADP was proposed by Werbos [12], Barto et al. [7], Widrow et al. [21], Howard [13], Watkins [10], Bertsekas and Tsitsiklis [17], and others as a way to solve optimal control problems forward-in-time. ADP thus combines adaptive critics, a reinforcement learning technique, with dynamic programming.
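To make the forward-in-time idea concrete, the following is a minimal sketch of a greedy value iteration for the linear-quadratic special case, where the value function is V_i(x) = x' P_i x and no neural network is needed. It is an illustration only, not the paper's implementation: the system matrices A and B, the cost weights Q and R, the initial guess V_0 = 0, and the function names are all assumptions chosen for the example. Under these assumptions the iterates P_i should approach a solution of the discrete-time algebraic Riccati equation, which the sketch checks through the equation's residual.

```python
import numpy as np

def greedy_gain(A, B, R, P):
    """Gain K of the control u = -K x that is greedy for V(x) = x' P x."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def hdp_linear(A, B, Q, R, max_iters=1000, tol=1e-12):
    """Greedy value iteration for x_{k+1} = A x_k + B u_k, cost x'Qx + u'Ru."""
    P = np.zeros_like(Q, dtype=float)  # assumed initial value function V_0 = 0
    for _ in range(max_iters):
        K = greedy_gain(A, B, R, P)    # policy update from the current value
        A_cl = A - B @ K               # closed-loop dynamics under u = -K x
        # Value update: V_{i+1}(x) = x'Qx + u'Ru + V_i(A x + B u)
        P_next = Q + K.T @ R @ K + A_cl.T @ P @ A_cl
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    return P, greedy_gain(A, B, R, P)

def are_residual(A, B, Q, R, P):
    """Residual of the discrete-time ARE; near zero at a solution."""
    K = greedy_gain(A, B, R, P)
    return Q + A.T @ P @ A - A.T @ P @ B @ K - P

# Hypothetical example system (not taken from the paper).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = hdp_linear(A, B, Q, R)
print("largest ARE residual entry:", np.max(np.abs(are_residual(A, B, Q, R, P))))
```

The iteration only ever evaluates the current value estimate one step ahead, which is what allows it to run forward in time rather than sweeping backward from a terminal condition.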
