
Discrete-time nonlinear HJB solution using Approximate dynamic programming: Convergence Proof



Abstract:

In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely heuristic dynamic programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman (HJB) equation that appears in discrete-time (DT) nonlinear optimal control. Two neural networks are used: one to approximate the value function and one to approximate the optimal control action. The importance of ADP is that it allows one to solve the HJB equation for general nonlinear discrete-time systems by using a neural network to approximate the value function. The importance of this paper is that it provides a rigorous proof of convergence of the HDP iteration scheme for general discrete-time nonlinear systems with continuous state and action spaces. Two examples are provided. The first is a linear system, for which ADP is found to converge to the correct solution of the algebraic Riccati equation (ARE). The second considers a nonlinear control system.
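For orientation, the greedy HDP iteration described in the abstract takes the following generic form. The notation below (affine dynamics x_{k+1} = f(x_k) + g(x_k) u_k and a quadratic stage cost with weights Q and R) is a standard assumption for this class of problems and is a sketch, not a quotation of the paper's equations.

\[
V^{*}(x_k) \;=\; \min_{u_k}\Bigl( x_k^{T} Q x_k + u_k^{T} R u_k + V^{*}(x_{k+1}) \Bigr),
\qquad x_{k+1} = f(x_k) + g(x_k) u_k ,
\]
is the DT HJB (Bellman optimality) equation, and HDP generates, starting from \(V_0 \equiv 0\), the sequence
\[
u_i(x_k) \;=\; \arg\min_{u_k}\Bigl( x_k^{T} Q x_k + u_k^{T} R u_k + V_i\bigl(f(x_k)+g(x_k)u_k\bigr) \Bigr),
\]
\[
V_{i+1}(x_k) \;=\; x_k^{T} Q x_k + u_i(x_k)^{T} R\, u_i(x_k) + V_i\bigl(f(x_k)+g(x_k)u_i(x_k)\bigr).
\]
The convergence result referred to above concerns the sequence V_i; the two neural networks approximate V_i (the critic) and u_i (the actor), respectively.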
Date of Conference: 01-05 April 2007
Date Added to IEEE Xplore: 04 June 2007
Print ISBN: 1-4244-0706-0

Conference Location: Honolulu, HI, USA

I. Introduction

This paper is concerned with the application of approximate dynamic programming (ADP) techniques to find the value function of the DT HJB equation that appears in optimal control problems. ADP is an approach to solving dynamic programming problems using function approximation. ADP was proposed by Werbos [12], Barto et al. [7], Widrow et al. [21], Howard [13], Watkins [10], Bertsekas and Tsitsiklis [17], and others as a way to solve optimal control problems forward in time. In effect, ADP combines adaptive critics, a reinforcement learning technique, with dynamic programming.
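As a concrete illustration of the linear special case mentioned in the abstract, the following minimal sketch (the system matrices A, B and weights Q, R are arbitrary assumptions, not taken from the paper) runs the HDP greedy iteration for a discrete-time linear-quadratic problem, where the value function is exactly quadratic, V_i(x) = x^T P_i x, and checks that the iterates converge to the solution of the algebraic Riccati equation.

import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical linear system x_{k+1} = A x_k + B u_k (illustrative values only)
A = np.array([[0.9, 0.2],
              [0.0, 1.1]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)            # state weighting
R = np.array([[1.0]])    # control weighting

# HDP greedy iteration specialized to the LQ case: with V_i(x) = x' P_i x,
# the update V_{i+1}(x) = min_u [ x'Qx + u'Ru + V_i(Ax + Bu) ] becomes a
# Riccati difference iteration on P_i, started from V_0 = 0 (i.e. P_0 = 0).
P = np.zeros_like(Q)
for i in range(1000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # greedy policy u_i(x) = -K x
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

P_are = solve_discrete_are(A, B, Q, R)   # algebraic Riccati equation solution
print("max |P_hdp - P_are| =", np.max(np.abs(P - P_are)))

For the general nonlinear case treated in the paper, the quadratic form P_i cannot be carried exactly, and the value function and greedy control are instead represented by the two neural networks.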

References
1. S. J. Bradtke, B. E. Ydstie, and A. G. Barto, "Adaptive linear quadratic control using policy iteration," Proceedings of the American Control Conference, pp. 3475-3476, Baltimore, Maryland, June 1994.
2. A. Al-Tamimi, M. Abu-Khalaf, and F. L. Lewis, "Adaptive Critic Designs for Discrete-Time Zero-Sum Games with Application to H-Infinity Control," IEEE Transactions on Systems, Man, and Cybernetics - Part B, Nov. 2006.
3. A. Al-Tamimi, M. Abu-Khalaf, and F. L. Lewis, "Model-Free Q-Learning Designs for Discrete-Time Zero-Sum Games with Application to H-Infinity Control," to appear, Automatica.
4. J. Si, A. Barto, W. Powell, and D. Wunsch, Handbook of Learning and Approximate Dynamic Programming, John Wiley, New Jersey, 2004.
5. S. Hagen and B. Krose, "Linear Quadratic Regulation using Reinforcement Learning," Belgian-Dutch Conference on Machine Learning, pp. 39-46, 1998.
6. D. Kleinman, "Stabilizing a Discrete, Constant, Linear System with Application to Iterative Methods for Solving the Riccati Equation," IEEE Trans. Automat. Control, pp. 252-254, June 1974.
7. A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 835-846, 1983.
8. T. Landelius, Reinforcement Learning and Distributed Local Model Synthesis, Ph.D. Dissertation, Linköping University, Sweden, 1997.
9. F. L. Lewis and V. L. Syrmos, Optimal Control, John Wiley, 1995.
10. C. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, Cambridge, England, 1989.
11. P. J. Werbos, "Neural networks for control and system identification," Heuristics, vol. 3, no. 1, pp. 18-27, Spring 1990.
12. P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control, ed. W. T. Miller, R. S. Sutton, and P. J. Werbos, pp. 67-95, Cambridge: MIT Press, 1991.
13. R. Howard, Dynamic Programming and Markov Processes, MIT Press, Cambridge, MA, 1960.
14. P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control, ed. D. A. White and D. A. Sofge, New York: Van Nostrand Reinhold, 1992.
15. W. Lin and C. I. Byrnes, "H-Infinity Control of Discrete-Time Nonlinear Systems," IEEE Trans. on Automat. Control, vol. 41, no. 4, pp. 494-510, 1996.
16. D. Prokhorov and D. Wunsch, "Adaptive critic designs," IEEE Trans. on Neural Networks, vol. 8, no. 5, pp. 997-1007, 1997.
17. D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, MA, 1996.
18. K. S. Narendra and F. L. Lewis, "Special Issue on Neural Network Feedback Control," Automatica, vol. 37, no. 8, Aug. 2001.
19. M. Abu-Khalaf, F. L. Lewis, and J. Huang, "Hamilton-Jacobi-Isaacs formulation for constrained input nonlinear systems," in Proc. 43rd IEEE Conference on Decision and Control, Bahamas, vol. 5, pp. 5034-5040, 2004.
20. M. Abu-Khalaf and F. L. Lewis, "Nearly Optimal Control Laws for Nonlinear Systems with Saturating Actuators Using a Neural Network HJB Approach," Automatica, vol. 41, pp. 779-791, 2005.
21. B. Widrow, N. Gupta, and S. Maitra, "Punish/reward: Learning with a critic in adaptive threshold systems," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, pp. 455-465, 1973.
22. W. H. Kwon and S. Han, Receding Horizon Control, Springer-Verlag, London, 2005.
23. B. Stevens and F. L. Lewis, Aircraft Control and Simulation, 2nd edition, John Wiley, New Jersey, 2003.
24. F. L. Lewis, Optimal Estimation, John Wiley, New York, 1986.
25. F. L. Lewis, Applied Optimal Control and Estimation, Prentice-Hall, New Jersey, 1992.
26. J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, "Adaptive Dynamic Programming," IEEE Trans. on Sys., Man, and Cyb., vol. 32, no. 2, pp. 140-153, 2002.
27. J. Si and Wang, "On-Line learning by association and reinforcement," IEEE Trans. Neural Networks, vol. 12, pp. 264-276, Mar. 2001.
28. X.-R. Cao, "Learning and Optimization from a Systems Theoretic Perspective," Proc. of IEEE Conference on Decision and Control, pp. 3367-3371, 2002.
29. P. He and S. Jagannathan, "Reinforcement learning-based output feedback control of nonlinear systems with input constraints," IEEE Trans. Systems, Man, and Cybernetics - Part B: Cybernetics, vol. 35, no. 1, pp. 150-154, Feb. 2005.
30. Z. Chen and S. Jagannathan, "Neural Network-based Nearly Optimal Hamilton-Jacobi-Bellman Solution for Affine Nonlinear Discrete-Time Systems," IEEE CDC '05, pp. 4123-4128, Dec. 2005.