Abstract:
Living organisms learn by acting on their environment, observing the resulting reward stimulus, and adjusting their actions accordingly to improve the reward. This action-based or reinforcement learning can capture notions of optimal behavior occurring in natural systems. We describe mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming. These give us insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior.
Published in: IEEE Circuits and Systems Magazine (Volume: 9, Issue: 3, Third Quarter 2009)
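As a rough illustration of the adaptive dynamic programming idea the abstract refers to, the sketch below runs policy iteration on a discrete-time LQR problem: policy evaluation solves a Lyapunov equation for the current feedback gain, and policy improvement computes the greedy gain from the evaluated value function, converging to the Riccati solution. This is a minimal sketch under assumed plant matrices and cost weights, using SciPy's standard solvers; it is not the article's implementation.

import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Illustrative open-loop-stable plant x_{k+1} = A x_k + B u_k with quadratic cost
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state weighting
R = np.array([[1.0]])  # control weighting

K = np.zeros((1, 2))   # initial stabilizing policy u_k = -K x_k (A itself is stable here)
for _ in range(50):
    # Policy evaluation: P solves P = (A - B K)^T P (A - B K) + Q + K^T R K
    A_cl = A - B @ K
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    # Policy improvement: greedy gain for the evaluated value function
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# The iterates should converge to the gain given by the discrete algebraic Riccati equation
P_are = solve_discrete_are(A, B, Q, R)
K_are = np.linalg.solve(R + B.T @ P_are @ B, B.T @ P_are @ A)
print("policy-iteration gain:", K)
print("Riccati gain:         ", K_are)

Starting from any stabilizing gain, each evaluation/improvement pass is a model-based analogue of the "act, observe the cost, improve the policy" loop described in the abstract; reinforcement learning versions replace the Lyapunov solve with estimates built from measured data.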