
Data-Based Predictive Control via Multistep Policy Gradient Reinforcement Learning



Abstract:

In this article, a model-free predictive control algorithm for real-time systems is presented. The algorithm is data driven and improves system performance through multistep policy gradient reinforcement learning. By learning from an offline dataset and real-time data, knowledge of the system dynamics is not required in either algorithm design or application. Cooperative games among multiple players over the time horizon are presented to model predictive control as a multiagent optimization problem and to guarantee the optimality of the predictive control policy. To implement the algorithm, neural networks are used to approximate the action–state value function and the predictive control policy, respectively, with the weights determined by the method of weighted residuals. Numerical results show the effectiveness of the proposed algorithm.
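The weighted-residual fit of an action–state value function can be illustrated on a toy problem. The following is a minimal, purely illustrative sketch, not the paper's multistep multiagent algorithm: a scalar linear system stands in for the plant, a quadratic basis replaces the neural network, and policy evaluation is a least-squares fit of the Bellman residual from data. The dynamics coefficients appear only in the data-generation step; the learner never uses them.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.9, 1.0          # true dynamics, used only to generate data
q, r = 1.0, 1.0          # stage cost c = q*x^2 + r*u^2

def phi(x, u):
    """Quadratic basis for the Q-function: Q(x, u) ~ theta . phi(x, u)."""
    return np.array([x * x, x * u, u * u])

K = 0.0                  # initial stabilizing policy u = -K*x
for _ in range(10):      # approximate policy iteration
    # Collect transitions with exploration noise on the input.
    X, y = [], []
    for _ in range(200):
        x = rng.uniform(-2.0, 2.0)
        u = -K * x + rng.normal(scale=0.5)
        c = q * x * x + r * u * u
        xn = a * x + b * u
        un = -K * xn                       # next action under the current policy
        X.append(phi(x, u) - phi(xn, un))  # Bellman residual in feature space
        y.append(c)
    # Policy evaluation: least-squares (weighted-residual) fit of theta.
    theta, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    # Policy improvement: minimize the learned quadratic Q over u.
    K = theta[1] / (2.0 * theta[2])

print(K)  # converges to the LQR-optimal gain for this system
```

For this scalar example the learned gain matches the solution of the discrete-time Riccati equation, even though the learner only ever sees sampled transitions.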
Published in: IEEE Transactions on Cybernetics ( Volume: 53, Issue: 5, May 2023)
Page(s): 2818 - 2828
Date of Publication: 09 November 2021

PubMed ID: 34752414


I. Introduction

Recent years have witnessed thriving research activity on model predictive control (MPC) due to its broad applications in fields such as intelligent vehicles [1], [2], microgrid operation [3], and industrial processes [4]. MPC, also called receding horizon control, is a form of model-based optimal control. At each instant, a predictive control sequence is obtained by minimizing a given cost function via model-based convex programming, and only the first control input of the resulting optimal/suboptimal sequence is applied to the system. MPC is able to handle control problems for which direct online computation of control inputs is difficult or impossible, such as the control of multivariable systems. However, without knowledge of the system model, the predictive control sequence cannot be solved from convex programming directly. Moreover, uncertainties in the system model and in the design of the cost function may lead to locally optimal control sequences. Finding an effective way to address these two difficulties is therefore challenging. Several methods have been proposed to handle them, namely, adaptive MPC [5], iterative learning control (ILC) [6], and reinforcement learning (RL) [7].
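The receding-horizon mechanism described above can be sketched in a few lines. The following is a minimal illustration with assumed, not paper-specific, ingredients: an unconstrained discrete double integrator as the model, a quadratic cost, and a backward Riccati recursion as the "convex programming" step. At every instant the finite-horizon problem is re-solved and only the first input of the optimal sequence is applied.

```python
import numpy as np

# Illustrative model: discrete double integrator x_{k+1} = A x_k + B u_k.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)            # state weighting in the cost
R = np.array([[0.1]])    # input weighting in the cost
N = 10                   # prediction horizon

def first_input_gain(A, B, Q, R, N):
    """Solve the finite-horizon LQ problem by a backward Riccati
    recursion and return the feedback gain producing the FIRST
    input of the optimal predictive control sequence."""
    P = Q
    K = None
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

x = np.array([[1.0], [0.0]])
for _ in range(50):
    K = first_input_gain(A, B, Q, R, N)   # re-solve at each instant
    u = -K @ x                            # apply only the first input
    x = A @ x + B @ u                     # plant response

print(np.linalg.norm(x))                  # state regulated toward the origin
```

Re-solving the horizon problem at every step is what distinguishes receding-horizon control from applying a fixed precomputed input sequence; it is also exactly the step that requires a model, which motivates the data-based alternative studied in this article.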

References
1.
J. Ji, A. Khajepour, W. W. Melek and Y. Huang, "Path planning and tracking for vehicle collision avoidance based on model predictive control with multiconstraints", IEEE Trans. Veh. Technol., vol. 66, no. 2, pp. 952-964, Feb. 2017.
2.
Z. P. Wang, G. B. Li, H. J. Jiang, Q. J. Chen and H. Zhang, "Collision-free navigation of autonomous vehicles using convex quadratic programming-based model predictive control", IEEE/ASME Trans. Mechatronics, vol. 23, no. 3, pp. 1103-1113, Jun. 2018.
3.
A. Parisio, E. Rikos and L. Glielmo, "A model predictive control approach to microgrid operation optimization", IEEE Trans. Control Syst. Technol., vol. 22, no. 5, pp. 1813-1827, Sep. 2014.
4.
H. Han and J. Qiao, "Nonlinear model-predictive control for industrial processes: An application to wastewater treatment process", IEEE Trans. Ind. Electron., vol. 61, no. 4, pp. 1970-1982, Apr. 2014.
5.
K. Zhang and Y. Shi, "Adaptive model predictive control for a class of constrained linear systems with parametric uncertainties", Automatica, vol. 117, Jul. 2020.
6.
U. Rosolia and F. Borrelli, "Learning model predictive control for iterative tasks. A data-driven control framework", IEEE Trans. Autom. Control, vol. 63, no. 7, pp. 1883-1896, Jul. 2018.
7.
L. Dong, J. Yan, X. Yuan, H. He and C. Sun, "Functional nonlinear model predictive control based on adaptive dynamic programming", IEEE Trans. Cybern., vol. 49, no. 12, pp. 4206-4218, Dec. 2019.
8.
D. Bristow, M. Tharayil and A. G. Alleyne, "A survey of iterative learning control", IEEE Control Syst., vol. 26, no. 3, pp. 96-114, Jun. 2006.
9.
C. Yang, C. Chen, W. He, R. Cui and Z. Li, "Robot learning system based on adaptive neural control and dynamic movement primitives", IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 3, pp. 777-787, Mar. 2019.
10.
J. H. Lee and K. S. Lee, "Iterative learning control applied to batch processes: An overview", Control Eng. Pract., vol. 15, no. 10, pp. 1306-1318, 2007.
11.
K. S. Lee, I. Chin, H. J. Lee and J. H. Lee, "Model predictive control technique combined with iterative learning for batch processes", AIChE J., vol. 45, no. 10, pp. 2175-2187, 1999.
12.
K. S. Lee and J. H. Lee, "Convergence of constrained model-based predictive control for batch processes", IEEE Trans. Autom. Control, vol. 45, no. 10, pp. 1928-1932, Oct. 2000.
13.
J. H. Lee, K. S. Lee and W. C. Kim, "Model-based iterative learning control with a quadratic criterion for time-varying linear systems", Automatica, vol. 36, no. 5, pp. 641-657, 2000.
14.
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
15.
L. P. Kaelbling, M. L. Littman and A. P. Moore, "Reinforcement learning: A survey", J. Artif. Intell. Res., vol. 4, no. 1, pp. 237-285, 1996.
16.
D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA, USA: Athena Scientific, 1996.
17.
J. Y. Lee, B. P. Jin and Y. H. Choi, "Integral q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems", Automatica, vol. 48, no. 11, pp. 2850-2859, 2012.
18.
B. Kiumarsi, F. L. Lewis, H. Modares, A. Karimpour and M. B. Naghibi, "Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics", Automatica, vol. 50, no. 4, pp. 1167-1175, 2014.
19.
Y. Jiang and Z. P. Jiang, "Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics", Automatica, vol. 48, no. 10, pp. 2699-2704, 2012.
20.
J. Skach, B. Kiumarsi, F. L. Lewis and O. Straka, "Actor-critic off-policy learning for optimal control of multiple-model discrete-time systems", IEEE Trans. Cybern., vol. 48, no. 1, pp. 29-40, Jan. 2018.
21.
R. Song, F. L. Lewis, Q. Wei and H. Zhang, "Off-policy actor-critic structure for optimal control of unknown systems with disturbances", IEEE Trans. Cybern., vol. 46, no. 5, pp. 1041-1050, May 2016.
22.
B. Luo, H. N. Wu, T. Huang and D. Liu, "Reinforcement learning solution for HJB equation arising in constrained optimal control problem", Neural Netw., vol. 71, pp. 150-158, Nov. 2015.
23.
D. Liu and Q. Wei, "Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems", IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 3, pp. 621-634, Mar. 2014.
24.
X. Zhong, H. He, H. Zhang and Z. Wang, "Optimal control for unknown discrete-time nonlinear Markov jump systems using adaptive dynamic programming", IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 12, pp. 2141-2155, Dec. 2014.
25.
X. Yang, D. Liu, B. Luo and C. Li, "Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning", Inf. Sci., vol. 369, pp. 731-747, Nov. 2016.
26.
C. Mu, Z. Ni, C. Sun and H. He, "Data-driven tracking control with adaptive dynamic programming for a class of continuous-time nonlinear systems", IEEE Trans. Cybern., vol. 47, no. 6, pp. 1460-1470, Jun. 2017.
27.
B. Luo, D. Liu, T. Huang and J. Liu, "Output tracking control based on adaptive dynamic programming with multistep policy evaluation", IEEE Trans. Syst. Man Cybern. Syst., vol. 49, no. 10, pp. 2155-2165, Oct. 2019.
28.
K. G. Vamvoudakis, F. L. Lewis and G. R. Hudas, "Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality", Automatica, vol. 48, no. 8, pp. 1598-1611, 2012.
29.
H. Zhang, H. Jiang, C. Luo and G. Xiao, "Discrete-time nonzero-sum games for multiplayer using policy-iteration-based adaptive dynamic programming algorithms", IEEE Trans. Cybern., vol. 47, no. 10, pp. 3331-3340, Oct. 2017.
30.
F. L. Lewis and K. H. Movric, "Cooperative optimal control for multi-agent systems on directed graph topologies", IEEE Trans. Autom. Control, vol. 59, no. 3, pp. 769-774, Mar. 2014.