
Data-Based Predictive Control via Multistep Policy Gradient Reinforcement Learning



Abstract:

In this article, a model-free predictive control algorithm for real-time systems is presented. The algorithm is data driven and improves system performance via multistep policy gradient reinforcement learning. By learning from an offline dataset and real-time data, knowledge of the system dynamics is not required in either the design or the application of the algorithm. Cooperative games among multiple players over the time horizon are introduced to cast predictive control as a multiagent optimization problem and to guarantee the optimality of the predictive control policy. To implement the algorithm, neural networks are used to approximate the action–state value function and the predictive control policy, respectively; their weights are determined by the method of weighted residuals. Numerical results show the effectiveness of the proposed algorithm.
Published in: IEEE Transactions on Cybernetics ( Volume: 53, Issue: 5, May 2023)
Page(s): 2818 - 2828
Date of Publication: 09 November 2021

PubMed ID: 34752414

I. Introduction

Recent years have witnessed thriving research activity on model predictive control (MPC) due to its broad applications in fields such as intelligent vehicles [1], [2], microgrid operation [3], and industrial processes [4]. MPC, also called receding horizon control, is a form of model-based optimal control. At each instant, MPC minimizes a given cost function over a prediction horizon by solving a model-based convex program, which yields an optimal/suboptimal predictive control sequence; only the first control input of this sequence is applied to the system. MPC can handle control problems for which direct online computation of control inputs is difficult or impossible, for example, the control of multivariable systems. However, the predictive control sequence cannot be solved from the convex program without knowledge of the system model. Moreover, uncertainties in the system model and in the design of the cost function may lead to locally optimal control sequences. Thus, finding an effective way to address these two difficulties is challenging. Several methods have been proposed to handle them, namely, adaptive MPC [5], iterative learning control (ILC) [6], and reinforcement learning (RL) [7].
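The receding-horizon procedure described above can be sketched for the linear-quadratic case, where the finite-horizon optimization admits a closed-form solution via backward Riccati recursion. The double-integrator system, horizon length, and cost weights below are illustrative assumptions, not taken from the article:

```python
import numpy as np

# Receding-horizon (MPC) sketch for a double integrator x_{k+1} = A x_k + B u_k
# with quadratic stage cost x'Qx + u'Ru. At each instant we solve the N-step
# finite-horizon problem by backward Riccati recursion and apply only the
# first input of the resulting optimal sequence.

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[dt**2 / 2], [dt]])
Q = np.eye(2)            # state weighting
R = np.array([[0.1]])    # input weighting
N = 20                   # prediction horizon

def mpc_input(x):
    """Return the first input of the optimal N-step sequence from state x."""
    P = Q.copy()
    K = None
    for _ in range(N):   # backward recursion; final K is the gain at time 0
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return -K @ x

# Receding-horizon loop: replan at every step, apply the first input only.
x = np.array([1.0, 0.0])
for _ in range(100):
    u = mpc_input(x)
    x = A @ x + B @ u
print(np.linalg.norm(x))   # state regulated toward the origin
```

With a known linear model this loop is exact; the article's point is that when the model is unknown, the same first-input-of-an-optimal-sequence structure must instead be learned from data.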

