Intelligent-Critic-Based Tracking Control of Discrete-Time Input-Affine Systems and Approximation Error Analysis With Application Verification



Abstract:

In recent years, the application of function approximators, such as neural networks and polynomials, has ushered in a new stage of development in solving optimal control problems. However, in the presence of approximation errors, the stability of the controlled system cannot be guaranteed. Therefore, in view of the prevalence of such errors, we investigate optimal tracking control problems for discrete-time systems. First, a novel value function is introduced into the intelligent critic framework. Second, an implicit method is utilized to demonstrate the boundedness of the iterative value functions under approximation errors, and an explicit method is applied to prove the stability of the system in their presence. Furthermore, an evolving policy is designed to iteratively tackle the optimal tracking control problem while guaranteeing the stability of the system. Finally, the effectiveness of the developed method is verified through both numerical and practical examples.
Published in: IEEE Transactions on Cybernetics ( Volume: 54, Issue: 8, August 2024)
Page(s): 4690 - 4701
Date of Publication: 05 October 2023

PubMed ID: 37796676


I. Introduction

Adaptive dynamic programming (ADP) originated from dynamic programming [1] and reinforcement learning (RL) [2] and is an efficient method for solving optimal control problems. Compared to the traditional approach of directly solving the Hamilton–Jacobi–Bellman (HJB) equation, ADP is applicable to systems with unknown models and is capable of alleviating the "curse of dimensionality" [3], [4]. The method has shown great potential in wastewater systems [5], power systems [6], [7], aerospace [8], cyber security [9], [10], and other domains. The use of function approximators is a pivotal element in the success of ADP. However, this component also introduces thorny issues, such as approximation errors. Currently, a widely used assumption is that function approximators achieve perfect approximation [11], [12], but this rarely holds for nonlinear systems. As approximation errors propagate through the iterative process, even small errors may trigger a "resonance"-type phenomenon that seriously degrades the stability of the system. In fields involving personal and property safety in particular, such effects can have severe consequences. Analyzing these errors is therefore an important task in ADP.
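To make the error-propagation concern concrete, the following minimal sketch runs value iteration on a simple one-dimensional discrete-time input-affine system (a regulation special case of tracking) and perturbs each iterate by a bounded term to mimic imperfect function approximation. The dynamics, grids, cost weights, and the error bound eps are hypothetical choices for illustration only, not the paper's example or its algorithm.

```python
import numpy as np

# Value iteration for an assumed 1-D input-affine system
# x_{k+1} = f(x_k) + g(x_k) * u_k, with a bounded perturbation added
# to each iterate to mimic imperfect function approximation.

def f(x):
    return 0.9 * x + 0.1 * np.sin(x)    # drift dynamics (assumed)

def g(x):
    return 1.0                          # input gain (assumed)

Q, R = 1.0, 1.0                         # stage-cost weights (assumed)
xs = np.linspace(-2.0, 2.0, 201)        # state grid
us = np.linspace(-2.0, 2.0, 201)        # candidate controls
V = np.zeros_like(xs)                   # V_0 = 0, the usual VI start
eps = 1e-3                              # approximation-error bound (assumed)

rng = np.random.default_rng(0)
for i in range(50):
    V_new = np.empty_like(V)
    for j, x in enumerate(xs):
        x_next = f(x) + g(x) * us              # one-step successor states
        V_next = np.interp(x_next, xs, V)      # tabular stand-in for the critic
        V_new[j] = np.min(Q * x**2 + R * us**2 + V_next)  # Bellman backup
    # Inject a bounded error into the new iterate; the question studied in
    # this line of work is when such errors keep the iterates bounded and
    # the resulting policy stabilizing.
    V = V_new + rng.uniform(-eps, eps, size=V.shape)

print("V_50 at x = 0:", V[np.argmin(np.abs(xs))])
```

Shrinking eps toward zero recovers exact value iteration; roughly speaking, the analysis in this paper concerns when a nonzero bound of this kind still yields bounded iterative value functions and a stabilizing policy.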

References
[1] R. E. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 1957.
[2] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2018.
[3] D. Wang, M. Ha, and M. Zhao, "The intelligent critic framework for advanced optimal control," Artif. Intell. Rev., vol. 55, pp. 1-22, Jan. 2022.
[4] D. Liu, S. Xue, B. Zhao, B. Luo, and Q. Wei, "Adaptive dynamic programming for control: A survey and recent advances," IEEE Trans. Syst. Man Cybern. Syst., vol. 51, no. 1, pp. 142-160, Jan. 2021.
[5] D. Wang, M. Zhao, M. Ha, and J. Qiao, "Intelligent optimal tracking with application verifications via discounted generalized value iteration," Acta Automatica Sinica, vol. 48, no. 1, pp. 182-193, 2022.
[6] X. Yang, Z. Zeng, and Z. Gao, "Decentralized neurocontroller design with critic learning for nonlinear-interconnected systems," IEEE Trans. Cybern., vol. 52, no. 11, pp. 11672-11685, Nov. 2022.
[7] D. Wang, J. Ren, M. Ha, and J. Qiao, "System stability of learning-based linear optimal control with general discounted value iteration," IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 9, pp. 6504-6514, Sep. 2023.
[8] D. Wang, L. Hu, M. Zhao, and J. Qiao, "Dual event-triggered constrained control through adaptive critic for discrete-time zero-sum games," IEEE Trans. Syst. Man Cybern. Syst., vol. 53, no. 3, pp. 1584-1595, Mar. 2023.
[9] R. Liu, F. Hao, and H. Yu, "Optimal SINR-based DoS attack scheduling for remote state estimation via adaptive dynamic programming approach," IEEE Trans. Syst. Man Cybern. Syst., vol. 51, no. 12, pp. 7622-7632, Dec. 2021.
[10] Q. Cai, S. Alam, and J. Liu, "On the robustness of complex systems with multipartitivity structures under node attacks," IEEE Trans. Control Netw. Syst., vol. 7, no. 1, pp. 106-117, Mar. 2020.
[11] Y. Du, B. Jiang, Y. Ma, and Y. Cheng, "Robust ADP-based sliding-mode fault-tolerant control for nonlinear systems with application to spacecraft," Appl. Sci., vol. 12, no. 3, Art. no. 1673, 2022.
[12] G. Hu, J. Guo, Z. Guo, J. Cieslak, and D. Henry, "ADP-based intelligent tracking algorithm for reentry vehicles subjected to model and state uncertainties," IEEE Trans. Ind. Informat., vol. 19, no. 4, pp. 6047-6055, Apr. 2023.
[13] R. Song and L. Zhu, "Optimal fixed-point tracking control for discrete-time nonlinear systems via ADP," IEEE/CAA J. Automatica Sinica, vol. 6, no. 3, pp. 657-666, May 2019.
[14] J. Xu, J. Wang, J. Rao, Y. Zhong, and H. Wang, "Adaptive dynamic programming for optimal control of discrete-time nonlinear system with state constraints based on control barrier function," Int. J. Robust Nonlinear Control, vol. 32, no. 6, pp. 3408-3424, 2022.
[15] Q. Wei, D. Liu, and H. Lin, "Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 46, no. 3, pp. 840-853, Mar. 2016.
[16] X. Han et al., "Online policy iteration ADP-based attitude-tracking control for hypersonic vehicles," Aerosp. Sci. Technol., vol. 106, Nov. 2020.
[17] M. Liang, D. Wang, and D. Liu, "Neuro-optimal control for discrete stochastic processes via a novel policy iteration algorithm," IEEE Trans. Syst. Man Cybern. Syst., vol. 50, no. 11, pp. 3972-3985, Nov. 2020.
[18] B. Luo, Y. Yang, H.-N. Wu, and T. Huang, "Balancing value iteration and policy iteration for discrete-time control," IEEE Trans. Syst. Man Cybern. Syst., vol. 50, no. 11, pp. 3948-3958, Nov. 2020.
[19] A. Heydari, "Stability analysis of optimal adaptive control under value iteration using a stabilizing initial policy," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 9, pp. 4522-4527, Sep. 2018.
[20] M. Ha, D. Wang, and D. Liu, "Offline and online adaptive critic control designs with stability guarantee through value iteration," IEEE Trans. Cybern., vol. 52, no. 12, pp. 13262-13274, Dec. 2022.
[21] D. Wang, M. Zhao, M. Ha, and J. Qiao, "Stability and admissibility analysis for zero-sum games under general value iteration formulation," IEEE Trans. Neural Netw. Learn. Syst., Mar. 2022.
[22] Y. Fu, C. Hong, J. Fu, and T. Chai, "Approximate optimal tracking control of nondifferentiable signals for a class of continuous-time nonlinear systems," IEEE Trans. Cybern., vol. 52, no. 6, pp. 4441-4450, Jun. 2022.
[23] Y. Yang, H. Modares, K. G. Vamvoudakis, W. He, C. Xu, and D. C. Wunsch, "Hamiltonian-driven adaptive dynamic programming with approximation errors," IEEE Trans. Cybern., vol. 52, no. 12, pp. 13762-13773, Dec. 2022.
[24] L. Buşoniu, T. de Bruin, D. Tolić, J. Kober, and I. Palunko, "Reinforcement learning for control: Performance, stability, and deep approximators," Annu. Rev. Control, vol. 46, pp. 8-28, Dec. 2018.
[25] R. Munos and C. Szepesvári, "Finite-time bounds for fitted value iteration," J. Mach. Learn. Res., vol. 9, pp. 815-857, May 2008.
[26] A. Farahmand, C. Szepesvári, and R. Munos, "Error propagation for approximate policy and value iteration," in Proc. Adv. Neural Inf. Process. Syst., vol. 23, 2010, pp. 1-14.
[27] M. Ha, D. Wang, and D. Liu, "Discounted iterative adaptive critic designs with novel stability analysis for tracking control," IEEE/CAA J. Automatica Sinica, vol. 9, no. 7, pp. 1262-1272, Jul. 2022.
[28] D. Liu, H. Li, and D. Wang, "Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1323-1334, Jun. 2015.
[29] A. Heydari, "Theoretical and numerical analysis of approximate dynamic programming with approximation errors," J. Guid. Control Dyn., vol. 39, no. 2, pp. 301-311, 2016.
[30] C. Kamanchi, R. B. Diddigi, and S. Bhatnagar, "Generalized second-order value iteration in Markov decision processes," IEEE Trans. Autom. Control, vol. 67, no. 8, pp. 4241-4247, Aug. 2022.