I. Introduction
Adaptive dynamic programming (ADP) originated from dynamic programming [1] and reinforcement learning (RL) [2], and it is an efficient method for solving optimal control problems. Compared with the traditional approach of directly solving the Hamilton–Jacobi–Bellman (HJB) equation, ADP is applicable to systems with unknown models and is able to alleviate the "curse of dimensionality" [3], [4]. The method has shown great potential in wastewater systems [5], power systems [6], [7], aerospace [8], cyber security [9], [10], and other fields.

The use of function approximators is a pivotal element in the success of ADP. However, this component also introduces challenges, most notably approximation errors. A widely used assumption is that the function approximators achieve perfect approximation [11], [12], but this rarely holds for nonlinear systems. As approximation errors propagate through the iterative process, even small errors may accumulate and trigger a resonance-like phenomenon that seriously degrades the stability of the system. In safety-critical applications involving personal and property safety, such effects can lead to severe consequences. Analyzing the influence of approximation errors is therefore of significant importance in ADP.
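To make the error-propagation issue concrete, consider a generic approximate value-iteration scheme of the kind commonly studied in ADP; the notation below (utility function $U$, system dynamics $f$, iteration index $k$, and per-iteration approximation error $\varepsilon_k$) is introduced here only for illustration and is not taken from the cited works:
\[
\hat{V}_{k+1}(x) = \min_{u}\bigl\{\, U(x,u) + \hat{V}_{k}\bigl(f(x,u)\bigr) \,\bigr\} + \varepsilon_k(x),
\]
where $\hat{V}_k$ denotes the value function produced by the function approximator at iteration $k$. Even if each $\varepsilon_k$ is small, the error committed at iteration $k$ enters the target used at iteration $k+1$, so the deviation between $\hat{V}_k$ and the exact value-iteration sequence may accumulate over iterations rather than vanish.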