I. Introduction
Optimal control of nonlinear systems has been a focus of the control community for many decades [1]–[11]. Dynamic programming has long been a useful technique for handling optimal control problems, though it is often computationally untenable to carry out directly to obtain optimal solutions [12]. Characterized by strong self-learning and adaptive abilities, adaptive dynamic programming (ADP), proposed by Werbos [13], [14], has demonstrated the capability to find the optimal control policy and solve the Hamilton–Jacobi–Bellman (HJB) equation in a practical manner [15]–[20]. In [21]–[26], data-driven ADP algorithms were developed to design the optimal control without relying on exact mathematical models of the control systems. In [27]–[29], hierarchical ADP with multiple-goal representation networks was investigated, which improved the implementation efficiency of ADP. Iterative methods are primary tools in ADP for obtaining the solution of the HJB equation indirectly and have received increasing attention [30]–[37].
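For concreteness, one common continuous-time formulation, given here only as an illustration (the dynamics $f$, $g$, state penalty $Q(x)$, and control weight $R$ are generic placeholders rather than this paper's specific notation), writes the HJB equation for the optimal value function $V^{*}$ of the system $\dot{x}=f(x)+g(x)u$ with cost $\int_{0}^{\infty}\big(Q(x)+u^{\top}Ru\big)\,dt$ as
\[
0=\min_{u}\Big[Q(x)+u^{\top}Ru+\big(\nabla V^{*}(x)\big)^{\top}\big(f(x)+g(x)u\big)\Big],
\qquad
u^{*}(x)=-\frac{1}{2}R^{-1}g^{\top}(x)\nabla V^{*}(x).
\]
Iterative ADP schemes of the kind cited above avoid solving this nonlinear partial differential equation in closed form; in a policy-iteration-style scheme, for $i=0,1,\ldots$, a policy evaluation step finds $V_{i+1}$ associated with the current policy $u_{i}$, followed by a policy improvement step $u_{i+1}(x)=-\frac{1}{2}R^{-1}g^{\top}(x)\nabla V_{i+1}(x)$.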