I. Introduction
Optimal control theory has attracted considerable attention in recent decades [1]–[4]. In particular, the optimal control of nonlinear systems remains a difficult problem. Unlike the linear case, solving the optimal control problem of a nonlinear system generally requires solving the Hamilton–Jacobi–Bellman (HJB) equation, which does not admit a closed-form solution [5], [6]. To handle this difficulty, a promising adaptive optimal control method based on reinforcement learning (RL), namely, adaptive dynamic programming (ADP) [7], [8], has been developed, in which an appropriately designed RL system adaptively approximates the solution of the HJB equation (see [9]–[11] and the references therein for details). Many ADP methods have been developed in recent years [12]–[26]. For example, in [12], an ADP algorithm was presented for discrete-time (DT) systems, in which the value function and control law were updated with a state-dependent learning rate. In [18], an ADP controller was proposed for continuous-time systems; the controller was obtained from an initial stabilizing control law through an iterative process. In [19], a policy iteration method was presented for DT systems, and the convergence and stability of the optimal controller were discussed. In [20], an optimal controller was proposed for general unknown nonlinear systems to handle the tracking control problem: first, a recurrent neural network (NN) was adopted as a data-driven model to identify the unknown dynamics; then, approximations of the optimal cost function and the optimal control strategy were obtained by two NNs, respectively. In [21], the optimal control problem for DT nonlinear systems was studied; back propagation (BP) NNs were used in the algorithm, and a bounded approximation error was guaranteed.
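For concreteness, the HJB equation discussed above can be written in its standard continuous-time, infinite-horizon form for a system affine in the control, a formulation commonly assumed in the ADP literature (standard notation; the exact formulations in [5], [6] may differ in detail). For dynamics $\dot{x} = f(x) + g(x)u$ and cost $V(x_0) = \int_0^{\infty} \left( Q(x) + u^{\top} R u \right) \mathrm{d}t$, the optimal value function $V^*$ satisfies

```latex
0 = \min_{u} \left[ Q(x) + u^{\top} R u
      + \left( \nabla V^{*}(x) \right)^{\top} \left( f(x) + g(x)u \right) \right],
\qquad
u^{*}(x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^{*}(x).
```

For general nonlinear $f$ and $g$, this partial differential equation has no closed-form solution, which is precisely what motivates the approximate, learning-based solutions surveyed here.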
In [22], a data-based iterative optimal controller was proposed for the coal gasification tracking control problem, in which three NNs were adopted to approximate the dynamics of the coal gasification process, the coal quality, and the reference control, respectively. In [23], the optimal tracking control problem was investigated, and approximations of the optimal value function, the optimal control strategy, and the error dynamics were obtained by three NNs, respectively. In [24], an online ADP control scheme was proposed using three NNs: one NN-based approximator identified the unknown dynamics, while the other two generated approximations of the optimal cost function and the optimal control law, respectively. In these studies, NNs play an important role in ADP algorithms; the control performance of an ADP algorithm is influenced by the properties of the underlying NNs, and an NN with better approximation properties may improve the performance of the ADP algorithm.
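The common core of the DT methods surveyed above is an approximator (a table here, an NN in the cited works) that is iteratively updated toward the Bellman fixed point, from which a greedy control law is recovered. The following minimal sketch illustrates that idea with tabular value iteration on an illustrative one-dimensional nonlinear system; the dynamics, cost, grids, and discount factor are assumptions for the example, not taken from any of the cited papers.

```python
import numpy as np

def f(x, u):
    """Illustrative (assumed) nonlinear DT dynamics x_{k+1} = f(x_k, u_k)."""
    return 0.8 * np.sin(x) + u

def stage_cost(x, u):
    """Quadratic utility U(x, u) = x^2 + u^2."""
    return x**2 + u**2

xs = np.linspace(-2.0, 2.0, 81)   # discretized state space
us = np.linspace(-1.0, 1.0, 41)   # discretized control space
gamma = 0.95                      # discount factor
V = np.zeros_like(xs)             # value-function table (the "critic")

X, U = np.meshgrid(xs, us, indexing="ij")
Xn = np.clip(f(X, U), xs[0], xs[-1])
idx = np.abs(Xn[..., None] - xs).argmin(axis=-1)  # nearest-grid successor state

for _ in range(300):              # Bellman backups until (near) convergence
    Q = stage_cost(X, U) + gamma * V[idx]         # right-hand side of Bellman eq.
    V_new = Q.min(axis=1)                          # greedy minimization over u
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new

# The greedy control law recovered from the converged value function:
u_star = us[Q.argmin(axis=1)]
```

The NN-based schemes in [20]–[24] replace the lookup table `V` with a trained function approximator and the exhaustive minimization with an actor network, but the underlying fixed-point iteration is the same.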