I. Introduction
Over the past several decades, optimal control problems, especially those for nonlinear systems, have been a central focus in the control field [6]. As is well known, dynamic programming is a useful tool for solving optimal control problems. Nevertheless, owing to the “curse of dimensionality”, performing dynamic programming to obtain the optimal solution is often computationally untenable. Correspondingly, the adaptive dynamic programming (ADP) algorithm was proposed in [1], [2] to solve optimal control problems in a forward-in-time manner. Policy iteration and value iteration are the two primary classes of iterative ADP algorithms [3]. In [4], policy iteration algorithms were first used for the optimal control of continuous-time (CT) systems with continuous state and action spaces. In [5], the optimal control law for multiple actor-critic structures was effectively obtained using a shunting inhibitory artificial neural network (SIANN). Policy iteration for zero-sum and non-zero-sum games was discussed in [7]–[9]. In [10], the multi-agent optimal control law was obtained using fuzzy approximation structures. In [11], a policy iteration algorithm was developed for discrete-time (DT) nonlinear systems. Thereafter, a value iteration algorithm was presented for the optimal control of DT nonlinear systems in [12]. The value iteration algorithm for deterministic DT affine nonlinear systems was studied in [13], where it was proven that the iterative value function is nondecreasing and bounded, and hence converges to the optimum as the iteration index increases to infinity. In [6], [14], [15], value iteration algorithms with approximation errors were analyzed. Building on the framework of policy and value iteration, further investigations of iterative ADP algorithms have been developed [16]–[32].
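For concreteness, a minimal sketch of the value iteration recursion referred to above is given next; the dynamics $F$, utility $U$, and zero initial value function are generic placeholders under standard assumptions, not the specific formulation of [12], [13]. Starting from
\[
V_0(x_k) \equiv 0, \qquad
V_{i+1}(x_k) = \min_{u_k}\bigl\{ U(x_k, u_k) + V_i\bigl(F(x_k, u_k)\bigr) \bigr\}, \quad i = 0, 1, 2, \ldots,
\]
with the corresponding iterative control law
\[
u_i(x_k) = \arg\min_{u_k}\bigl\{ U(x_k, u_k) + V_i\bigl(F(x_k, u_k)\bigr) \bigr\},
\]
where $x_{k+1} = F(x_k, u_k)$ denotes the DT system dynamics and $U(\cdot,\cdot)$ is a positive definite utility function, the sequence $\{V_i\}$ is nondecreasing and bounded above under such conditions, and hence converges to the optimal value function as $i \to \infty$.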