I. Introduction
In recent decades, artificial intelligence and learning-based control algorithms have received rapidly increasing attention and have become a key topic for programmers and engineers in both commercial and research fields [1]–[3]. By interacting with the environment during operation, the reinforcement learning (RL) technique can compute a policy that is optimized with respect to a performance evaluation, and can mitigate the curse of dimensionality, a crucial problem in traditional dynamic programming theory [4]–[6]. Building on these advantages of RL methods, the adaptive dynamic programming (ADP) or integral RL technique, which relates control policies to the performance index, has been developed and widely applied in optimal control design [7]–[10]. However, this line of work is still at a preliminary stage, and more attention should be paid to applying RL techniques to challenging problems in engineering applications.