1 Introduction
Model-based control techniques have been developed to address control problems under the assumption that a model of the controlled system is known. However, as production equipment becomes increasingly complex, modeling a system is often difficult and sometimes impossible. It is therefore worthwhile to study non-model-based control methods for unknown discrete-time control systems. Adaptive dynamic programming (ADP) [1]–[5] is an intelligent control method that can directly approximate the optimal control policy through online learning. Heuristic dynamic programming (HDP), dual heuristic programming (DHP), action-dependent heuristic dynamic programming (ADHDP), and action-dependent dual heuristic programming (ADDHP) are the four basic ADP structures [6]. HDP is a typical ADP structure: it was proposed in the 1970s, and the idea was consolidated in the early 1990s under the name of adaptive critic designs.