I. Introduction
This paper is concerned with the application of approximate dynamic programming (ADP) techniques to finding the value function of the discrete-time (DT) Hamilton-Jacobi-Bellman (HJB) equation that arises in optimal control problems. ADP is an approach to solving dynamic programming problems by means of function approximation. ADP was proposed by Werbos [12], Barto et al. [7], Widrow et al. [21], Howard [13], Watkins [10], Bertsekas and Tsitsiklis [17], and others as a way to solve optimal control problems forward in time. In essence, ADP combines adaptive critics, a reinforcement learning technique, with dynamic programming.
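To make "dynamic programming with function approximation" concrete, the following is a minimal sketch, not the method developed in this paper. It assumes a scalar DT linear system x_{k+1} = a x_k + b u_k with stage cost q x^2 + r u^2 (q >= 0, r > 0), and a critic V(x) ≈ w x^2 with the single basis function x^2. Each sweep performs a one-step Bellman backup at a fixed set of sample states and refits w by least squares, in the spirit of Werbos's heuristic dynamic programming (HDP). The function names `greedy_control`, `hdp_value_iteration`, and `scalar_dare`, and the sample states, are hypothetical choices made for illustration. Because the quadratic basis is exact for this linear-quadratic case, the fitted weight can be checked against the closed-form solution of the scalar discrete-time algebraic Riccati equation.

```python
import math


def greedy_control(x, w, a, b, r):
    """Closed-form minimizer over u of r*u^2 + w*(a*x + b*u)^2 (scalar case)."""
    return -w * b * a * x / (r + w * b * b)


def hdp_value_iteration(a, b, q, r, samples, iters=200):
    """Hypothetical HDP-style sketch: critic V(x) ~ w * x**2, refit each sweep."""
    w = 0.0  # initial critic weight, i.e. V_0(x) = 0
    for _ in range(iters):
        targets = []
        for x in samples:
            u = greedy_control(x, w, a, b, r)
            x_next = a * x + b * u
            # One-step Bellman backup using the current approximate value function.
            targets.append(q * x * x + r * u * u + w * x_next * x_next)
        # Least-squares fit of w in target ~ w * phi(x), with phi(x) = x**2.
        num = sum(t * x * x for t, x in zip(targets, samples))
        den = sum(x ** 4 for x in samples)
        w = num / den
    return w


def scalar_dare(a, b, q, r):
    """Positive root of b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0."""
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4 * b * b * q * r)) / (2 * b * b)


SAMPLES = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

if __name__ == "__main__":
    w = hdp_value_iteration(1.0, 1.0, 1.0, 1.0, SAMPLES)
    # Both values are approximately 1.618 (the golden ratio) for a = b = q = r = 1.
    print(w, scalar_dare(1.0, 1.0, 1.0, 1.0))
```

For this linear-quadratic case each least-squares refit reproduces one step of the Riccati recursion w ← q + a^2 w r / (r + b^2 w). For nonlinear dynamics, where the DT HJB equation has no closed-form solution, the same backup-then-fit loop would be applied with a richer set of basis functions or a neural network critic.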