I. Introduction
Dynamic programming (DP) dates back to the mid 20th century. In [4], Bellman described optimality principles and the structure of DP with particular application to stochastic decision processes. The core of DP is the Hamilton-Jacobi-Bellman (HJB) equation, a partial differential equation describing the behavior of the optimal cost function, sometimes called the cost-to-go due to the infinite horizon over which an optimal control policy is sought. In general, when applying DP, one has to face the problem of discretizing the state space and computing the cost function and optimal policies at the discrete points. Only in some very special cases, such as linear dynamics, can analytic solutions be found. In the linear case with a quadratic running cost, the corresponding optimal controller is the linear quadratic regulator (LQR), which can be found by solving the corresponding algebraic Riccati equation [16].
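As a brief illustration of this special case, using standard notation not introduced above ($A$, $B$ denote the system matrices and $Q \succeq 0$, $R \succ 0$ the cost weights), consider the linear dynamics $\dot{x} = Ax + Bu$ with running cost $x^{\top} Q x + u^{\top} R u$. The stabilizing solution $P$ of the algebraic Riccati equation then yields the LQR feedback law directly:
\begin{align}
0 &= A^{\top} P + P A - P B R^{-1} B^{\top} P + Q, \\
u^{*}(x) &= -Kx, \qquad K = R^{-1} B^{\top} P,
\end{align}
with the corresponding optimal cost-to-go given by the quadratic form $V(x) = x^{\top} P x$, which is precisely the analytic solution of the HJB equation in this setting.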