
Iterative Q-learning-based nonlinear optimal tracking control



Abstract:

In this paper, a new Q-learning algorithm is developed for a class of discrete-time nonlinear systems to solve infinite-horizon optimal tracking problems. Using system transformations, the optimal tracking problem is converted into an optimal regulation problem. For the transformed regulation problem, the new Q-learning algorithm is then developed to obtain the optimal control law. Convergence of the iterative Q-functions and admissibility of the iterative control laws are analyzed. Finally, two simulation examples are presented to illustrate the performance of the newly developed algorithm.
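The paper's exact transformation is not reproduced on this page. A typical construction of this kind, assuming an affine system x_{k+1} = f(x_k) + g(x_k) u_k with invertible g and a reference trajectory generated by x_{d,k+1} = \phi(x_{d,k}), proceeds as follows:

\[
e_k = x_k - x_{d,k}, \qquad u_{d,k} = g^{-1}(x_{d,k})\bigl(\phi(x_{d,k}) - f(x_{d,k})\bigr), \qquad v_k = u_k - u_{d,k},
\]
\[
e_{k+1} = f(e_k + x_{d,k}) + g(e_k + x_{d,k})\,(v_k + u_{d,k}) - \phi(x_{d,k}),
\]

so that the tracking problem in \((x_k, u_k)\) becomes a regulation problem in the error state \(e_k\) with control \(v_k\), with cost \(J(e_0) = \sum_{k=0}^{\infty} \bigl(e_k^{\top} Q e_k + v_k^{\top} R v_k\bigr)\) to be minimized.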
Date of Conference: 06-09 December 2016
Date Added to IEEE Xplore: 13 February 2017
Conference Location: Athens, Greece

I. Introduction

Over the past several decades, optimal control problems, especially for nonlinear systems, have been a central focus of the control field [6]. Dynamic programming is a powerful tool for solving optimal control problems. Nevertheless, because of the "curse of dimensionality", performing dynamic programming to obtain the optimal solution is often computationally untenable. To address this, the adaptive dynamic programming (ADP) algorithm was proposed in [1], [2] to solve optimal control problems in a forward-in-time manner. Policy iteration and value iteration are the two primary iterative ADP algorithms [3]. In [4], policy iteration algorithms were first applied to the optimal control of continuous-time (CT) systems with continuous state and action spaces. In [5], the optimal control law for multiple actor-critic structures was obtained effectively using a shunting inhibitory artificial neural network (SIANN). Policy iteration for zero-sum and non-zero-sum games was discussed in [7]–[9]. In [10], multi-agent optimal control was achieved using fuzzy approximation structures. In [11], a policy iteration algorithm was developed for discrete-time (DT) nonlinear systems. Thereafter, a value iteration algorithm for the optimal control of discrete-time nonlinear systems was presented in [12]. For deterministic discrete-time affine nonlinear systems, [13] studied the value iteration algorithm and proved that the iterative value function is non-decreasing and bounded, and hence converges to the optimum as the iteration index increases to infinity. Value iteration algorithms with approximation errors were analyzed in [6], [14], [15]. Building on the framework of policy and value iteration, many further iterative ADP algorithms have been developed [16]–[32].
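As a concrete illustration of the value-iteration scheme referenced above (and of the kind of iterative Q-function scheme the paper builds on), the following minimal sketch runs tabular value iteration on a discretized scalar regulation problem. The dynamics, grids, and cost weights are illustrative assumptions, not the paper's example systems, and the paper itself uses function approximation rather than a lookup table.

import numpy as np

# Hypothetical scalar affine system x_{k+1} = f(x_k) + g(x_k) * u_k (illustrative only).
f = lambda x: 0.8 * np.sin(x)
g = lambda x: 1.0

x_grid = np.linspace(-2.0, 2.0, 81)        # discretized state space
u_grid = np.linspace(-1.0, 1.0, 41)        # discretized control space
Qw, Rw = 1.0, 0.5                          # stage-cost weights

Q = np.zeros((x_grid.size, u_grid.size))   # iterative Q-function, initialized to zero

def nearest(grid, value):
    # Index of the grid point closest to value (simple state projection).
    return int(np.argmin(np.abs(grid - value)))

for it in range(200):                      # value-iteration sweeps
    V = Q.min(axis=1)                      # greedy value derived from the current Q
    Q_new = np.empty_like(Q)
    for i, x in enumerate(x_grid):
        for j, u in enumerate(u_grid):
            x_next = f(x) + g(x) * u
            stage = Qw * x**2 + Rw * u**2
            Q_new[i, j] = stage + V[nearest(x_grid, x_next)]
    diff = np.max(np.abs(Q_new - Q))
    Q = Q_new
    if diff < 1e-6:                        # convergence of the iterative Q-functions
        break

policy = u_grid[Q.argmin(axis=1)]          # greedy control law from the converged Q
print("sweeps:", it + 1, " u*(x=0) ~", policy[nearest(x_grid, 0.0)])

Under these assumptions the iterative Q-function starts from zero and increases monotonically toward the optimum, mirroring the non-decreasing, bounded iteration proved in [13].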

