The Basics of Learning Control
A key question in learning control is what it is that should be learned. To address this issue, it is helpful to begin with one of the most general frameworks of learning control, as originally developed in the middle of the 20th century in the fields of optimization theory, optimal control, and in particular, dynamic programming [9], [10]. Here, the goal of learning control was formalized as the need to acquire a task-dependent control policy $\pi$ that maps a continuous-valued state vector $\mathbf{x}$ of a controlled system and its environment, possibly in a time-dependent way, to a continuous-valued control vector $\mathbf{u}$:

$$\mathbf{u} = \pi(\mathbf{x}, t, \boldsymbol{\theta}). \tag{1}$$
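As an illustration of Equation (1), the sketch below shows one possible policy parameterization in Python with NumPy. The linear-feedback-plus-feedforward form, the dictionary layout of $\boldsymbol{\theta}$, and all function names are assumptions chosen for concreteness, not a specific construction from the text; the point is only that the policy is a parameterized map from state (and time) to control, and that the parameters $\boldsymbol{\theta}$ are what a learning algorithm would adjust.

```python
import numpy as np


def policy(x, t, theta):
    """Task-dependent control policy u = pi(x, t, theta).

    Illustrative linear time-varying parameterization (an assumption,
    not prescribed by the text): the control is a linear feedback on
    the state plus a time-indexed feedforward command.

    x     : state vector of the controlled system, shape (n,)
    t     : discrete time index into the feedforward schedule
    theta : dict with feedback gains 'K' (m x n) and feedforward
            commands 'u_ff' (T x m)
    """
    K = theta["K"]           # feedback gain matrix, shape (m, n)
    u_ff = theta["u_ff"]     # feedforward commands, shape (T, m)
    return K @ x + u_ff[t]   # control vector u, shape (m,)


if __name__ == "__main__":
    n, m, T = 4, 2, 50                         # state dim, control dim, horizon
    rng = np.random.default_rng(0)
    theta = {"K": rng.normal(size=(m, n)),     # parameters to be learned
             "u_ff": rng.normal(size=(T, m))}
    x = rng.normal(size=n)                     # current state of the system
    u = policy(x, t=0, theta=theta)
    print(u.shape)                             # -> (2,), the control vector
```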