I. Introduction
Reinforcement learning (RL) [38] is an important branch of machine learning. It is concerned with how an agent should adapt its actions, based on the rewards it receives from an unknown and reactive environment, so as to achieve a long-term goal. Originally inspired by learning behavior observed in biological systems, RL was introduced into the computer science and control literature in the 1960s as a way to study artificial intelligence [28], [29], [45]. Since then, numerous contributions to RL have been made; see [4], [39], and [47].