I. Introduction
Deep reinforcement learning (RL) has transformed how decision-making problems are solved in many areas, ranging from robotics, for example solving a Rubik’s cube or enabling autonomous driving, to games such as Go (AlphaGo), Atari games, and StarCraft [1]–[6]. A recurring problem of RL, however, is that it requires handcrafted reward functions tailored to individual tasks, and in most real-world applications these tasks involve complex and as yet unknown behaviors. The design of a proper reward is therefore challenging and remains a major impediment to the widespread adoption of RL in real-world applications.