1. INTRODUCTION
Reinforcement learning (RL) [1] is a framework of machine learning in which an agent learns effective policies to accomplish a task by trial and error in an unknown environment. The agent obtains rewards through trial and error against the environment, and aims to identify a policy that maximizes the discounted cumulative reward.