I. Introduction
For multi-robot systems, one of the key research problems is achieving cooperation among robots through decentralized (distributed) control [1]. The desired cooperation is usually at the task level [2]: the common mission is broken down into tasks, and each robot selects a task (role) according to the current state and behaves accordingly. However, designing such a task-level controller by hand is difficult. Therefore, in recent years, machine learning techniques have been proposed and studied for multi-robot systems, with the aim of enabling robots to learn how to cooperate without human design or hand-coding.