1. INTRODUCTION
Multi-Agent Reinforcement Learning (MARL) has achieved remarkable success in a range of challenging sequential decision-making tasks, such as wireless edge caching [1] and multi-player strategy games [2]. Although it remains relatively underexplored in MARL, communication is a key component of multi-agent coordination. Through communication messages, agents can exchange their local observations; these messages are aggregated and then used to augment each agent's local observation when selecting actions.
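To make this setup concrete, the sketch below illustrates one possible form of such a communication scheme: each agent encodes its local observation into a message, the messages are aggregated by a simple mean, and each agent selects an action from its observation concatenated with the aggregate. The module names (CommAgent, communication_round), the linear encoders, and the mean aggregation rule are illustrative assumptions rather than a specific method from the literature.

```python
# A minimal, illustrative sketch of observation-based communication in MARL.
# All module names, layer sizes, and the mean aggregation rule are assumptions
# for illustration only; they do not correspond to any particular published method.
import torch
import torch.nn as nn


class CommAgent(nn.Module):
    """One agent: encodes its local observation into a message and selects
    an action from its observation augmented with the aggregated messages."""

    def __init__(self, obs_dim: int, msg_dim: int, n_actions: int):
        super().__init__()
        self.msg_encoder = nn.Linear(obs_dim, msg_dim)          # local obs -> message
        self.policy = nn.Linear(obs_dim + msg_dim, n_actions)   # augmented obs -> action logits

    def message(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.msg_encoder(obs))

    def act(self, obs: torch.Tensor, aggregated_msg: torch.Tensor) -> torch.Tensor:
        logits = self.policy(torch.cat([obs, aggregated_msg], dim=-1))
        return torch.argmax(logits, dim=-1)


def communication_round(agents, observations):
    """Each agent broadcasts a message; the messages are aggregated (mean here)
    and used to augment every agent's local observation for action selection."""
    messages = [agent.message(obs) for agent, obs in zip(agents, observations)]
    aggregated = torch.stack(messages).mean(dim=0)              # simple mean aggregation
    return [agent.act(obs, aggregated) for agent, obs in zip(agents, observations)]


if __name__ == "__main__":
    obs_dim, msg_dim, n_actions, n_agents = 8, 4, 5, 3
    agents = [CommAgent(obs_dim, msg_dim, n_actions) for _ in range(n_agents)]
    observations = [torch.randn(obs_dim) for _ in range(n_agents)]
    actions = communication_round(agents, observations)
    print([a.item() for a in actions])
```

In practice, the aggregation step and the message encoder are exactly where communication methods differ (e.g., attention-based weighting instead of a plain mean), which is the design space this line of work concerns.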