I. Introduction
Multi-agent navigation is an active research problem with wide real-world applications, such as cargo transportation, traffic scheduling, and autonomous driving [1]. However, navigating safely from an initial location to a destination remains challenging because of high agent density, partial observability, high-dimensional state spaces, and nonlinear dynamics, among other factors. Moreover, in a multi-agent system (MAS), the main criterion is the efficiency and safety of the overall system rather than the success of any single agent's task. In other words, each agent needs a policy that, in some cases, behaves altruistically in the shared environment to maximize the total return, rather than greedily pursuing its own private reward.