I. Introduction
Empowered by the self-learning ability of reinforcement learning (RL) and by significantly improved environment perception, autonomous driving (AD) [1], [2], [3] is advancing rapidly and has great potential to improve driving safety and traffic efficiency [4], [5], [6]. Multi-vehicle pursuit (MVP) is a specific application of AD technology in which multiple autonomous pursuing vehicles chase one or more moving vehicles. MVP problems have attracted extensive research attention because of their growing range of applications, including collision avoidance design in intelligent transportation systems, sport and game strategies, balancing generators against loads in smart grid dispatch, disaster relief strategies, autonomous police vehicles pursuing suspects, and similar confrontation scenarios [7], [8]. MVP tasks are usually mission-critical and safety-critical. Completing them successfully requires efficient multi-vehicle collaboration and comprehensive perception in complex, dynamic traffic environments [9].