I. Introduction
The evolution of next-generation mobility services and driving automation is steadily making modern transportation systems more intelligent [1]. With cellular vehicle-to-everything (C-V2X) support, connected and automated vehicles (CAVs) have emerged as a pivotal component of the network of traffic participants. By sharing environmental information, CAVs can see past occlusions and extend their field of view, obtaining a more comprehensive understanding of road conditions. This can significantly benefit downstream driving decisions, contributing to road safety and commuting efficiency. Consequently, cooperative perception (CP) based on multiple CAVs has rapidly attracted attention from the research community, especially for object detection and tracking. However, most studies concentrate on detection alone, while the use of CP to enhance object tracking remains in its infancy.