I. Introduction
Simultaneous Localization and Mapping (SLAM) enables a robot to build a map of its environment while simultaneously estimating its own pose within that map. Multiple Object Tracking (MOT) detects multiple targets in video sequences and tracks them over time while maintaining their identities. Early works treated SLAM and MOT as independent problems. However, SLAM typically assumes a static scene and treats moving objects as noise, yet moving objects and dynamic environments are unavoidable in practice. As a result, in highly dynamic environments the classic SLAM framework is susceptible to interference from moving objects, which can cause ego-motion estimation to fail. Most MOT research, in turn, assumes sensors mounted on a fixed platform to monitor environmental changes; when the sensors are mounted on a mobile platform, accurate localization becomes essential for reliable target tracking. Combining SLAM with MOT therefore prevents SLAM from being corrupted by moving objects, while precise SLAM pose estimation enables MOT to track targets reliably. Hence, SLAM and MOT are mutually beneficial [1].