I. Introduction
With the rapid advancement of 6G communication networks [1], the application of ground mobile robots in emergency rescue, industrial automation, and smart living has been significantly accelerated. Central to these robots' autonomous tasks are localization and mapping, which demand high accuracy and low drift. To meet these requirements, systems such as LEGO-LOAM [2], FAST-LIO2 [3], and VINS-FUSION [4] provide LiDAR- and vision-based SLAM solutions. Although visual SLAM can achieve high accuracy, it is susceptible to environmental factors such as low light, rapid illumination changes, and fast motion, which can degrade performance. Additionally, the high computational demand of visual SLAM can hinder real-time operation in complex environments.