I. Introduction
SLAM, as one of the most fundamental modules in robotics, remains at the center of robotics research. After more than thirty years of development, SLAM has become a relatively mature research field with a wide range of applications. However, existing results have focused largely on urban and indoor office scenes; research remains very challenging in extreme conditions, such as underground environments [1], [2]. Underground environments have several characteristics that are unfavorable for SLAM. First, lighting conditions underground are poor, which poses significant challenges to visual SLAM. Second, underground environments contain self-similar areas, in which LiDAR SLAM typically degenerates. Despite these challenges, there has been notable progress in recent years. The recent DARPA Subterranean (SubT) Challenge has promoted the development of underground SLAM [3]. A series of loosely coupled multi-robot SLAM algorithms has been developed based on LiDAR and IMU, supplemented by visual and thermal cameras. This indicates that multi-sensor fusion is a feasible solution for underground space exploration. However, most existing works adopt loosely coupled methods. In contrast, tightly coupled methods achieve higher robustness because they fuse more aspects of the sensor information [4].