I. INTRODUCTION
Simultaneous localization and mapping (SLAM) enables a mobile robot to incrementally build a map of an unknown environment and estimate its own pose with the help of various sensors. Compared with Light Detection and Ranging (LiDAR)-based SLAM, visual SLAM (vSLAM) has become popular in recent years because cameras provide richer texture information at a lower cost. By leveraging this advantage, vSLAM may be integrated into various applications to enhance the overall performance of SLAM [1].