I. Introduction
Simultaneous localization and mapping (SLAM) is a foundational capability required for many mobile robot navigation tasks. During the last two decades, great success has been achieved using SLAM for real-time state estimation and mapping in challenging settings with a single perceptual sensor, such as a lidar or camera. Lidar-based methods can capture the fine details of the environment at long range. However, such methods typically fail when operating in structure-less environments, such as a long corridor or a flat open field. Vision-based methods are especially suitable for place recognition and perform well in texture-rich environments, but their performance is sensitive to illumination change, rapid motion, and initialization.

Therefore, lidar-based and vision-based methods are each often coupled with an inertial measurement unit (IMU) to increase their respective robustness and accuracy. In a lidar-inertial system, the IMU can help correct point cloud distortion and compensate for a lack of features over short periods of time. In a visual-inertial system, IMU measurements can be used to recover metric scale and attitude. To further improve system performance, the fusion of lidar, camera, and IMU measurements is attracting increasing attention.