I. INTRODUCTION
Simultaneous Localization and Mapping (SLAM) refers to the problem of estimating the robot's state while simultaneously building a model of the environment. In general, the sensors commonly used in real-world applications can be divided into two types: proprioceptive and exteroceptive [1]. An Inertial Measurement Unit (IMU) is a typical proprioceptive sensor, as it captures the motion of the robot without requiring any information from the surroundings; however, its rapid drift is its biggest drawback. Exteroceptive sensors, on the other hand, such as 3D LiDAR, provide accurate range measurements of the surroundings. However, motion distortion and the sparsity of the point clouds make feature extraction and association difficult. In particular, LiDAR-only methods suffer from degeneracy in structure-less scenarios. Thus, fusing these two types of sensors is vital for achieving robust and accurate SLAM in real-world applications.