I. Introduction
Simultaneous localization and mapping (SLAM) techniques have been evolving and are widely applied in advanced driver assistance systems (ADAS) and autonomous driving systems. Although SLAM approaches using light detection and ranging (LiDAR) sensors are accurate, LiDAR sensors are expensive and have not been widely adopted in commercial products. Visual SLAM systems, which use one or more cameras, are a popular alternative to LiDAR-based SLAM. Monocular SLAM systems, which use a single camera, are particularly attractive because they are inexpensive and easy to install. Monocular SLAM was initially realized with filter-based approaches [1], [2], [3], [4]. Filter-based methods are computationally inefficient because both localization and mapping run on every frame [5]. To address this inefficiency, keyframe-based approaches [6], [7], [8] (see other references in [5]) run the mapping process only on selected frames, called keyframes, while the localization process estimates the camera pose in every frame. Keyframe-based SLAM improved upon the localization accuracy and computational efficiency of filter-based methods [9] and became the de facto standard in monocular SLAM [5].