I. Introduction
In autonomous driving, it’s a key issue to get the depth of the scene and its localization to construct the map. Lasers bring more accurate information [1], [2], but at the same time, it increases costs and needs calibration [3]. Using only inexpensive vision sensors can get dense information, which is also closer to the way people perceive information when driving. However, the traditional visual SLAM method relies heavily on artificial design features [4]. It is also not robust enough for changes in the environment, and it is easy to lose features in dynamic environments and fail outdoors.