I. Introduction
Camera ego-motion estimation, or visual odometry (VO), has drawn great attention from researchers due to its crucial role in many visual tasks, such as VR/AR, 3D modeling, and visual simultaneous localization and mapping (vSLAM) [1], [2]. It takes a sequence of consecutive images (or video frames) recorded by a moving camera as input and recovers the relative movement of the camera (ego-motion), which is usually parameterized as a 6-DoF camera pose (translation and rotation) [3], [4]. Over the last two decades, traditional methods [5], [6], built on multi-view geometry, have shown impressive performance in well-conditioned environments. However, their performance and robustness are easily degraded by textureless regions or low image quality. Moreover, traditional methods are typically computationally expensive because of their complicated optimization procedures.
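To make the 6-DoF parameterization concrete, the following minimal sketch (our own illustration, not drawn from the cited works) represents each relative camera motion as a 4x4 homogeneous SE(3) matrix and chains per-frame relative poses into an absolute trajectory, which is the basic bookkeeping any VO pipeline performs:

```python
import numpy as np

def pose_matrix(R, t):
    """Build a 4x4 homogeneous SE(3) pose from a 3x3 rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def accumulate_trajectory(relative_poses):
    """Chain per-frame relative poses T_{k-1,k} into absolute poses T_{0,k}."""
    T_abs = np.eye(4)
    trajectory = [T_abs.copy()]
    for T_rel in relative_poses:
        T_abs = T_abs @ T_rel  # compose the new relative motion onto the running pose
        trajectory.append(T_abs.copy())
    return trajectory

# Example: move 1 m forward along z, then yaw 90 degrees while moving another 1 m along z.
Rz90 = np.array([[0., -1., 0.],
                 [1.,  0., 0.],
                 [0.,  0., 1.]])
rels = [pose_matrix(np.eye(3), [0., 0., 1.]),
        pose_matrix(Rz90, [0., 0., 1.])]
traj = accumulate_trajectory(rels)
print(traj[-1][:3, 3])  # final camera position in the world frame
```

The rotation here is stored as a matrix for simplicity; real systems often use quaternions or axis-angle vectors, converting to matrices only for composition.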