I. Introduction
Over the years, inertial navigation systems (INS) [1, 2] have been widely used to estimate the 6DOF poses (positions and orientations) of sensing platforms (e.g., autonomous vehicles), in particular in GPS-denied environments such as underwater, indoors, in urban canyons, and on other planets. Most INS rely on a 6-axis inertial measurement unit (IMU) that measures the local linear acceleration and angular velocity of the platform to which it is rigidly connected. With recent advances in hardware design and manufacturing, low-cost, lightweight micro-electro-mechanical systems (MEMS) IMUs have become ubiquitous [3–5], enabling high-accuracy localization for, among others, mobile devices [6] and micro aerial vehicles (MAVs) [7–11], with significant implications for a wide range of emerging applications, from mobile augmented reality (AR) [12, 13] and virtual reality (VR) [14] to autonomous driving [15, 16]. Unfortunately, simple integration of high-rate IMU measurements that are corrupted by noise and bias often yields pose estimates that are unreliable for long-term navigation. Although high-end tactical-grade IMUs exist, they remain prohibitively expensive for widespread deployment. On the other hand, a camera that is small, lightweight, and energy-efficient provides rich information about the environment and serves as an ideal aiding source for INS, yielding visual-inertial navigation systems (VINS).
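To make the drift problem concrete, the following is a minimal sketch (not from the paper; all sensor parameters are illustrative assumptions) of dead-reckoning a *stationary* platform by naively double-integrating accelerometer readings corrupted by a constant bias and white noise. Even a small uncorrected bias makes the position error grow roughly quadratically with time, which is why unaided integration fails for long-term navigation.

```python
import numpy as np

# Assumed, illustrative sensor parameters (not from the paper).
rng = np.random.default_rng(0)
dt = 0.005          # 200 Hz IMU rate (assumed)
T = 60.0            # integrate for one minute
bias = 0.02         # m/s^2, assumed constant accelerometer bias
noise_std = 0.05    # m/s^2, assumed white-noise standard deviation

# Ground truth: the platform never moves, so true acceleration is zero.
v, p = 0.0, 0.0
for _ in range(int(T / dt)):
    a_meas = 0.0 + bias + noise_std * rng.standard_normal()
    v += a_meas * dt   # first integration: velocity error grows ~ bias * t
    p += v * dt        # second integration: position error grows ~ bias * t^2 / 2

print(f"position error after {T:.0f} s: {p:.1f} m")
# The bias term alone contributes ~ 0.02 * 60^2 / 2 = 36 m of drift.
```

The noise term additionally induces a random-walk error, but for MEMS-grade sensors the bias-driven quadratic growth typically dominates, motivating the visual aiding discussed next.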