I. Introduction
Accurate state estimation is a prerequisite for many robotic applications, such as unmanned aerial vehicles (UAVs), unmanned ground vehicles (UGVs), and unmanned surface vessels (USVs). Visual-inertial navigation system (VINS) estimators have led the trend in state estimation over the past decade, with impressive progress made by the community [1]–[3]. However, existing VINS estimators suffer from problems caused by operating environments and sensor configurations, which limit their use in real-world robotic applications. To stabilize perception, current VINS estimators are tested in restricted environments and with specific camera mounting configurations [4]. In outdoor experiments, VINS estimators face challenges such as overexposure, featureless frames, and tiny pixel parallax for faraway features, resulting in the loss of stable feature tracking. The forward-facing configuration of cameras on UGVs or USVs exacerbates these negative effects. In open-water environments near the coast, a USV obtains reliable visual measurements only from nearby static objects on the shore; current VINS approaches drift easily when the USV is turning or when the cameras face the sea surface.