I. Introduction
The goal of long-term metric localization is to estimate the 6-DoF pose of a robot with respect to a visual map. However, long-term localization remains challenging under drastic appearance changes caused by illumination variation, such as in day-night scenarios. Traditional point-based localization approaches find correspondences between local features extracted from images using hand-crafted descriptors (e.g., SIFT [1], SURF [2], ORB [3]), and then recover the full 6-DoF camera pose from these correspondences. However, such hand-crafted features have low repeatability under extreme appearance changes and are therefore not robust in this setting. To address this, experience-based visual navigation methods [4]–[6] store intermediate experiences to achieve long-term localization. For instance, Multi-Experience Visual Teach & Repeat [5], [6] retrieves the most relevant experiences for SURF feature matching during a more challenging repeat, bridging the appearance gap to localize against the initially taught path.
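The correspondence step above can be illustrated with a minimal sketch: ORB-style binary descriptors are compared by Hamming distance, and ambiguous matches are discarded with a ratio test. The synthetic descriptors, function names, and the 0.8 ratio threshold below are illustrative assumptions, not taken from the cited systems.

```python
import numpy as np

def hamming(a, b):
    # Number of differing bits between two byte-packed binary descriptors.
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def match(desc_query, desc_map, ratio=0.8):
    # For each query descriptor, find its two nearest neighbours in the map
    # and keep the match only if the best is clearly better than the second
    # best (a ratio test, assumed here for illustration).
    matches = []
    for i, d in enumerate(desc_query):
        dists = np.array([hamming(d, e) for e in desc_map])
        j, k = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches

# Synthetic stand-ins for real image features: 50 random 256-bit descriptors,
# with the "query" image seeing slightly bit-corrupted copies of the map's.
rng = np.random.default_rng(0)
desc_map = rng.integers(0, 256, (50, 32), dtype=np.uint8)
mask = rng.random((50, 32)) < 0.02
noise = (mask * rng.integers(0, 256, (50, 32))).astype(np.uint8)
desc_query = np.bitwise_xor(desc_map, noise)

matches = match(desc_query, desc_map)
print(len(matches), "tentative correspondences")
```

In a full point-based pipeline, such 2D-2D (or 2D-3D) correspondences would then feed a robust pose solver (e.g., RANSAC with a PnP or essential-matrix estimator) to recover the 6-DoF camera pose.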