I. Introduction
Visual place recognition (VPR) determines whether a robot has previously visited its current location, helping to reduce pose drift and accumulated trajectory error. It also plays an important role in many mobile robotic applications, such as localization against a prior map in autonomous driving [1], [2], maintaining the accuracy of simultaneous localization and mapping (SLAM) [3], [4], and relocalization in augmented reality/virtual reality (AR/VR) [5], [6]. In the last decade, significant progress has been made in SLAM [7], [8], where VPR forms a key component of loop closure. However, place recognition under varying environmental conditions (such as changes in weather, illumination, and season) remains extremely challenging. Existing VPR methods that generate descriptors from handcrafted features are often hampered by feature discrepancies across image sets captured under different conditions; resolving this key issue would enable more accurate comparison between images. This article concentrates on the problem of extracting invariant feature representations from image sets with varying and contrasting appearances for VPR.