1. Introduction
Visual place recognition (VPR) is an important task in computer vision and a fundamental building block of navigation systems for autonomous vehicles [24], [48]. It is approached either with structure-based methods, namely Structure-from-Motion [36] and SLAM [26], or with image retrieval [2], [15], [20], [29], [30], [46]. The former focus on precise relative camera pose estimation [34], [35]. The latter aim at learning image descriptors that allow images similar to a given query to be retrieved effectively through nearest neighbor search [28]. The goal of descriptor learning is to ensure that images of the same place are projected onto nearby points in a latent space, and that images of different places are projected onto distant points [9], [10], [21]. Contrastive [19], [30] and triplet [2], [23], [27] losses have been used for this purpose and have achieved state-of-the-art performance on several VPR benchmarks.
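To make the descriptor-learning objective concrete, the following is a minimal sketch of the standard contrastive and triplet loss formulations on toy 2-D descriptors. The function names, the margin value of 0.5, and the example vectors are assumptions chosen for illustration; they are not taken from the cited works.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, same_place, margin=0.5):
    """Standard contrastive loss for one image pair.

    Pairs of the same place are pulled together (loss grows with distance);
    pairs of different places are pushed apart until they are at least
    `margin` away. The margin value is an illustrative assumption.
    """
    d = euclidean(a, b)
    if same_place:
        return 0.5 * d ** 2
    return 0.5 * max(margin - d, 0.0) ** 2


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet loss: the positive should be closer to the anchor
    than the negative by at least `margin` (illustrative value)."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Toy descriptors (hypothetical values): `near` lies at distance 0.5
# from the query, `far` at distance 5.
query = [0.0, 0.0]
near = [0.3, 0.4]
far = [3.0, 4.0]

loss_same = contrastive_loss(query, near, same_place=True)
loss_diff = contrastive_loss(query, far, same_place=False)
loss_good_triplet = triplet_loss(query, near, far)
loss_bad_triplet = triplet_loss(query, far, near)
```

In this sketch, a well-ordered triplet (positive near, negative far) incurs zero loss, while swapping the two yields a positive loss that would drive the descriptors apart during training.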
(a) A place in the city of Amman. (b) An image taken 6 m away is labeled as positive (same place), while (c) an image taken 25.6 m away is labeled as negative (not the same place), despite sharing many visual cues.
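The labeling described in the caption is consistent with a hard distance cutoff between camera positions. A minimal sketch follows; the 25 m threshold and the function name are assumptions for illustration, since this section does not state the value used.

```python
def binary_label(distance_m, threshold_m=25.0):
    """Return 1 (same place) if the two images were taken within
    `threshold_m` metres of each other, otherwise 0.

    The default threshold of 25 m is an assumed value for illustration.
    """
    return 1 if distance_m <= threshold_m else 0


# The two cases from the caption: 6 m away and 25.6 m away.
label_b = binary_label(6.0)
label_c = binary_label(25.6)
```

Under this assumed threshold, the image taken 25.6 m away receives the same negative label as an image of an unrelated location, even though it lies only slightly past the cutoff.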