1. Introduction
Camera relocalization is a core subroutine of applications such as SLAM [15], augmented reality [9] and autonomous navigation [45]. It estimates the 6-DoF pose of a query RGB image in a known scene coordinate system. Most current approaches focus on one-shot relocalization of a single still image. They fall into three main classes [13], [50]: (1) relative pose regression (RPR) methods, which determine the pose relative to database images [3], [29]; (2) absolute pose regression (APR) methods, which regress the absolute pose with PoseNet [25] and its variants [23], [24], [60]; and (3) structure-based methods, which establish 2D-3D correspondences with Active Search [48], [49] or Scene Coordinate Regression (SCoRe) [52] and then solve for the pose with PnP algorithms [18], [42]. In particular, SCoRe has recently been widely adopted to learn per-pixel scene coordinates from dense training data for a scene, because it can form dense and accurate 2D-3D matches even in texture-less scenes [5], [6]. As extensively evaluated in [5], [6], [50], structure-based methods generally achieve better pose accuracy than RPR and APR methods, because they explicitly exploit the rules of projective geometry and the scene structure [50].
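To make the final step of the structure-based pipeline concrete, the following is a minimal sketch of recovering a pose from 2D-3D correspondences. It uses a plain Direct Linear Transform (DLT) as a simple stand-in for the PnP solvers cited above, not the algorithm of any referenced work. The function names (`project`, `solve_pose_dlt`, `rotation_zy`), the intrinsics `K`, the world-to-camera pose convention, and the synthetic noise-free correspondences are all assumptions made for illustration; a practical system would typically add RANSAC and a nonlinear refinement.

```python
import numpy as np


def rotation_zy(angle_z, angle_y):
    """Hypothetical helper: rotation about y followed by rotation about z."""
    cz, sz = np.cos(angle_z), np.sin(angle_z)
    cy, sy = np.cos(angle_y), np.sin(angle_y)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Rz @ Ry


def project(points_3d, K, R, t):
    """Project scene points to pixels, assuming X_cam = R @ X_world + t."""
    cam = points_3d @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]


def solve_pose_dlt(points_3d, pixels, K):
    """Estimate (R, t) from at least 6 non-coplanar 2D-3D matches via DLT."""
    n = points_3d.shape[0]
    ones = np.ones((n, 1))
    # Remove the intrinsics so that the unknown matrix is s * [R | t].
    rays = np.hstack([pixels, ones]) @ np.linalg.inv(K).T
    x = rays[:, 0] / rays[:, 2]
    y = rays[:, 1] / rays[:, 2]
    Xh = np.hstack([points_3d, ones])
    # Each match contributes two linear equations in the 12 entries of P.
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -x[:, None] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -y[:, None] * Xh
    # The null vector of A is the right singular vector of the smallest
    # singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    # Project the left 3x3 block onto a rotation and undo the unknown scale
    # and sign of the homogeneous solution.
    U, S, Vt_m = np.linalg.svd(P[:, :3])
    sign = np.sign(np.linalg.det(U @ Vt_m))
    R = sign * (U @ Vt_m)
    t = sign * P[:, 3] / S.mean()
    return R, t


# Synthetic check: scene coordinates and their pixels under a known pose.
rng = np.random.default_rng(0)
K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
R_true = rotation_zy(0.2, -0.3)
t_true = np.array([0.1, -0.2, 4.0])
scene_points = rng.uniform(-1.0, 1.0, size=(20, 3))
pixels = project(scene_points, K, R_true, t_true)

R_est, t_est = solve_pose_dlt(scene_points, pixels, K)
print("rotation error:", np.abs(R_est - R_true).max())
print("translation error:", np.abs(t_est - t_true).max())
```

In a SCoRe-style pipeline, `scene_points` would be the scene coordinates predicted for the pixels in `pixels`; here both are generated synthetically, so the recovered pose should match the ground truth up to numerical precision.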