1. Introduction
Accurate localization of people attracts growing interest from both the research and industrial communities, since it plays a crucial role in numerous location-sensitive applications (e.g., surveillance, smart homes, public health). However, because GPS depends on line-of-sight (LOS) reception of satellite signals, it is unreliable indoors and in urban canyons. To overcome this limitation, various alternative solutions have been investigated. Signal-based solutions, including Bluetooth [13] and Wi-Fi [74], are popular, but they are susceptible to interference from changing environments and nearby human bodies [73]. A complementary line of work is vision-based. These methods typically use traditional cameras, RGB-D cameras, or built-in smartphone cameras, and offer the advantage of more reliable service. To obtain location information, visual positioning solutions usually rely on a pre-acquired 3D map or a geo-tagged image database as the scene representation [63], or directly use the captured image to estimate the camera pose [39].