I. Introduction
The availability of images with geolocations, coupled with additional information (e.g. text, time stamps), has led to many applications such as object geo-localization [1] and flood monitoring [2]. However, many images lack accurate (or any) GPS information. For instance, tweets often include the GPS position from which they were posted, but pictures attached to a tweet may carry no GPS tag. To recover the missing GPS information, Bulbul et al. proposed querying the Google Street View image database [3]. In general, the pipeline for recognizing a place from a single visual query has three successive steps, and it can be applied to a large reference dataset, e.g. at city scale. First, regions are located in the query image. Second, descriptors are computed over these selected regions to provide an accurate representation of the query image. Finally, this representation is matched against the geotagged images in the reference dataset, and the GPS information of the retrieved reference image is transferred to the query image.
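The three steps of this generic pipeline can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the method of any cited work: images are small grayscale grids, regions are chosen by intensity variance, descriptors are L2-normalised patches, and matching is a nearest-descriptor search. All function names, images, and GPS coordinates below are hypothetical and chosen for illustration only.

```python
from math import sqrt


def detect_regions(image, patch=2, top_k=2):
    """Step 1: keep the top_k non-overlapping patches with the highest variance."""
    scored = []
    for r in range(0, len(image) - patch + 1, patch):
        for c in range(0, len(image[0]) - patch + 1, patch):
            values = [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            scored.append((variance, values))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [values for _, values in scored[:top_k]]


def describe(regions):
    """Step 2: L2-normalise each patch so matching ignores the brightness scale."""
    descriptors = []
    for values in regions:
        norm = sqrt(sum(v * v for v in values)) or 1.0  # avoid division by zero
        descriptors.append([v / norm for v in values])
    return descriptors


def match_cost(query_desc, ref_desc):
    """Sum, over query descriptors, of the distance to the closest reference descriptor."""
    total = 0.0
    for q in query_desc:
        total += min(sqrt(sum((a - b) ** 2 for a, b in zip(q, r))) for r in ref_desc)
    return total


def localize(query_image, reference_db):
    """Step 3: transfer the GPS tag of the best-matching geotagged reference."""
    query_desc = describe(detect_regions(query_image))
    best = min(
        reference_db,
        key=lambda ref: match_cost(query_desc, describe(detect_regions(ref["image"]))),
    )
    return best["gps"]


# Hypothetical geotagged reference set: two 4x4 images with made-up coordinates.
reference_db = [
    {"gps": (10.0, 20.0), "image": [[0, 9, 0, 9]] * 4},  # vertical stripes
    {"gps": (30.0, 40.0), "image": [[0, 0, 0, 0], [9, 9, 9, 9]] * 2},  # horizontal stripes
]

# Query without a GPS tag: a slightly noisy view of the vertical-stripe scene.
query = [[1, 8, 0, 9], [0, 9, 1, 8], [0, 9, 0, 9], [1, 9, 0, 8]]

print(localize(query, reference_db))
```

In this sketch the query is assigned the coordinates of the vertical-stripe reference, because its descriptors lie much closer to that image than to the horizontal-stripe one. A practical system would replace each step with a stronger component (a learned or hand-crafted region detector, a robust descriptor, and an indexed search over a city-scale reference set), but the flow of data between the three steps stays the same.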