1. Introduction
The detection of geospatial objects is a fundamental and challenging problem for analyzing and understanding remote sensing images. Recently, deep learning has improved the performance significantly in various image processing techniques [1–6], and object detectors with convolutional neural network (CNN) have achieved the state-of-the-art accuracy. However, these successful detecting methods are difficult to exhibit their excellent performance on remote sensing images, since there are obvious differences between remote sensing images and natural scenes.