I. Introduction
Object detection in very high resolution (VHR) optical remote sensing images is a fundamental problem in aerial and satellite image analysis. In recent years, owing to advances in machine learning, particularly powerful feature representations and classifiers, many approaches have treated object detection as a classification problem and have achieved impressive success on specific object detection tasks [1]–[24]. In these approaches, object detection is performed by learning a classifier, such as a support vector machine (SVM) [1], [7], [8], [12], [13], [20]–[24], AdaBoost [2]–[5], k-nearest neighbors [15], [17], a conditional random field [6], [19], or a sparse-coding-based classifier [9]–[11], [14], [16]. The classifier captures the variation in object appearance and viewpoint from a set of training data in a supervised [2]–[7], [9]–[14], [16]–[21], [23], [24], semisupervised [15], [22], or weakly supervised [1], [8], [25], [51] framework. The input of the classifier is a set of image regions with their corresponding feature representations, and the output is their predicted labels, i.e., object or not. A recent review of object detection in optical remote sensing images can be found in [26].
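To make the "detection as classification" formulation concrete, the following is a minimal sketch under stated assumptions, not the method of any cited work. Candidate regions are enumerated with a sliding window, each region is described by a hypothetical two-value feature (mean and variance of pixel intensity), and a linear classifier trained with the perceptron rule stands in for the SVM, AdaBoost, or other classifiers listed above. The function names sliding_windows, region_features, train_perceptron, and detect are illustrative only.

```python
from typing import List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (row, col, height, width)


def sliding_windows(img: Image, size: int, stride: int) -> List[Box]:
    """Enumerate square candidate regions over the image."""
    h, w = len(img), len(img[0])
    return [(r, c, size, size)
            for r in range(0, h - size + 1, stride)
            for c in range(0, w - size + 1, stride)]


def region_features(img: Image, box: Box) -> List[float]:
    """Hypothetical feature vector: mean and variance of intensity, plus a bias term."""
    r, c, hh, ww = box
    pixels = [img[i][j] for i in range(r, r + hh) for j in range(c, c + ww)]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, var, 1.0]


def train_perceptron(X: List[List[float]], y: List[int],
                     epochs: int = 50, lr: float = 1.0) -> List[float]:
    """Learn a linear classifier with the perceptron rule (labels in {-1, +1})."""
    wvec = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, t in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(wvec, x))
            if t * score <= 0:  # misclassified or on the boundary: update
                wvec = [wi + lr * t * xi for wi, xi in zip(wvec, x)]
    return wvec


def detect(img: Image, wvec: List[float], size: int, stride: int) -> List[Box]:
    """Classify every candidate region; keep those predicted as 'object' (+1)."""
    hits = []
    for box in sliding_windows(img, size, stride):
        x = region_features(img, box)
        if sum(wi * xi for wi, xi in zip(wvec, x)) > 0:
            hits.append(box)
    return hits


# Toy 8x8 image: dark background with a bright 4x4 "object" in the lower right.
img = [[1.0 if (i >= 4 and j >= 4) else 0.0 for j in range(8)] for i in range(8)]

# Supervised training set: one object region (+1) and one background region (-1).
X = [region_features(img, (4, 4, 4, 4)), region_features(img, (0, 0, 4, 4))]
y = [1, -1]
weights = train_perceptron(X, y)

print(weights)                                   # [2.0, 0.0, -1.0]
print(detect(img, weights, size=4, stride=2))    # [(4, 4, 4, 4)]
```

The input to detect is a set of image regions with their features, and the output is a binary label per region, mirroring the formulation described above. In practice, the cited approaches use far richer features and classifiers; the toy setup only illustrates the pipeline.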