I. Introduction
Object detection is an important task in remote-sensing images (RSIs) [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. With the rapid development of sensor technology, more data have become available for training detectors, and object detection has advanced considerably in recent years as a result. However, as the amount of data grows, so does the demand for manually labeled annotations. The labeling process is laborious because conventional fully supervised object detection requires precise annotations for every instance. To alleviate this problem, a weakly supervised paradigm [13], [14], [15], [16], [17], [18], in which only image-level category labels are needed, has been widely adopted for the detection task. This paradigm is called weakly supervised object detection (WSOD).