I. Introduction
Deep learning has demonstrated impressive performance in object detection [1], [2], [3], change detection [4], [5], semantic segmentation [6], [7], and other areas. Among these tasks, remote sensing object detection has attracted widespread attention because it aids in monitoring natural resource utilization and urban development and serves as a crucial tool in disaster monitoring and environmental protection. Accurate object detection relies heavily on large-scale training data with bounding-box annotations. However, the complexity and diversity of objects in remote sensing images, together with their wide range of scales and dense distribution, pose significant challenges for manual bounding-box annotation (see Fig. 1).