I. Introduction
Owing to the rapid development of remote sensing technology, the sources of remote sensing images (RSIs) have multiplied in recent years, and high-resolution, high-quality RSIs have become widely available. Object detection in RSIs is an important computer vision task with considerable application value, especially for ecological monitoring, smart city construction, and military target strikes. However, RSIs contain complex backgrounds, objects at arbitrary angles, small objects, and other difficulties, which make object detection in RSIs highly challenging. There is therefore a pressing need to develop dedicated models that maximize detection accuracy [1].