I. Introduction
The robustness of object detection models on aerial images is crucial for ensuring accurate and reliable results in real-world scenarios. In recent years, models based on deep neural networks have achieved state-of-the-art performance [1], [2], [3], [4], repeatedly setting new top scores in open competitions. Despite this high performance on existing aerial object detection datasets, the reliability and applicability of these methods in practical use remain to be examined.