I. Introduction
Object detection, which aims to locate and classify objects, is a core task in computer vision. Following the success of Convolutional Neural Networks (CNNs) [1], many object detection models [2], [3] have achieved great progress on benchmark datasets [4], [5]. However, these models rely on the last layer of the backbone, which contains limited semantic features because its feature map has low resolution.