1. Introduction
CNN-based approaches [18] have dominated object detection [20], [32] for years. A common component of these methods is the backbone network [12], [13], [14], [35], which extracts image features through a series of convolution and pooling layers. Modern CNN-based detectors [9], [11], [27], [21], [36], [29], [23], [25], [26], [22] treat detector design as a process of combining modules, which generally consists of a backbone, a neck [21], and multiple detection heads [3]. Among these modules, improving the backbone has become the de facto standard way to boost detection performance, and the design of new backbones is also an active research focus in object detection. It is therefore widely accepted that a backbone is essential for existing CNN-based detectors.
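The backbone, neck, and heads composition described above can be illustrated with a toy sketch. The following is a minimal, untrained NumPy example under stated assumptions. It does not reproduce any of the cited detectors. The class names `Backbone`, `Neck`, `Head`, and `Detector` are hypothetical. The backbone is assumed to be a stack of 1x1 convolution + ReLU + 2x2 pooling stages. The neck is assumed to be FPN-style, with lateral projections and top-down nearest-neighbour upsampling. Two heads, for classification and box regression, are applied to every pyramid level.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights: illustration only

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel matrix multiply: (C_in,H,W) x (C_out,C_in) -> (C_out,H,W)
    return np.einsum("oc,chw->ohw", w, x)

def avg_pool2(x):
    # 2x2 average pooling with stride 2 (assumes H and W are even)
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

class Backbone:
    """Hypothetical toy backbone: each stage is 1x1 conv + ReLU + 2x2 pooling."""
    def __init__(self, channels):
        self.weights = [rng.standard_normal((c_out, c_in)) * 0.1
                        for c_in, c_out in zip(channels[:-1], channels[1:])]

    def __call__(self, img):
        feats, x = [], img
        for w in self.weights:
            x = avg_pool2(np.maximum(conv1x1(x, w), 0.0))
            feats.append(x)
        return feats  # multi-scale feature maps, fine to coarse

class Neck:
    """Hypothetical FPN-style neck: lateral 1x1 projections plus top-down fusion."""
    def __init__(self, in_channels, out_channels):
        self.laterals = [rng.standard_normal((out_channels, c)) * 0.1 for c in in_channels]

    def __call__(self, feats):
        lat = [conv1x1(f, w) for f, w in zip(feats, self.laterals)]
        outs = [lat[-1]]  # start from the coarsest level
        for f in reversed(lat[:-1]):
            # nearest-neighbour 2x upsampling of the coarser map, then element-wise sum
            up = outs[0].repeat(2, axis=1).repeat(2, axis=2)
            outs.insert(0, f + up)
        return outs

class Head:
    """Hypothetical head: a 1x1 conv shared across all pyramid levels."""
    def __init__(self, channels, out_dim):
        self.w = rng.standard_normal((out_dim, channels)) * 0.1

    def __call__(self, feats):
        return [conv1x1(f, self.w) for f in feats]

class Detector:
    """The detector is just the composition head(neck(backbone(image)))."""
    def __init__(self, backbone, neck, heads):
        self.backbone, self.neck, self.heads = backbone, neck, heads

    def __call__(self, img):
        pyramid = self.neck(self.backbone(img))
        return {name: head(pyramid) for name, head in self.heads.items()}

# Usage: 3-channel 64x64 input, 5 hypothetical classes, 4 box offsets per location
detector = Detector(
    Backbone([3, 16, 32, 64]),
    Neck([16, 32, 64], 8),
    {"cls": Head(8, 5), "box": Head(8, 4)},
)
outputs = detector(rng.standard_normal((3, 64, 64)))
```

In this modular view, swapping in a stronger backbone changes only the first component. The neck and heads stay untouched, which is one reason backbone redesign is such a common lever for improving detectors.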