I. Introduction
Object detection is one of the fundamental tasks in computer vision and has been studied in academia for nearly two decades. With the rapid development of deep learning, detectors such as R-CNN, Fast/Faster R-CNN, SSD, and the YOLO series emerged, and they outperform traditional approaches based on hand-crafted features [1]. To improve detection confidence, the OverFeat feature extractor, which applies a multiscale sliding window within a ConvNet, was proposed in 2013 [2]. Later, region average pooling was applied to Fast R-CNN to better capture regional context [3], and Learning RoI Transformer was proposed to improve dense target detection without adding anchors [4].