I. Introduction
Object detection is one of the crucial tasks in computer vision. In the past few years, the performance of obj ect detection [1]–[14] has dramatically improved due to the success of deep convolutional neural networks (CNN). Typically, object detection and recognition involve two steps: first, deep neural networks are used to localize the potential location of each target object; then, objects are classified into appropriate classes. If the first step can effectively localize the potential object, the second step will be easier. Even though the two-step approach achieved state-of-the-art performance, the running times are usually slow [11]. Therefore, one-stage detectors have been developed to improve the speed,