I. Introduction
Pedestrian detection is a fundamental building block for multimedia applications such as surveillance systems and autonomous driving. Driven by the success of general object detection, many current pedestrian detectors [1], [2], [3], [4], [5], built on standard frameworks such as Faster R-CNN [6] and SSD [7], perform well in most pedestrian detection scenes. However, as autonomous driving demands ever higher accuracy, their performance in challenging scenarios, particularly those involving occlusion or significant scale variation, remains unsatisfactory.