I. Introduction
Human detection is a fundamental problem in computer vision with many important applications, such as pedestrian safety, intelligent surveillance, and human-machine interfaces. It is one of the many beneficiaries of the progress made on convolutional neural networks (CNNs). A CNN resembles the hierarchy of the nervous system, abstracting the image data layer by layer to extract information [45]. The relatively small size of human detection datasets initially presented an obstacle to the use of CNNs. Sermanet et al. were the first to overcome this obstacle, using unsupervised pre-training [35]. Hosang et al. [15] successfully used the CifarNet model, making them the first to apply a vanilla CNN model to the task. Datasets such as Caltech Pedestrian and KITTI have since superseded earlier datasets like INRIA in size and level of annotation [49].