I. Introduction
With a burgeoning population and rapid urbanization, crowd gatherings have become more prominent in recent years. Consequently, computer vision-based crowd analytics and surveillance [5, 10, 18, 19, 27, 28, 34, 37, 38, 44, 46, 57, 59, 61, 63] have received increased interest. Furthermore, algorithms developed for crowd analytics have found applications in other fields such as agricultural monitoring [26], microscopic biology [16], and urban planning and environmental survey [8, 57]. Current state-of-the-art counting networks achieve low error rates on a variety of datasets that pose numerous challenges. Their success can be broadly attributed to two major factors: (i) the design of novel convolutional neural network (CNN) architectures specifically for improving count performance [4, 29, 33, 36, 38, 43, 50, 59], and (ii) the development and publication of challenging datasets [10, 11, 59, 61]. In this paper, we consider both of these factors in an attempt to further improve crowd counting performance.