1 Introduction
With a burgeoning population and rapid urbanization, crowd gatherings have become more prominent in recent years. Consequently, computer vision-based crowd analytics and surveillance [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17] have received increased interest. Furthermore, algorithms developed for crowd analytics have found applications in other fields such as agriculture monitoring [18], microscopic biology [19], urban planning and environmental survey [2], [20]. Current state-of-the-art counting networks achieve impressively low error rates on a variety of challenging datasets. Their success can be broadly attributed to two major factors: (i) the development and publication of challenging datasets [3], [4], [5], [21], and (ii) the design of novel convolutional neural network (CNN) architectures specifically for improving counting performance [4], [7], [22], [23], [24], [25], [26], [27]. In this paper, we consider both of these factors in an attempt to further improve crowd counting performance.