1. Introduction
With the ubiquitous use of surveillance cameras and advances in computer vision, crowd scene analysis [18], [43] has gained considerable interest in recent years. In this paper, we focus on the task of estimating crowd count and high-quality density maps, which has wide applications in video surveillance [15], [41], traffic monitoring, public safety, urban planning [43], scene understanding, and flow monitoring. In addition, methods developed for crowd counting can be extended to counting tasks in other fields such as cell microscopy [38], [36], [16], [6], vehicle counting [23], [49], [48], [11], [34], and environmental surveys [8], [43]. Crowd counting and density estimation have seen significant progress in recent years. However, due to complexities such as occlusions, high clutter, non-uniform distribution of people, non-uniform illumination, and intra-scene and inter-scene variations in appearance, scale, and perspective, the resulting accuracies are still far from optimal.
Figure: Density estimation results. Top left: input image (from the ShanghaiTech dataset [50]). Top right: ground truth. Bottom left: Zhang et al. [50] (PSNR: 22.7 dB, SSIM: 0.68). Bottom right: CP-CNN (PSNR: 26.8 dB, SSIM: 0.91).