Generating High-Quality Crowd Density Maps Using Contextual Pyramid CNNs


Abstract:

We present a novel method called Contextual Pyramid CNN (CP-CNN) for generating high-quality crowd density maps and count estimates by explicitly incorporating global and local contextual information of crowd images. The proposed CP-CNN consists of four modules: a Global Context Estimator (GCE), a Local Context Estimator (LCE), a Density Map Estimator (DME) and a Fusion-CNN (F-CNN). GCE is a VGG-16 based CNN that encodes global context; it is trained to classify input images into different density classes. LCE is another CNN that encodes local context; it is trained to perform patch-wise classification of input images into different density classes. DME is a multi-column CNN that generates high-dimensional feature maps from the input image, which F-CNN fuses with the contextual information estimated by GCE and LCE. To generate high-resolution, high-quality density maps, F-CNN uses a set of convolutional and fractionally-strided convolutional layers, and it is trained along with the DME in an end-to-end fashion using a combination of adversarial loss and pixel-level Euclidean loss. Extensive experiments on highly challenging datasets show that the proposed method achieves significant improvements over state-of-the-art methods.
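
The training objective described in the abstract, a combination of adversarial loss and pixel-level Euclidean loss, can be illustrated with a minimal sketch. The function names, the weights `euclid_weight` and `adv_weight`, and the use of a single scalar discriminator score are assumptions made for illustration only; the abstract does not state how the two terms are weighted or how the discriminator is built.

```python
import math


def euclidean_loss(pred, target):
    """Mean squared per-pixel difference between two density maps.

    Both maps are lists of rows with identical shape.
    """
    total = 0.0
    n = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            n += 1
    return total / n


def adversarial_loss(disc_score, eps=1e-12):
    """Generator-side adversarial term, -log D(G(x)).

    disc_score is the discriminator's probability (0..1) that the
    generated density map is real. This is the standard GAN form,
    assumed here rather than taken from the paper.
    """
    return -math.log(max(disc_score, eps))


def combined_loss(pred, target, disc_score,
                  euclid_weight=1.0, adv_weight=0.01):
    """Weighted sum of the two terms; both weights are hypothetical."""
    return (euclid_weight * euclidean_loss(pred, target)
            + adv_weight * adversarial_loss(disc_score))


if __name__ == "__main__":
    predicted = [[0.0, 1.0], [2.0, 1.0]]
    ground_truth = [[0.0, 1.0], [1.0, 1.0]]
    print(combined_loss(predicted, ground_truth, disc_score=0.5))
```

The Euclidean term penalizes per-pixel errors, while the adversarial term penalizes density maps that a discriminator can tell apart from ground truth; the latter is what the abstract credits for sharper, higher-quality maps.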
Date of Conference: 22-29 October 2017
Date Added to IEEE Xplore: 25 December 2017
Electronic ISSN: 2380-7504
Conference Location: Venice, Italy
Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA

1. Introduction

With the ubiquitous use of surveillance cameras and advances in computer vision, crowd scene analysis [18], [43] has gained a lot of interest in recent years. In this paper, we focus on the task of estimating crowd counts and high-quality density maps, which has wide applications in video surveillance [15], [41], traffic monitoring, public safety, urban planning [43], scene understanding and flow monitoring. The methods developed for crowd counting can also be extended to counting tasks in other fields such as cell microscopy [38], [36], [16], [6], vehicle counting [23], [49], [48], [11], [34] and environmental surveys [8], [43]. Crowd counting and density estimation have seen significant progress in recent years. However, due to complexities such as occlusions, high clutter, non-uniform distribution of people, non-uniform illumination, and intra-scene and inter-scene variations in appearance, scale and perspective, the resulting accuracies are far from optimal.

Density estimation results. Top left: Input image (from the ShanghaiTech dataset [50]). Top right: Ground truth. Bottom left: Zhang et al. [50] (PSNR: 22.7 dB, SSIM: 0.68). Bottom right: CP-CNN (PSNR: 26.8 dB, SSIM: 0.91).
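
The PSNR figures quoted in the caption compare an estimated density map against its ground truth. A minimal sketch of the standard PSNR formula follows; the function name `psnr` and the choice of the ground-truth maximum as the peak value are assumptions, since this excerpt does not say how the maps are normalized before the metric is computed.

```python
import math


def psnr(pred, target, peak=None):
    """Peak signal-to-noise ratio in dB between two equally sized maps.

    Maps are lists of rows. If peak is None, the maximum of the
    ground-truth map is used (an assumption, not the paper's protocol).
    """
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if not flat_target or len(flat_pred) != len(flat_target):
        raise ValueError("maps must be non-empty and the same size")

    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2
              for p, t in zip(flat_pred, flat_target)) / len(flat_target)
    if mse == 0:
        return math.inf  # identical maps

    if peak is None:
        peak = max(flat_target)
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [[0.0, 1.0], [1.0, 0.0]]
    estimate = [[0.0, 0.9], [1.0, 0.1]]
    print(round(psnr(estimate, ground_truth), 2))
```

Higher PSNR means a smaller mean squared error relative to the peak value, which is why the caption reads the 26.8 dB result as a higher-quality map than the 22.7 dB one.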

