PHNet: Parasite-Host Network for Video Crowd Counting | IEEE Conference Publication | IEEE Xplore

PHNet: Parasite-Host Network for Video Crowd Counting


Abstract:

Crowd counting plays an increasingly important role in public security. Many crowd counting methods for single images have been proposed recently, but few studies have focused on using temporal information from video image sequences to improve prediction performance. Existing video-based crowd estimation methods model temporal and spatial features jointly, which makes them less efficient at extracting spatiotemporal features and makes prediction performance difficult to improve. To solve these problems, this paper proposes a Parasite-Host Network (PHNet), composed of a Parasite branch and a Host branch that extract temporal features and spatial features, respectively. To specifically extract the transform features in the time domain, we propose a novel architecture termed the “Relational Extractor” (RE), which models the multiplicative interaction features of adjacent frames. In addition, the Host branch extracts the spatial features from the current frame and can be replaced with any model that uses a single image for prediction. We conducted experiments with PHNet on four video crowd counting benchmarks: Venice, UCSD, FDST, and CrowdFlow. Experimental results show that PHNet outperforms state-of-the-art methods on all four datasets.
Date of Conference: 10-15 January 2021
Date Added to IEEE Xplore: 05 May 2021
ISBN Information:
Print on Demand(PoD) ISSN: 1051-4651
Conference Location: Milan, Italy

I. Introduction

Crowd counting aims to estimate the number of people in images or videos accurately, which is important for applications such as video surveillance, traffic monitoring, public safety, and urban planning. In practical applications, the main source of data is video captured by drones or surveillance cameras. Video data can be naturally decomposed into a temporal part and a spatial part. However, most crowd density estimation models use only the spatial information of a video and ignore the strong correlation between adjacent video frames. Methods for processing a single image can be roughly divided into two categories: detection-based methods [1]–[3] and regression-based methods [4]. The latter address the occlusion and clutter problems of the former by using CNN-based models such as MCNN and CSRNet [5], [6].
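
To make the two kinds of information concrete, the following is a minimal illustrative sketch and not the PHNet implementation. It assumes the common density-map formulation of regression-based counting, in which each annotated person contributes a normalized Gaussian and the count is the sum of the map, and it uses a plain element-wise product of two adjacent frames as the simplest possible multiplicative interaction. The function names, the kernel width sigma, and the toy coordinates are all hypothetical; the Relational Extractor itself is a learned module whose details are not given in this section.

```python
import math

def density_map(points, height, width, sigma=2.0):
    """Build a density map: one normalized Gaussian per (row, col) annotation."""
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        kernel = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        total = sum(map(sum, kernel))
        # Normalizing makes each person contribute exactly 1 to the map's sum.
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / total
    return dmap

def count(dmap):
    """Spatial cue: the estimated count is the sum over the density map."""
    return sum(map(sum, dmap))

def multiplicative_interaction(prev_map, curr_map):
    """Temporal cue (hypothetical stand-in): element-wise product of adjacent frames."""
    return [[p * c for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_map, curr_map)]

# Toy adjacent frames: one person stays at (10, 10), the other moves.
frame_t0 = density_map([(10, 10), (20, 30)], 32, 48)
frame_t1 = density_map([(10, 10), (5, 40)], 32, 48)
interaction = multiplicative_interaction(frame_t0, frame_t1)
```

In this toy setting each frame sums to its number of annotations, while the product responds only where both frames carry density, that is, at the person who did not move. A single-image model sees only the first kind of signal; the second is the sort of correlation between adjacent frames that such a model ignores.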
