
Global Representation Guided Adaptive Fusion Network for Stable Video Crowd Counting



Abstract:

Modern crowd counting methods in natural scenes, even when video datasets are available, are mostly image-based. Because of background interference or occlusion in the scene, these methods are prone to abrupt changes and instability in density prediction. There has been minimal research on how to exploit the inherent consistency among adjacent frames to achieve high estimation accuracy on video sequences. In this study, we explore long-term global temporal consistency in video sequences and propose a novel Global Representation Guided Adaptive Fusion Network (GRGAF) for video crowd counting. The primary aim is to establish a long-term temporal representation across consecutive frames to guide the density estimation of local frames, which alleviates the prediction instability caused by background noise and occlusions in crowd scenes. Moreover, to further enforce temporal consistency, we apply a generative adversarial learning scheme and design a global-local joint loss, which makes the estimated density maps more temporally coherent. Extensive experiments on four challenging video-based crowd counting datasets (FDST, DroneCrowd, MALL and UCSD) demonstrate that our method makes effective use of the spatio-temporal information in videos and outperforms other state-of-the-art approaches.
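The abstract gives only the high-level design, so the sketch below is purely illustrative and not the authors' implementation: it shows, in PyTorch, one plausible way a clip-level global representation could be adaptively fused with a per-frame local feature via a learned gate, and one plausible global-local joint loss that adds a temporal-consistency term to the per-frame error. The module sizes, the gating form, the consistency term, and the weight `lambda_t` are all assumptions.

```python
# Hypothetical sketch (not the paper's released code): adaptive fusion of a
# long-term global representation with per-frame features, plus a joint loss
# penalizing both per-frame error and temporal inconsistency.
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    """Fuse a clip-level global feature with a frame-level local feature
    via a learned, spatially varying gate. Layer sizes are illustrative."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        # local_feat, global_feat: (B, C, H, W)
        alpha = self.gate(torch.cat([local_feat, global_feat], dim=1))  # (B, 1, H, W)
        return alpha * global_feat + (1.0 - alpha) * local_feat


def global_local_joint_loss(pred_maps: torch.Tensor,
                            gt_maps: torch.Tensor,
                            lambda_t: float = 0.1) -> torch.Tensor:
    """pred_maps, gt_maps: (B, T, 1, H, W) density maps for T consecutive frames.
    Combines a per-frame (local) MSE term with a clip-level (global) term that
    encourages temporally coherent predictions."""
    local_term = torch.mean((pred_maps - gt_maps) ** 2)
    # Differences between adjacent predicted frames should match the
    # differences between adjacent ground-truth frames.
    pred_diff = pred_maps[:, 1:] - pred_maps[:, :-1]
    gt_diff = gt_maps[:, 1:] - gt_maps[:, :-1]
    global_term = torch.mean((pred_diff - gt_diff) ** 2)
    return local_term + lambda_t * global_term
```

In this reading, the gate lets the network lean on the global representation wherever the current frame is occluded or noisy, and fall back to the local feature elsewhere.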
Published in: IEEE Transactions on Multimedia ( Volume: 25)
Page(s): 5222 - 5233
Date of Publication: 07 July 2022


I. Introduction

Crowd counting is an important computer vision task because it facilitates a variety of fundamental applications, such as public safety management [1], automated driving [2], video surveillance [3], [4], and traffic management [5], [6]. The primary aim is to accurately estimate the number of people in a crowd scene from a video or image. Counting in diverse real-world scenarios remains challenging due to severe occlusion, large scale variation, and illumination changes.
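As background for the density-based formulation referenced above, the snippet below is a generic illustration (not code from this paper) of the standard density-map construction used throughout the cited crowd-counting literature: a Gaussian is placed at each annotated head position, and summing the map recovers the count. The image size and bandwidth `sigma` are arbitrary choices.

```python
# Generic illustration of density-map based crowd counting:
# place a Gaussian at each annotated head, then integrate to count.
import numpy as np
from scipy.ndimage import gaussian_filter


def density_map(points, height, width, sigma=4.0):
    """points: iterable of (row, col) head annotations."""
    canvas = np.zeros((height, width), dtype=np.float32)
    for r, c in points:
        canvas[int(r), int(c)] += 1.0
    # Gaussian smoothing preserves total mass, so the sum stays ~len(points).
    return gaussian_filter(canvas, sigma=sigma)


heads = [(30, 40), (32, 45), (100, 200)]   # three annotated people
dm = density_map(heads, height=240, width=320)
print(dm.sum())                            # ~3.0: the map integrates to the count
```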

References

[1] M. Xu, "An efficient method of crowd aggregation computation in public areas," IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 10, pp. 2814-2825, Oct. 2018.
[2] W.-C. Lai et al., "Trajectory prediction in heterogeneous environment via attended ecology embedding," Proc. 28th ACM Int. Conf. Multimedia, pp. 202-210, 2020.
[3] Z. Zhang, M. Wang and X. Geng, "Crowd counting in public video surveillance by label distribution learning," Neurocomputing, vol. 166, pp. 151-163, 2015.
[4] S. Saxena, F. Brémond, M. Thonnat and R. Ma, "Crowd behavior recognition for video surveillance," Proc. Int. Conf. Adv. Concepts Intell. Vis. Syst., pp. 970-981, 2008.
[5] N. Ihaddadene and C. Djeraba, "Real-time crowd motion analysis," Proc. IEEE 19th Int. Conf. Pattern Recognit., pp. 1-4, 2008.
[6] Y.-J. Ma, H.-H. Shuai and W.-H. Cheng, "Spatiotemporal dilated convolution with uncertain matching for video-based crowd estimation," IEEE Trans. Multimedia, vol. 24, pp. 261-273, 2021.
[7] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 770-778, 2016.
[8] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Proc. Int. Conf. Learn. Representations, pp. 1-14, 2015.
[9] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Proc. Int. Conf. Neural Inf. Process. Syst., pp. 1097-1105, 2012.
[10] Z. Cong, H. Li, X. Wang and X. Yang, "Cross-scene crowd counting via deep convolutional neural networks," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 833-841, 2015.
[11] D. B. Sam, S. Surya and R. V. Babu, "Switching convolutional neural network for crowd counting," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 5744-5752, 2017.
[12] Y. Li, "A deep spatiotemporal perspective for understanding crowd behavior," IEEE Trans. Multimedia, vol. 20, no. 12, pp. 3289-3297, Dec. 2018.
[13] L. Boominathan, S. S. Kruthiventi and R. V. Babu, "CrowdNet: A deep convolutional network for dense crowd counting," Proc. 24th ACM Int. Conf. Multimedia, pp. 640-644, 2016.
[14] S. Huang et al., "Body structure aware deep crowd counting," IEEE Trans. Image Process., vol. 27, no. 3, pp. 1049-1059, Mar. 2018.
[15] G. He, Z. Ma, B. Huang, B. Sheng and Y. Yuan, "Dynamic region division for adaptive learning pedestrian counting," Proc. IEEE Int. Conf. Multimedia Expo, pp. 1120-1125, 2019.
[16] X. Cao, Z. Wang, Y. Zhao and F. Su, "Scale aggregation network for accurate and efficient crowd counting," Proc. Eur. Conf. Comput. Vis., pp. 757-773, 2018.
[17] V. A. Sindagi, R. Yasarla and V. M. Patel, "Pushing the frontiers of unconstrained crowd counting: New dataset and benchmark method," Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 1221-1231, 2019.
[18] Y. Xu et al., "Crowd counting with partial annotations in an image," Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 15570-15579, 2021.
[19] M. Wang, H. Cai, X. Han, J. Zhou and M. Gong, "STNet: Scale tree network with multi-level auxiliator for crowd counting," IEEE Trans. Multimedia.
[20] X. Liu et al., "Exploiting sample correlation for crowd counting with multi-expert network," Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 3215-3224, 2021.
[21] Q. Song et al., "Rethinking counting and localization in crowds: A purely point-based framework," Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 3365-3374, 2021.
[22] J. Wan, Z. Liu and A. B. Chan, "A generalized loss function for crowd counting and localization," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 1974-1983, 2021.
[23] Z. Ma et al., "Learning to count via unbalanced optimal transport," Proc. AAAI Conf. Artif. Intell., pp. 2319-2327, 2021.
[24] F. Xiong, X. Shi and D.-Y. Yeung, "Spatiotemporal modeling for crowd counting in videos," Proc. IEEE Int. Conf. Comput. Vis., pp. 5151-5159, 2017.
[25] Z. Zou, H. Shao, X. Qu, W. Wei and P. Zhou, "Enhanced 3D convolutional networks for crowd counting," Proc. Brit. Mach. Vis. Conf., 2019.
[26] X. Wu, "Fast video crowd counting with a temporal aware network," Neurocomputing, vol. 403, pp. 13-20, 2020.
[27] Y. Fang, B. Zhan, W. Cai, S. Gao and B. Hu, "Locality-constrained spatial transformer network for video crowd counting," Proc. IEEE Int. Conf. Multimedia Expo, pp. 814-819, 2019.
[28] Y. Fang et al., "Multi-level feature fusion based locality-constrained spatial transformer network for video crowd counting," Neurocomputing, vol. 392, pp. 98-107, 2020.
[29] W. Liu, K. M. Lis, M. Salzmann and P. Fua, "Geometric and physical constraints for drone-based head plane crowd density estimation," Proc. IEEE Int. Conf. Intell. Robots Syst., pp. 244-249, 2019.
[30] L. Wen et al., "Detection, tracking, and counting meets drones in crowds: A benchmark," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 7812-7821, 2021.