
Global Representation Guided Adaptive Fusion Network for Stable Video Crowd Counting



Abstract:

Modern crowd counting methods in natural scenes, even when video datasets are available, are mostly image-based. Because of background interference or occlusion in the scene, these methods are prone to abrupt changes and instability in density prediction. There has been minimal research on how to exploit the inherent consistency among adjacent frames to achieve high estimation accuracy on video sequences. In this study, we explore long-term global temporal consistency in video sequences and propose a novel Global Representation Guided Adaptive Fusion Network (GRGAF) for video crowd counting. The primary aim is to establish a long-term temporal representation across consecutive frames to guide the density estimation of local frames, which alleviates the prediction instability caused by background noise and occlusions in crowd scenes. Moreover, to further enforce temporal consistency, we apply a generative adversarial learning scheme and design a global-local joint loss, which makes the estimated density maps more temporally coherent. Extensive experiments on four challenging video-based crowd counting datasets (FDST, DroneCrowd, MALL and UCSD) demonstrate that our method makes effective use of the spatio-temporal information in videos and outperforms other state-of-the-art approaches.
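The abstract gives only the high-level design, so the sketch below is purely illustrative and not the authors' implementation: it shows, in PyTorch, one plausible way a clip-level global representation could be adaptively fused with a per-frame local feature via a learned gate, and one plausible global-local joint loss that adds a temporal-consistency term to the per-frame error. The module sizes, the gating form, the consistency term, and the weight `lambda_t` are all assumptions.

```python
# Hypothetical sketch (not the paper's released code): adaptive fusion of a
# long-term global representation with per-frame features, plus a joint loss
# penalizing both per-frame error and temporal inconsistency.
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    """Fuse a clip-level global feature with a frame-level local feature
    via a learned, spatially varying gate. Layer sizes are illustrative."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        # local_feat, global_feat: (B, C, H, W)
        alpha = self.gate(torch.cat([local_feat, global_feat], dim=1))  # (B, 1, H, W)
        return alpha * global_feat + (1.0 - alpha) * local_feat


def global_local_joint_loss(pred_maps: torch.Tensor,
                            gt_maps: torch.Tensor,
                            lambda_t: float = 0.1) -> torch.Tensor:
    """pred_maps, gt_maps: (B, T, 1, H, W) density maps for T consecutive frames.
    Combines a per-frame (local) MSE term with a clip-level (global) term that
    encourages temporally coherent predictions."""
    local_term = torch.mean((pred_maps - gt_maps) ** 2)
    # Differences between adjacent predicted frames should match the
    # differences between adjacent ground-truth frames.
    pred_diff = pred_maps[:, 1:] - pred_maps[:, :-1]
    gt_diff = gt_maps[:, 1:] - gt_maps[:, :-1]
    global_term = torch.mean((pred_diff - gt_diff) ** 2)
    return local_term + lambda_t * global_term
```

In this reading, the gate lets the network lean on the global representation wherever the current frame is occluded or noisy, and fall back to the local feature elsewhere.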
Published in: IEEE Transactions on Multimedia ( Volume: 25)
Page(s): 5222 - 5233
Date of Publication: 07 July 2022


I. Introduction

Crowd counting is an important computer vision task because it facilitates a variety of fundamental applications, such as public safety management [1], automated driving [2], video surveillance [3], [4], and traffic management [5], [6]. The primary aim is to accurately estimate the number of people in a crowd scene from a video or image. Counting in diverse real-world scenarios remains challenging due to severe occlusion, large scale variation, and illumination changes.
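As background for the density-based formulation referenced above, the snippet below is a generic illustration (not code from this paper) of the standard density-map construction used throughout the cited crowd-counting literature: a Gaussian is placed at each annotated head position, and summing the map recovers the count. The image size and bandwidth `sigma` are arbitrary choices.

```python
# Generic illustration of density-map based crowd counting:
# place a Gaussian at each annotated head, then integrate to count.
import numpy as np
from scipy.ndimage import gaussian_filter


def density_map(points, height, width, sigma=4.0):
    """points: iterable of (row, col) head annotations."""
    canvas = np.zeros((height, width), dtype=np.float32)
    for r, c in points:
        canvas[int(r), int(c)] += 1.0
    # Gaussian smoothing preserves total mass, so the sum stays ~len(points).
    return gaussian_filter(canvas, sigma=sigma)


heads = [(30, 40), (32, 45), (100, 200)]   # three annotated people
dm = density_map(heads, height=240, width=320)
print(dm.sum())                            # ~3.0: the map integrates to the count
```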

References

[1] M. Xu, "An efficient method of crowd aggregation computation in public areas," IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 10, pp. 2814-2825, Oct. 2018.
[2] W.-C. Lai et al., "Trajectory prediction in heterogeneous environment via attended ecology embedding," Proc. 28th ACM Int. Conf. Multimedia, pp. 202-210, 2020.
[3] Z. Zhang, M. Wang and X. Geng, "Crowd counting in public video surveillance by label distribution learning," Neurocomputing, vol. 166, pp. 151-163, 2015.
[4] S. Saxena, F. Brémond, M. Thonnat and R. Ma, "Crowd behavior recognition for video surveillance," Proc. Int. Conf. Adv. Concepts Intell. Vis. Syst., pp. 970-981, 2008.
[5] N. Ihaddadene and C. Djeraba, "Real-time crowd motion analysis," Proc. IEEE 19th Int. Conf. Pattern Recognit., pp. 1-4, 2008.
[6] Y.-J. Ma, H.-H. Shuai and W.-H. Cheng, "Spatiotemporal dilated convolution with uncertain matching for video-based crowd estimation," IEEE Trans. Multimedia, vol. 24, pp. 261-273, 2021.
[7] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 770-778, 2016.
[8] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Proc. Int. Conf. Learn. Representations, pp. 1-14, 2015.
[9] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Proc. Int. Conf. Neural Inf. Process. Syst., pp. 1097-1105, 2012.
[10] Z. Cong, H. Li, X. Wang and X. Yang, "Cross-scene crowd counting via deep convolutional neural networks," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 833-841, 2015.
[11] D. B. Sam, S. Surya and R. V. Babu, "Switching convolutional neural network for crowd counting," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 5744-5752, 2017.
[12] Y. Li, "A deep spatiotemporal perspective for understanding crowd behavior," IEEE Trans. Multimedia, vol. 20, no. 12, pp. 3289-3297, Dec. 2018.
[13] L. Boominathan, S. S. Kruthiventi and R. V. Babu, "CrowdNet: A deep convolutional network for dense crowd counting," Proc. 24th ACM Int. Conf. Multimedia, pp. 640-644, 2016.
[14] S. Huang et al., "Body structure aware deep crowd counting," IEEE Trans. Image Process., vol. 27, no. 3, pp. 1049-1059, Mar. 2018.
[15] G. He, Z. Ma, B. Huang, B. Sheng and Y. Yuan, "Dynamic region division for adaptive learning pedestrian counting," Proc. IEEE Int. Conf. Multimedia Expo, pp. 1120-1125, 2019.
[16] X. Cao, Z. Wang, Y. Zhao and F. Su, "Scale aggregation network for accurate and efficient crowd counting," Proc. Eur. Conf. Comput. Vis., pp. 757-773, 2018.
[17] V. A. Sindagi, R. Yasarla and V. M. Patel, "Pushing the frontiers of unconstrained crowd counting: New dataset and benchmark method," Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 1221-1231, 2019.
[18] Y. Xu et al., "Crowd counting with partial annotations in an image," Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 15570-15579, 2021.
[19] M. Wang, H. Cai, X. Han, J. Zhou and M. Gong, "STNet: Scale tree network with multi-level auxiliator for crowd counting," IEEE Trans. Multimedia.
[20] X. Liu et al., "Exploiting sample correlation for crowd counting with multi-expert network," Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 3215-3224, 2021.
[21] Q. Song et al., "Rethinking counting and localization in crowds: A purely point-based framework," Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 3365-3374, 2021.
[22] J. Wan, Z. Liu and A. B. Chan, "A generalized loss function for crowd counting and localization," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 1974-1983, 2021.
[23] Z. Ma et al., "Learning to count via unbalanced optimal transport," Proc. AAAI Conf. Artif. Intell., pp. 2319-2327, 2021.
[24] F. Xiong, X. Shi and D.-Y. Yeung, "Spatiotemporal modeling for crowd counting in videos," Proc. IEEE Int. Conf. Comput. Vis., pp. 5151-5159, 2017.
[25] Z. Zou, H. Shao, X. Qu, W. Wei and P. Zhou, "Enhanced 3D convolutional networks for crowd counting," Proc. Brit. Mach. Vis. Conf., 2019.
[26] X. Wu, "Fast video crowd counting with a temporal aware network," Neurocomputing, vol. 403, pp. 13-20, 2020.
[27] Y. Fang, B. Zhan, W. Cai, S. Gao and B. Hu, "Locality-constrained spatial transformer network for video crowd counting," Proc. IEEE Int. Conf. Multimedia Expo, pp. 814-819, 2019.
[28] Y. Fang et al., "Multi-level feature fusion based locality-constrained spatial transformer network for video crowd counting," Neurocomputing, vol. 392, pp. 98-107, 2020.
[29] W. Liu, K. M. Lis, M. Salzmann and P. Fua, "Geometric and physical constraints for drone-based head plane crowd density estimation," Proc. IEEE Int. Conf. Intell. Robots Syst., pp. 244-249, 2019.
[30] L. Wen et al., "Detection, tracking, and counting meets drones in crowds: A benchmark," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 7812-7821, 2021.