
STNet: Scale Tree Network With Multi-Level Auxiliator for Crowd Counting


Abstract:

State-of-the-art approaches for crowd counting resort to deep neural networks to predict density maps. However, counting people in congested scenes remains a challenging task because drastic scale variation, density inconsistency, and complex backgrounds can seriously degrade counting accuracy. To address this ingrained accuracy degradation, in this paper we propose a novel and powerful network called Scale Tree Network (STNet) for accurate crowd counting. STNet consists of two key components: a Scale-Tree Diversity Enhancer and a Multi-level Auxiliator. Specifically, the Diversity Enhancer is designed to enrich scale diversity, alleviating the limitations of existing methods caused by an insufficient range of scale levels. A novel tree structure is adopted to hierarchically parse coarse-to-fine crowd regions. Furthermore, a simple yet effective Multi-level Auxiliator is presented to aid in exploiting generalisable shared characteristics at multiple levels, allowing more accurate pixel-wise background cognition. The overall STNet is trained in an end-to-end manner, without the need to manually tune loss weights between the main and the auxiliary tasks. Extensive experiments on five challenging crowd datasets demonstrate the superiority of the proposed method.
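The abstract does not spell out the internals of the two components, so the following is only a minimal PyTorch-style sketch of the overall recipe it describes: a shared backbone, parallel multi-receptive-field branches standing in for the (unspecified) Scale-Tree Diversity Enhancer, a main density-regression head plus an auxiliary pixel-wise crowd/background head in the spirit of the Multi-level Auxiliator, and learned (uncertainty-based) weighting as one plausible way to avoid hand-tuned loss weights. All module names and design choices below are assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code): main density head + auxiliary head,
# with learned loss weighting instead of manually tuned weights (assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCounter(nn.Module):
    """Hypothetical skeleton: backbone -> multi-scale branches -> density map,
    plus an auxiliary pixel-wise crowd/background head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Parallel dilated branches stand in for the (unspecified) scale-tree
        # enhancer: each branch covers a different receptive field.
        self.branches = nn.ModuleList([
            nn.Conv2d(64, 32, 3, padding=d, dilation=d) for d in (1, 2, 4)
        ])
        self.density_head = nn.Conv2d(96, 1, 1)   # main task: density map
        self.segment_head = nn.Conv2d(96, 1, 1)   # auxiliary: crowd mask logits
        # Learnable log-variances replace hand-tuned loss weights.
        self.log_var = nn.Parameter(torch.zeros(2))

    def forward(self, x):
        feat = self.backbone(x)
        feat = torch.cat([b(feat) for b in self.branches], dim=1)
        return self.density_head(feat), self.segment_head(feat)

    def loss(self, pred_den, pred_seg, gt_den, gt_seg):
        l_den = F.mse_loss(pred_den, gt_den)
        l_seg = F.binary_cross_entropy_with_logits(pred_seg, gt_seg)
        # Uncertainty-weighted sum: no manual lambda between main and auxiliary tasks.
        return (torch.exp(-self.log_var[0]) * l_den + self.log_var[0]
                + torch.exp(-self.log_var[1]) * l_seg + self.log_var[1])

if __name__ == "__main__":
    model = TinyCounter()
    img = torch.randn(2, 3, 128, 128)
    den, seg = model(img)
    gt_den = torch.rand(2, 1, 128, 128)
    gt_seg = (gt_den > 0.5).float()
    print(den.shape, float(model.loss(den, seg, gt_den, gt_seg)))

The predicted count for an image is the sum over its estimated density map; the auxiliary mask supervision only shapes the shared features and is discarded at inference time in this sketch.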
Published in: IEEE Transactions on Multimedia (Volume: 25)
Page(s): 2074 - 2084
Date of Publication: 13 January 2022


I. Introduction

Crowd counting has recently drawn considerable attention from researchers due to its importance in a wide array of real-world applications, including video surveillance, public crowd monitoring, and traffic control. The main objective of crowd counting is to infer the number of people in congested images. Despite the exploration of pioneering works [1]–[6], crowd understanding is still a challenging problem for scenes exhibiting drastic scale variations, density inconsistency, or complex backgrounds.
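As background for the density-map formulation that most of these methods (including [1]–[6]) build on, the sketch below shows how ground-truth density maps are commonly constructed from point annotations and why the count equals the sum of the map. The helper name points_to_density and the fixed Gaussian width are illustrative assumptions; many methods (e.g., [1]) instead use geometry-adaptive kernels.

# Illustrative only: standard density-map ground truth from head annotations.
import numpy as np
from scipy.ndimage import gaussian_filter

def points_to_density(points, height, width, sigma=4.0):
    """points: iterable of (row, col) head annotations; returns an HxW map."""
    density = np.zeros((height, width), dtype=np.float32)
    for r, c in points:
        if 0 <= int(r) < height and 0 <= int(c) < width:
            density[int(r), int(c)] += 1.0
    # Gaussian smoothing preserves the total count (up to boundary effects).
    return gaussian_filter(density, sigma)

heads = [(30, 40), (32, 44), (90, 120)]
dmap = points_to_density(heads, 128, 160)
print(round(float(dmap.sum()), 2))  # ~3.0, the number of annotated people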

References
[1] Y. Zhang, D. Zhou, S. Chen, S. Gao and Y. Ma, "Single-image crowd counting via multi-column convolutional neural network", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 589-597, 2016.
[2] D. B. Sam, S. Surya and R. V. Babu, "Switching convolutional neural network for crowd counting", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 4031-4039, 2017.
[3] X. Cao, Z. Wang, Y. Zhao and F. Su, "Scale aggregation network for accurate and efficient crowd counting", Proc. Eur. Conf. Comput. Vis., pp. 734-750, 2018.
[4] Y. Li, X. Zhang and D. Chen, "CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1091-1100, 2018.
[5] W. Liu, M. Salzmann and P. Fua, "Context-aware crowd counting", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 5099-5108, 2019.
[6] X. Jiang et al., "Attention scaling for crowd counting", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 4706-4715, 2020.
[7] V. A. Sindagi and V. M. Patel, "Generating high-quality crowd density maps using contextual pyramid CNNs", Proc. IEEE Int. Conf. Comput. Vis., pp. 1861-1870, 2017.
[8] L. Liu et al., "Crowd counting with deep structured scale integration network", Proc. IEEE Int. Conf. Comput. Vis., pp. 1774-1783, 2019.
[9] V. A. Sindagi and V. M. Patel, "Multi-level bottom-top and top-bottom feature fusion for crowd counting", Proc. IEEE Int. Conf. Comput. Vis., pp. 1002-1012, 2019.
[10] X. Liu, J. Yang and W. Ding, "Adaptive mixture regression network with local counting map for crowd counting", Proc. Eur. Conf. Comput. Vis., pp. 241-257, 2020.
[11] S. Bai et al., "Adaptive dilated network with self-correction supervision for counting", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 4594-4603, 2020.
[12] Y. Shen, S. Tan, A. Sordoni and A. Courville, "Ordered neurons: Integrating tree structures into recurrent neural networks", Proc. Int. Conf. Learn. Representations, pp. 1-14, 2019.
[13] H. Xiong et al., "From open set to closed set: Counting objects by spatial divide-and-conquer", Proc. IEEE Int. Conf. Comput. Vis., pp. 8362-8371, 2019.
[14] M. Zhao, J. Zhang, C. Zhang and W. Zhang, "Leveraging heterogeneous auxiliary tasks to assist crowd counting", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 12736-12745, 2019.
[15] B.-B. Gao, C. Xing, C.-W. Xie, J. Wu and X. Geng, "Deep label distribution learning with label ambiguity", IEEE Trans. Image Process., vol. 26, no. 6, pp. 2825-2838, Jun. 2017.
[16] Y. Miao, Z. Lin, G. Ding and J. Han, "Shallow feature based dense attention network for crowd counting", Proc. Assoc. Advance. Artif. Intell., pp. 11765-11772, 2020.
[17] N. Liu et al., "ADCrowdNet: An attention-injective deformable convolutional network for crowd understanding", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3225-3234, 2019.
[18] D. Li and Q. Chen, "Dynamic hierarchical mimicking towards consistent optimization objectives", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 7642-7651, 2020.
[19] S. Liu, E. Johns and A. J. Davison, "End-to-end multi-task learning with attention", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1871-1880, 2019.
[20] H. Idrees et al., "Composition loss for counting density map estimation and localization in dense crowds", Proc. Eur. Conf. Comput. Vis., pp. 532-546, 2018.
[21] H. Idrees, I. Saleemi, C. Seibert and M. Shah, "Multi-source multi-scale counting in extremely dense crowd images", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2547-2554, 2013.
[22] V. A. Sindagi, R. Yasarla and V. M. Patel, "JHU-CROWD++: Large-scale crowd counting dataset and a benchmark method", IEEE Trans. Pattern Anal. Mach. Intell., 2020.
[23] T. Zhao and R. Nevatia, "Bayesian human segmentation in crowded situations", Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2, pp. 459-466, 2003.
[24] O. Sidla, Y. Lypetskyy, N. Brandle and S. Seer, "Pedestrian detection and tracking for counting applications in crowded situations", Proc. IEEE Int. Conf. Video Signal Based Surveill., p. 70, 2006.
[25] M. Li, Z. Zhang, K. Huang and T. Tan, "Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection", Proc. 19th Int. Conf. Pattern Recognit., pp. 1-4, 2008.
[26] V. B. Subburaman, A. Descamps and C. Carincotte, "Counting people in the crowd using a generic head detector", Proc. IEEE 9th Int. Conf. Adv. Video Signal-Based Surveill., pp. 470-475, 2012.
[27] A. B. Chan, Z.-S. J. Liang and N. Vasconcelos, "Privacy preserving crowd monitoring: Counting people without people models or tracking", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1-7, 2008.
[28] K. Chen, S. Gong, T. Xiang and C. C. Loy, "Cumulative attribute space for age and crowd density estimation", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2467-2474, 2013.
[29] K. Chen, C. C. Loy, S. Gong and T. Xiang, "Feature mining for localised crowd counting", Proc. Brit. Mach. Vis. Conf., vol. 1, p. 3, 2012.
[30] V. Lempitsky and A. Zisserman, "Learning to count objects in images", Proc. Adv. Neural Inf. Process. Syst., pp. 1324-1332, 2010.