Crowd Counting Using Scale-Aware Attention Networks


Abstract:

In this paper, we consider the problem of crowd counting in images. Given an image of a crowded scene, our goal is to estimate the density map of this image, where each pixel value in the density map corresponds to the crowd density at the corresponding location in the image. Given the estimated density map, the final crowd count can be obtained by summing over all values in the density map. One challenge of crowd counting is the scale variation in images. In this work, we propose a novel scale-aware attention network to address this challenge. Using the attention mechanism popular in recent deep learning architectures, our model can automatically focus on certain global and local scales appropriate for the image. By combining these global and local scale attentions, our model outperforms other state-of-the-art methods for crowd counting on several benchmark datasets.
Date of Conference: 07-11 January 2019
Date Added to IEEE Xplore: 07 March 2019
ISBN Information:
Print on Demand (PoD) ISSN: 1550-5790
Conference Location: Waikoloa, HI, USA

1. Introduction

We consider the problem of crowd counting in arbitrary static images. Given an arbitrary image of a crowded scene, with no prior knowledge about the scene (e.g., camera position, scene layout, or crowd density), our goal is to estimate the density map of the input image, where each pixel value corresponds to the crowd density at the corresponding location of the input image. The crowd count can then be obtained by integrating (summing) over the entire density map. In particular, we focus on the setting where the training data have dot annotations, i.e., each object instance (e.g., a person) is annotated with a single point in the image.
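
The relationship between dot annotations, a density map, and the final count can be illustrated with a minimal sketch. It assumes a common convention that is not stated in the text above: each annotated point is spread into a unit-mass Gaussian blob, so that the map sums to the number of annotated people. The function name `density_map_from_dots`, the kernel width `sigma`, and the example coordinates are all hypothetical and chosen only for illustration; this is not the paper's own ground-truth generation procedure.

```python
import numpy as np

def density_map_from_dots(points, height, width, sigma=4.0):
    """Build a density map with one unit-mass Gaussian per annotated dot.

    Assumption: a fixed-width Gaussian kernel; `sigma` is a hypothetical
    choice, not a value taken from the paper.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros((height, width), dtype=np.float64)
    for x, y in points:
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        # Renormalise inside the image so each person contributes exactly 1,
        # even when the blob is clipped by the image border.
        blob /= blob.sum()
        density += blob
    return density

# Hypothetical dot annotations given as (x, y) pixel coordinates.
dots = [(10.0, 12.0), (30.5, 20.0), (31.0, 22.0), (55.0, 40.0)]
density = density_map_from_dots(dots, height=48, width=64)

# The crowd count is the sum over all values of the density map.
count = density.sum()
print(f"estimated count: {count:.2f}")  # approximately 4.00
```

Because every blob is normalised to unit mass, the sum over the map recovers the number of annotated dots regardless of how close together they are, which is the property that lets a network trained to regress such maps produce a count by simple summation.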
