Glance to Count: Learning to Rank with Anchors for Weakly-supervised Crowd Counting | IEEE Conference Publication | IEEE Xplore


Abstract:

Crowd images are arguably among the most laborious data to annotate. In this paper, we aim to reduce the heavy demand for densely labeled crowd data and propose a novel weakly-supervised setting in which the binary ranking of two images with high-contrast crowd counts serves as training guidance. To enable training under this setting, we convert the crowd count regression problem into a ranking potential prediction problem. In particular, we tailor a Siamese Ranking Network that predicts potential scores for two images, indicating the ordering of their counts. The ultimate goal is thus to assign appropriate potentials to all crowd images so that their orderings obey the ranking labels. Potentials, however, reveal only relative crowd sizes and cannot by themselves yield an exact crowd count. We resolve this problem by introducing "anchors" at the inference stage. Concretely, anchors are a few images with count labels, used to infer the corresponding counts from potential scores through a simple linear mapping function. We conduct extensive experiments to study various combinations of supervision, and we show that our method outperforms existing weakly-supervised methods by a large margin without additional labeling effort. The code is available at https://github.com/pandaszzzzz/CCRanking.
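
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a margin-based hinge loss for the pairwise ranking and an ordinary least-squares fit for the anchor mapping; the abstract specifies neither, so the function names, the `margin` parameter, and the fitting procedure are all illustrative assumptions.

```python
def pairwise_ranking_loss(p_dense, p_sparse, margin=1.0):
    """Hinge-style ranking loss (an assumed form, not taken from the paper).

    p_dense[i] and p_sparse[i] are the potentials predicted for the image
    labeled as more crowded and less crowded in pair i. A pair contributes
    zero loss once p_dense exceeds p_sparse by at least `margin`.
    """
    losses = [max(0.0, margin - (d - s)) for d, s in zip(p_dense, p_sparse)]
    return sum(losses) / len(losses)


def fit_anchor_mapping(anchor_potentials, anchor_counts):
    """Fit count = a * potential + b on the anchors by least squares.

    Requires at least two anchors with distinct potentials.
    """
    n = len(anchor_potentials)
    mean_x = sum(anchor_potentials) / n
    mean_y = sum(anchor_counts) / n
    var_x = sum((x - mean_x) ** 2 for x in anchor_potentials)
    if var_x == 0:
        raise ValueError("anchors must have at least two distinct potentials")
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(anchor_potentials, anchor_counts))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def potentials_to_counts(potentials, a, b):
    """Map potentials of unlabeled images to estimated crowd counts."""
    return [a * p + b for p in potentials]
```

In this sketch, the potentials of a few count-labeled anchor images would be passed to `fit_anchor_mapping` once, and the resulting `(a, b)` reused by `potentials_to_counts` for every test image.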
Date of Conference: 03-08 January 2024
Date Added to IEEE Xplore: 09 April 2024
Conference Location: Waikoloa, HI, USA


1. Introduction

Crowd counting aims to automatically count the number of individuals in images and has been widely applied in many areas, e.g., video surveillance, traffic estimation, and congestion control. Most recent approaches [58], [59], [5], [19] rely mainly on fully-supervised annotation of the individuals in a crowd (i.e., placing a dot at the center of each individual) to estimate crowd density. Yet such an annotation process is extremely time-consuming and laborious. In extremely dense scenes especially, it makes little sense to manually label heavily overlapping dots merely to represent the crowd density of the scene. This tedious annotation process limits the scale and diversity of crowd datasets and thus slows the development of the field.
