CCTwins: A Weakly Supervised Transformer-Based Crowd Counting Method With Adaptive Scene Consistency Attention


Abstract:

Recently, crowd counting has attracted significant attention, particularly in the context of the COVID-19 pandemic, because it can automatically provide accurate crowd counts from images. To address the challenges of location-level labeling, several transformer-based crowd counting methods have been proposed that use only count-level supervision. However, these methods use the transformer directly as an encoder without accounting for uneven crowd distribution. To address this issue, we propose CCTwins, a novel transformer-based crowd counting method that requires only count-level supervision. Specifically, we introduce an adaptive scene consistency attention mechanism that enhances the transformer-based Twins-SVT-L model for feature extraction in crowded scenes. Additionally, we design a multi-level weakly supervised loss function that produces crowd count estimates in a coarse-to-fine manner, making it better suited to the weakly supervised setting. Furthermore, intermediate features supervised by count-level labels are used to fuse multi-scale features. Experimental results on four public datasets show that the proposed method outperforms state-of-the-art weakly supervised methods, achieving improvements of up to 16.6% in MAE and up to 13.8% in RMSE across all evaluation settings. The proposed CCTwins also achieves counting performance competitive with state-of-the-art fully supervised methods.
Published in: IEEE Transactions on Consumer Electronics (Volume 70, Issue 1, February 2024)
Pages: 22-35
Date of Publication: 09 May 2023
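
The abstract reports results in terms of MAE and RMSE. As a minimal sketch, assuming the definitions commonly used in crowd counting (per-image absolute and squared count errors averaged over the test set), and assuming that a percentage "improvement" means the relative reduction of an error metric against a baseline, these quantities can be computed as follows. The function names and the example counts are hypothetical and are not taken from the authors' code or data.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], gt: Sequence[float]) -> None:
    # Both sequences hold one count per test image, in the same order.
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")


def mae(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth counts."""
    _check(pred, gt)
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def rmse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root mean squared error; penalizes large per-image errors more than MAE."""
    _check(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred))


def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage error reduction vs. a baseline (assumed meaning of "improvement")."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    # Hypothetical counts for three test images (not results from the paper).
    pred = [110.0, 95.0, 300.0]
    gt = [100.0, 100.0, 310.0]
    print(f"MAE  = {mae(pred, gt):.3f}")   # MAE  = 8.333
    print(f"RMSE = {rmse(pred, gt):.3f}")  # RMSE = 8.660
    print(f"{relative_improvement(60.0, 50.0):.1f}%")  # 16.7%
```

Because RMSE squares each per-image error before averaging, a method that makes a few very large mistakes on dense scenes is penalized more under RMSE than under MAE, which is why both metrics are usually reported together.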

I. Introduction

Crowd counting aims to automatically estimate the number of people in a given image or video. Regarded as a critical task in crowd analysis [1], it has gained increasing attention in the deep learning community [2], [3], [4] because of its wide range of applications, including public safety in smart cities [5], intelligent surveillance [6], [7], and traffic monitoring [8]. In particular, crowd counting can be integrated into consumer electronic devices, such as smart glasses or drones, to provide security personnel with crowd distribution information, give early warning of crowd stampedes, or detect crowd gatherings during the COVID-19 pandemic. Current state-of-the-art methods generally fall into two categories: density map-based methods [9], [10], [11], [12] and point-based methods [13], [14], [15]. Although these methods have been well studied for handling various challenges in crowd counting (e.g., large variation in the scale of people, occlusion and heavy clutter, and uneven crowd distribution), they all require point annotations in advance.
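
To make the annotation requirement concrete: density map-based methods typically convert each annotated head point into a small Gaussian blob, so that the resulting map integrates to the number of people. The sketch below assumes a fixed-width Gaussian kernel (`sigma` is an illustrative choice; other kernel choices, such as geometry-adaptive kernels, are also common), and `points_to_density` and the head positions are hypothetical names and data. It illustrates why such training targets need per-head locations, whereas a count-level label keeps only the total.

```python
import math
from typing import List, Sequence, Tuple


def points_to_density(
    points: Sequence[Tuple[float, float]],
    height: int,
    width: int,
    sigma: float = 4.0,
) -> List[List[float]]:
    """Place one normalized 2D Gaussian at every annotated head (x, y).

    Assumes all points lie inside the image; sigma is a hypothetical fixed width.
    """
    density = [[0.0] * width for _ in range(height)]
    for px, py in points:
        # Unnormalized Gaussian evaluated on the pixel grid.
        kernel = [
            [math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        # Renormalize so each head contributes exactly 1, even when the
        # kernel is cut off at the image border.
        total = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


if __name__ == "__main__":
    heads = [(10, 12), (40, 30), (0, 0)]  # hypothetical (x, y) head positions
    dmap = points_to_density(heads, height=32, width=48, sigma=3.0)
    # Summing the map recovers the head count up to rounding error;
    # a count-level label provides only this total, not the positions.
    print(f"{sum(map(sum, dmap)):.3f}")  # 3.000
```

Under count-level (weak) supervision, only the equivalent of the final printed total is available for each training image, so the model must learn where people are without any per-pixel target.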