Detecting Pedestrian With Incomplete Head Feature in Crowded Situation Based on Transformer

Abstract:

Pedestrian detection in crowded scenes is a challenging task. This study presents a straightforward and effective method, Det RCNN, that detects pedestrians in crowded scenes while also pairing the body and head of each pedestrian. On the one hand, pedestrians' heads have a stable shape and distinctive features. On the other hand, heads are usually positioned higher in the image, so even in crowded scenes they are difficult to occlude completely. Therefore, this study equips the DETR model with a Head Decoder (HDecoder) parallel to the Decoder. The HDecoder takes the head knowledge generated in the Decoder phase as head queries and uses a key-query mechanism to search the entire image for the body bounding boxes corresponding to those head queries. Lastly, the proposed method performs a straightforward IoU (Intersection over Union) matching between the body bounding boxes produced in the Decoder and HDecoder phases. Because the HDecoder resembles the second stage of the Faster RCNN model, the method is termed Det RCNN (DETR RCNN). Experimental results on the CrowdHuman dataset show that, compared to Deformable DETR, the proposed model increases AP$_{m}$ from 53.02 to 53.87. Furthermore, mMR$^{-2}$ decreases from 52.46 to 42.32 compared to the existing BFJ.
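The final IoU matching step can be illustrated with a minimal sketch. The abstract does not specify the matching rule, so the greedy one-to-one assignment, the 0.5 threshold, the `(x1, y1, x2, y2)` box format, and the function names `iou` and `match_bodies` below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_bodies(
    decoder_bodies: List[Box],
    hdecoder_bodies: List[Box],
    iou_threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> List[Optional[int]]:
    """For each HDecoder body box (one per head query), return the index of
    the Decoder body box it is paired with, or None if no box passes the
    threshold. Greedy one-to-one matching is an assumption of this sketch."""
    pairs = sorted(
        (
            (iou(h, d), hi, di)
            for hi, h in enumerate(hdecoder_bodies)
            for di, d in enumerate(decoder_bodies)
        ),
        reverse=True,  # highest IoU first
    )
    result: List[Optional[int]] = [None] * len(hdecoder_bodies)
    used = set()
    for score, hi, di in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same body
        if result[hi] is None and di not in used:
            result[hi] = di
            used.add(di)
    return result


if __name__ == "__main__":
    decoder_bodies = [(0, 0, 10, 20), (8, 0, 18, 20)]
    hdecoder_bodies = [(9, 0, 19, 20), (0, 0, 10, 19)]
    print(match_bodies(decoder_bodies, hdecoder_bodies))
```

Under these assumptions, a head query whose HDecoder body box overlaps a Decoder body box strongly enough inherits that detection as its paired body, which yields the body-head association described above.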

Published in: IEEE Signal Processing Letters (Volume: 32)

Page(s): 576 - 580

Date of Publication: 02 January 2025

DOI: 10.1109/LSP.2024.3525397

I. Introduction

Detecting pedestrians and associating each pedestrian with their own head have long been popular research topics in computer vision, with wide application in fields such as robotics and security monitoring. Over the past decade, with the rapid growth of computational power, deep learning methods based on Convolutional Neural Networks (CNN) [1] have been extensively applied to this task. Current mainstream methods fall into two categories: anchor-based [2], [3], [4], [5], [6] and anchor-free [7], [8], [9], [10]. Recently, Carion et al. introduced the Transformer [11] into object detection and proposed DETR [12]. Following this, numerous DETR-based methods [13], [14], [15], [16], [17], [18], [19], [20] have emerged in rapid succession, forming a third mainstream category.
