BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification

Abstract:

Text-based person re-identification (TBPReID) aims to retrieve person images described by a given textual query. In this task, effectively aligning images and texts, both globally and locally, is a crucial challenge. Recent works have achieved high performance by solving Masked Language Modeling (MLM) to align image/text parts. However, they performed only uni-directional (i.e., from image to text) local-matching, leaving room for improvement by introducing opposite-directional (i.e., from text to image) local-matching. In this work, we introduce the Bidirectional Local-Matching (BiLMa) framework, which jointly optimizes MLM and Masked Image Modeling (MIM) in TBPReID model training. With this framework, our model is trained so that the labels of randomly masked image and text tokens are predicted from the unmasked tokens. In addition, to narrow the semantic gap between image and text in MIM, we propose Semantic MIM (SemMIM), in which the labels of masked image tokens are automatically given by a state-of-the-art human parser. Experimental results demonstrate that our BiLMa framework with SemMIM achieves state-of-the-art Rank@1 and mAP scores on three benchmarks.
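The masking step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the mask ratios, the `[MASK]` placeholder, the `IGNORE` sentinel, the helper names, and the parser class names are all assumptions made for illustration, and the model and loss are omitted. It only shows how masked text tokens and masked image patches each receive a prediction target, with the image targets taken from semantic classes of the kind a human parser would supply.

```python
import random

IGNORE = None     # assumed sentinel: position does not contribute to the loss
MASK = "[MASK]"   # assumed placeholder for a masked text token or image patch

def choose_positions(n, mask_ratio, rng):
    """Pick which of n positions to mask (at least one)."""
    k = max(1, round(n * mask_ratio))
    return set(rng.sample(range(n), k))

def mask_sequence(inputs, labels, positions):
    """Hide the chosen inputs and keep labels only at the hidden positions."""
    masked = [MASK if i in positions else x for i, x in enumerate(inputs)]
    targets = [y if i in positions else IGNORE for i, y in enumerate(labels)]
    return masked, targets

rng = random.Random(0)

# Text side (MLM): the label of a masked word is the word itself.
words = ["a", "woman", "in", "a", "red", "coat"]
text_pos = choose_positions(len(words), 0.3, rng)
masked_words, word_targets = mask_sequence(words, words, text_pos)

# Image side (SemMIM-style): the label of a masked patch is a semantic class.
# The patch names and class names below are hypothetical stand-ins.
patches = ["p0", "p1", "p2", "p3", "p4", "p5"]
parser_labels = ["hair", "face", "coat", "coat", "pants", "shoes"]
img_pos = choose_positions(len(patches), 0.5, rng)
masked_patches, patch_targets = mask_sequence(patches, parser_labels, img_pos)
```

In a real training loop, a cross-modal model would predict `word_targets` and `patch_targets` from the unmasked tokens of both modalities, and the two prediction losses would be optimized jointly.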

Published in: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

Date of Conference: 02-06 October 2023

Date Added to IEEE Xplore: 25 December 2023

DOI: 10.1109/ICCVW60793.2023.00295

Conference Location: Paris, France

1. Introduction

Text-based person re-identification (TBPReID) [11] aims to retrieve a target person from an image pool given a textual query. Because text queries are more user-friendly than image queries, TBPReID is increasingly expected to benefit applications in surveillance and public safety. Existing literature focuses on how to align images and texts globally [23], [22] and/or locally [10], [4]. In particular, recent works have demonstrated the importance of image-text local-matching [15], [20], and state-of-the-art (SOTA) methods [8], [12], [1] employ Masked Language Modeling (MLM) to align parts between image and text.
