
Learning Cross-Modal Retrieval with Noisy Labels


Abstract:

Cross-modal retrieval has recently been emerging with the help of deep multimodal learning. However, even for unimodal data, collecting large-scale well-annotated data is expensive and time-consuming, not to mention the additional challenges posed by multiple modalities. Although crowd-sourced annotation, e.g., Amazon's Mechanical Turk, can be utilized to mitigate the labeling cost, non-expert annotation inevitably introduces noise into the labels. To tackle this challenge, this paper presents a general Multi-modal Robust Learning framework (MRL) for learning with multimodal noisy labels, which mitigates noisy samples and correlates distinct modalities simultaneously. Specifically, we propose a Robust Clustering loss (RC) to make the deep networks focus on clean samples instead of noisy ones. In addition, a simple yet effective multimodal loss function, called Multimodal Contrastive loss (MC), is proposed to maximize the mutual information between different modalities, thus alleviating the interference of noisy samples and the cross-modal discrepancy. Extensive experiments on four widely used multimodal datasets demonstrate the effectiveness of the proposed approach in comparison with 14 state-of-the-art methods.
Date of Conference: 20-25 June 2021
Date Added to IEEE Xplore: 02 November 2021
Conference Location: Nashville, TN, USA
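
The page does not give the exact formulation of the Multimodal Contrastive (MC) loss described in the abstract, but a common way to maximize mutual information between modalities is an InfoNCE-style contrastive objective over paired embeddings. The sketch below is a hypothetical illustration only: the function name, temperature value, and the assumption of paired image/text embeddings are ours, not the paper's.

```python
# Hypothetical InfoNCE-style cross-modal contrastive loss (not the paper's
# exact MC formulation). It illustrates maximizing a lower bound on the
# mutual information between paired image and text embeddings.
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (batch, dim) embeddings of paired image/text samples."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    # Similarity matrix: entry (i, j) compares image i with text j.
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Symmetric cross-entropy: matched pairs (the diagonal) are the positives.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```

Under this kind of objective, matched image/text pairs are pulled together while mismatched pairs in the batch are pushed apart, which is one standard way to correlate distinct modalities in a shared embedding space.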

1. Introduction

With the rapid growth of multimedia data, cross-modal retrieval has become a compelling topic in the multimodal learning community due to its flexibility in retrieving semantically relevant samples across distinct modalities, e.g., using an image to query text [6], [16]. However, most existing methods require cleanly annotated training data, which is expensive and time-consuming to collect. Although some unsupervised multimodal learning methods can mitigate this labeling pressure, their performance is usually much worse than that of their supervised counterparts [60]. To balance performance and labeling cost, semi-supervised multimodal learning methods have been proposed to simultaneously utilize labeled and unlabeled data to learn common discriminative representations [61], [17]. However, semi-supervised approaches still require a certain amount of cleanly annotated data to reach reasonable performance.

