Scene Graph based Fusion Network for Image-Text Retrieval

Abstract:

A critical challenge in image-text retrieval is learning accurate correspondences between images and texts. Most existing methods focus on coarse-grained correspondences based on co-occurrences of semantic objects and fail to distinguish fine-grained local correspondences. In this paper, we propose a novel Scene Graph based Fusion Network (dubbed SGFN), which enhances image and text features through intra- and cross-modal fusion for image-text retrieval. Specifically, we design an intra-modal hierarchical attention fusion that incorporates semantic contexts, such as objects, attributes, and relationships, into image and text feature vectors via scene graphs, and a cross-modal attention fusion that combines the contextual semantics and local fusion via contextual vectors. Extensive experiments on the public datasets Flickr30K and MSCOCO show that SGFN outperforms a number of state-of-the-art image-text retrieval methods.

Published in: 2023 IEEE International Conference on Multimedia and Expo (ICME)

Date of Conference: 10-14 July 2023

Date Added to IEEE Xplore: 25 August 2023

DOI: 10.1109/ICME55011.2023.00032

Conference Location: Brisbane, Australia

I. Introduction

Image-text retrieval is one of the fundamental tasks in the field of vision and language [6]. Given a query in one modality (text or image), its goal is to retrieve the most semantically similar samples from a database of the other modality (image or text). The biggest challenge is to narrow the semantic gap between cross-modal data so that the similarity of image-text pairs can be measured accurately.
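To make the retrieval setting concrete, the following is a minimal sketch of ranking a gallery of one modality against a query of the other. It assumes that images and texts have already been mapped into a shared embedding space and that cosine similarity is the matching score; both are illustrative assumptions, not the SGFN model. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, gallery, top_k=1):
    """Return the indices of the top_k gallery items most similar to the query."""
    scored = [(cosine_similarity(query, item), idx) for idx, item in enumerate(gallery)]
    # Highest similarity first; ties are broken by the lower gallery index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


# Hypothetical toy embeddings standing in for encoder outputs (assumption).
image_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.3, 0.9],
]
text_query = [0.0, 0.9, 0.1]

# Text-to-image retrieval: rank all images for one text query.
ranked = retrieve(text_query, image_embeddings, top_k=3)
print(ranked)
```

The same `retrieve` call covers image-to-text retrieval by passing an image embedding as the query and the text embeddings as the gallery. In this sketch the quality of the ranking depends entirely on how well the embeddings close the semantic gap between the two modalities.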
