
Cross-modal Retrieval Based on Stacked Bimodal Auto-Encoder



Abstract:

Deep learning (DL) has achieved excellent results on various single-modal problems, and many researchers have applied DL to cross-modal retrieval, where the popular approaches are based on two-stage learning. The first stage obtains a separate representation for each modality, and the second stage learns the inter-modal correlation, which is the key to retrieval. Traditional solutions to the second stage obtain the inter-modal shared representation through a shallow network structure, which cannot effectively learn multi-level inter-modal correlation. Motivated by this, this paper presents a novel hybrid deep structure combining the two stages for the cross-modal retrieval task. To learn the inter-modal correlation, we incorporate a Stacked Bimodal Auto-Encoder (Stacked-BAE) into the third layer. On the one hand, the Stacked-BAE learns richer cross-modal correlation and enhances the learning ability of the model. On the other hand, we use layer-wise learning to obtain multi-level inter-modal correlation, which improves the accuracy of cross-modal retrieval. Extensive experiments on several cross-modal datasets show that our model is superior to the baseline correlation analysis and three common multi-modal deep models on cross-modal retrieval tasks.
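The core idea of a bimodal auto-encoder layer, as described above, is that each modality is projected into a shared code from which both modalities are reconstructed; stacking such layers and training them greedily gives the layer-wise, multi-level correlation learning the abstract mentions. The following is a minimal illustrative sketch of one such layer, not the paper's actual architecture; all dimensions, weight initializations, and the tied-weight decoder are hypothetical choices for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

class BimodalAutoEncoderLayer:
    """One layer of a stacked bimodal auto-encoder (illustrative sketch).

    Each modality has its own encoding weights; the shared code is a
    nonlinearity of the sum of the two modality projections, and the
    decoder reuses the transposed weights (tied weights, for brevity).
    """

    def __init__(self, d_img, d_txt, d_shared):
        self.W_img = rng.normal(0.0, 0.1, (d_img, d_shared))
        self.W_txt = rng.normal(0.0, 0.1, (d_txt, d_shared))

    def encode(self, x_img, x_txt):
        # shared representation of the two modalities
        return relu(x_img @ self.W_img + x_txt @ self.W_txt)

    def decode(self, h):
        # reconstruct both modalities from the shared code
        return h @ self.W_img.T, h @ self.W_txt.T

    def reconstruction_loss(self, x_img, x_txt):
        h = self.encode(x_img, x_txt)
        r_img, r_txt = self.decode(h)
        return np.mean((r_img - x_img) ** 2) + np.mean((r_txt - x_txt) ** 2)

# In a stacked setting, the shared code of layer k would feed layer k+1,
# with each layer trained greedily (layer-wise learning).
bae = BimodalAutoEncoderLayer(d_img=128, d_txt=64, d_shared=32)
x_img = rng.normal(size=(8, 128))
x_txt = rng.normal(size=(8, 64))
h = bae.encode(x_img, x_txt)
print(h.shape)  # (8, 32)
```

At retrieval time, one would encode a query from a single modality (zeroing the other input) into the shared space and rank items of the other modality by similarity of their shared codes; that step is omitted here.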
Date of Conference: 07-09 June 2019
Date Added to IEEE Xplore: 29 July 2019
Electronic ISSN: 2573-3311
Conference Location: Guilin, China

I. Introduction

With the development of the Internet, multi-modal data such as images, text, audio, and video have been growing rapidly. Cross-modal retrieval, which integrates images, text, and other modalities, has become a research hotspot in multimedia information retrieval. Unlike the traditional single-modal retrieval task, cross-modal retrieval lets users submit data in one modality and receive results containing information in multiple modalities. For example, if a user visits Buckingham Palace and searches with an image, the results will describe Buckingham Palace with images, text, and video.

