
Cross-modal Retrieval Based on Stacked Bimodal Auto-Encoder



Abstract:

Deep learning (DL) has achieved excellent results on various single-modal problems, and many researchers have applied DL to cross-modal retrieval, where the popular approaches are based on two-stage learning. The first stage obtains a separate representation for each modality, and the second stage learns the inter-modal correlation, which is the key to retrieval. Traditional solutions to the second stage obtain the inter-modal shared representation through a shallow network structure, which cannot effectively learn multi-level inter-modal correlation. Motivated by this, this paper presents a novel hybrid deep structure combining the two stages for the cross-modal retrieval task. To learn the inter-modal correlation, we incorporate a Stacked Bimodal Auto-Encoder (Stacked-BAE) into the third layer. On the one hand, the Stacked-BAE learns richer cross-modal correlation and enhances the learning ability of the model. On the other hand, we use layer-wise learning to obtain multi-level inter-modal correlation, which improves the accuracy of cross-modal retrieval. Extensive experiments on several cross-modal datasets show that our model is superior to the baseline correlation analysis and three common multi-modal deep models on cross-modal retrieval tasks.
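The core idea of a bimodal auto-encoder layer, as described above, is that each modality is projected into a shared code from which both modalities are reconstructed; stacking such layers and training them greedily gives the layer-wise, multi-level correlation learning the abstract mentions. The following is a minimal illustrative sketch of one such layer, not the paper's actual architecture; all dimensions, weight initializations, and the tied-weight decoder are hypothetical choices for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

class BimodalAutoEncoderLayer:
    """One layer of a stacked bimodal auto-encoder (illustrative sketch).

    Each modality has its own encoding weights; the shared code is a
    nonlinearity of the sum of the two modality projections, and the
    decoder reuses the transposed weights (tied weights, for brevity).
    """

    def __init__(self, d_img, d_txt, d_shared):
        self.W_img = rng.normal(0.0, 0.1, (d_img, d_shared))
        self.W_txt = rng.normal(0.0, 0.1, (d_txt, d_shared))

    def encode(self, x_img, x_txt):
        # shared representation of the two modalities
        return relu(x_img @ self.W_img + x_txt @ self.W_txt)

    def decode(self, h):
        # reconstruct both modalities from the shared code
        return h @ self.W_img.T, h @ self.W_txt.T

    def reconstruction_loss(self, x_img, x_txt):
        h = self.encode(x_img, x_txt)
        r_img, r_txt = self.decode(h)
        return np.mean((r_img - x_img) ** 2) + np.mean((r_txt - x_txt) ** 2)

# In a stacked setting, the shared code of layer k would feed layer k+1,
# with each layer trained greedily (layer-wise learning).
bae = BimodalAutoEncoderLayer(d_img=128, d_txt=64, d_shared=32)
x_img = rng.normal(size=(8, 128))
x_txt = rng.normal(size=(8, 64))
h = bae.encode(x_img, x_txt)
print(h.shape)  # (8, 32)
```

At retrieval time, one would encode a query from a single modality (zeroing the other input) into the shared space and rank items of the other modality by similarity of their shared codes; that step is omitted here.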
Date of Conference: 07-09 June 2019
Date Added to IEEE Xplore: 29 July 2019
Electronic ISSN: 2573-3311
Conference Location: Guilin, China

I. Introduction

With the development of the Internet, multi-modal data such as images, text, audio, and video have been growing rapidly. Cross-modal retrieval, which integrates images, text, and other modalities, has become a research hotspot in multimedia information retrieval. Unlike the traditional single-modal retrieval task, cross-modal retrieval lets users submit data in one modality and receive results containing information in multiple modalities. For example, if a user visits Buckingham Palace and searches with an image, the results will describe Buckingham Palace with images, text, and video.

