
Multi-Graph Based Hierarchical Semantic Fusion for Cross-Modal Representation


Abstract:

The main challenge in cross-modal retrieval is how to efficiently achieve semantic alignment and reduce the heterogeneity gap. However, existing approaches ignore the multi-grained semantic knowledge that can be learned from different modalities. To this end, this paper proposes a novel end-to-end cross-modal representation method, termed Multi-Graph based Hierarchical Semantic Fusion (MG-HSF). The method integrates multi-graph hierarchical semantic fusion with cross-modal adversarial learning: it captures both fine-grained and coarse-grained semantic knowledge from cross-modal samples and generates modality-invariant representations in a common subspace. To evaluate its performance, extensive experiments are conducted on three benchmarks. The experimental results show that our method is superior to the state of the art.
Date of Conference: 05-09 July 2021
Date Added to IEEE Xplore: 09 June 2021

Conference Location: Shenzhen, China


1. Introduction

With the advent of the big data era, the amount of multi-modal data (e.g., image, text, audio, video, 3D model) on the Internet is growing explosively. This trend brings unprecedented challenges for accurate and efficient cross-modal retrieval [1]. As a hot spot in the multimedia field, cross-modal retrieval aims to retrieve objects of one modality in response to a query from another modality. This technology can be applied in many scenarios, such as multimedia search, recommendation systems, and visual question answering (VQA).
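To make the design described in the abstract concrete, below is a minimal PyTorch sketch of the general idea: a graph convolution over a fine-grained semantic graph (e.g., image regions or words) and a coarse-grained one (e.g., global concepts), fused into a common subspace, with a modality discriminator providing the adversarial signal. This is not the authors' implementation; every module name, dimension, graph construction, and the plain dense GCN layer is an illustrative assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    # One dense graph-convolution step: H' = ReLU(A_hat @ H @ W),
    # where A_hat is assumed to be a normalized adjacency matrix.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        return F.relu(adj @ self.linear(h))

class HierarchicalFusion(nn.Module):
    # Runs a GCN over fine-grained nodes and another over coarse-grained
    # nodes, mean-pools each, and projects the concatenation into the
    # common subspace. Pooling and dimensions are illustrative choices.
    def __init__(self, fine_dim, coarse_dim, common_dim):
        super().__init__()
        self.fine_gcn = GCNLayer(fine_dim, common_dim)
        self.coarse_gcn = GCNLayer(coarse_dim, common_dim)
        self.project = nn.Linear(2 * common_dim, common_dim)

    def forward(self, fine_feats, fine_adj, coarse_feats, coarse_adj):
        fine = self.fine_gcn(fine_feats, fine_adj).mean(dim=0)
        coarse = self.coarse_gcn(coarse_feats, coarse_adj).mean(dim=0)
        return self.project(torch.cat([fine, coarse], dim=-1))

class ModalityDiscriminator(nn.Module):
    # Classifies which modality an embedding came from; training the
    # fusion encoders to fool it pushes both modalities toward a shared,
    # modality-invariant distribution (the adversarial component).
    def __init__(self, common_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(common_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2))  # 2 classes: image / text

    def forward(self, z):
        return self.net(z)

# Toy usage: one image with 5 region nodes and 3 global-concept nodes.
fusion = HierarchicalFusion(fine_dim=128, coarse_dim=64, common_dim=32)
disc = ModalityDiscriminator(common_dim=32)
z_img = fusion(torch.randn(5, 128), torch.eye(5),
               torch.randn(3, 64), torch.eye(3))
modality_logits = disc(z_img.unsqueeze(0))  # adversarial training signal

In a full training loop, the discriminator and the fusion encoders would be updated in alternation (or via a gradient-reversal layer), so that embeddings from both modalities become indistinguishable to the discriminator while a classification or ranking loss preserves their semantics.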

References
1.
Yang Wang, "Survey on deep multi-modal data analytics: Collaboration rivalry and fusion", 2020.
2.
David R Hardoon, Sandor Szedmak and John Shawe-Taylor, "Canonical correlation analysis: An overview with application to learning methods", Neural computation, vol. 16, no. 12, pp. 2639-2664, 2004.
3.
David M Blei, Andrew Y Ng and Michael I Jordan, "Latent Dirichlet allocation", Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
4.
Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Gert RG Lanckriet, Roger Levy, et al., "A new approach to cross-modal multimedia retrieval", Proceedings of the 18th ACM international conference on Multimedia, pp. 251-260, 2010.
5.
Cheng Jin, Wenhui Mao, Ruiqi Zhang, Yuejie Zhang and Xiangyang Xue, "Cross-modal image clustering via canonical correlation analysis", Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 151-159, 2015.
6.
Viresh Ranjan, Nikhil Rasiwasia and CV Jawahar, "Multi-label cross-modal retrieval", Proceedings of the IEEE International Conference on Computer Vision, pp. 4094-4102, 2015.
7.
David M Blei and Michael I Jordan, "Modeling annotated data", Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 127-134, 2003.
8.
Jing Yu, Yonghui Cong, Zengchang Qin and Tao Wan, "Cross-modal topic correlations for multimedia retrieval", Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), pp. 246-249, 2012.
9.
Lin Wu, Yang Wang and Ling Shao, "Cycle-consistent deep generative hashing for cross-modal retrieval", IEEE Transactions on Image Processing, vol. 28, no. 4, pp. 1602-1612, 2019.
10.
Alex Krizhevsky, Ilya Sutskever and Geoffrey E Hinton, "ImageNet classification with deep convolutional neural networks", Advances in Neural Information Processing Systems, 2012.
11.
Lei Zhu, Jiayu Song, Xiangxiang Wei, Hao Yu and Jun Long, "CAESAR: concept augmentation based semantic representation for cross-modal retrieval", Multimedia Tools and Applications, pp. 1-31, 2020.
12.
Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, Meng Fang and Shirui Pan, "Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering", Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), pp. 2153-2159, 2016.
13.
Yunchao Gong, Qifa Ke, Michael Isard and Svetlana Lazebnik, "A multi-view embedding space for modeling internet images, tags and their semantics", International Journal of Computer Vision, vol. 106, no. 2, pp. 210-233, 2014.
14.
Yang Wang, Xuemin Lin, Lin Wu and Wenjie Zhang, "Effective multi-query expansions: Collaborative deep networks for robust landmark retrieval", IEEE Transactions on Image Processing, vol. 26, no. 3, pp. 1393-1404, 2017.
15.
Yunchao Wei, Yao Zhao, Canyi Lu, Shikui Wei, Luoqi Liu, Zhenfeng Zhu, et al., "Cross-modal retrieval with CNN visual features: A new baseline", IEEE Transactions on Cybernetics, vol. 47, no. 2, pp. 449-460, 2016.
16.
Erkun Yang, Cheng Deng, Wei Liu, Xianglong Liu, Dacheng Tao and Xinbo Gao, "Pairwise relationship guided deep hashing for cross-modal retrieval", Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017.
17.
Yale Song and Mohammad Soleymani, "Polysemous visual-semantic embedding for cross-modal retrieval", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1979-1988, 2019.
18.
Xuanwu Liu, Guoxian Yu, Carlotta Domeniconi, Jun Wang, Yazhou Ren and Maozu Guo, "Ranking-based deep cross-modal hashing", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 4400-4407, 2019.
19.
Lu Jin, Kai Li, Hao Hu, Guo-Jun Qi and Jinhui Tang, "Semantic neighbor graph hashing for multimodal retrieval", IEEE Transactions on Image Processing, vol. 27, no. 3, pp. 1405-1417, 2018.
20.
Ruiqing Xu, Chao Li, Junchi Yan, Cheng Deng and Xianglong Liu, "Graph convolutional network hashing for cross-modal retrieval", Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pp. 982-988, 2019.
21.
Yangtao Wang, Yanzhao Xie, Yu Liu, Ke Zhou and Xiaocui Li, "Fast graph convolution network based multi-label image recognition via cross-modal fusion", Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1575-1584, 2020.
22.
Xiaoze Jiang, Siyi Du, Zengchang Qin, Yajing Sun and Jing Yu, "KBGN: Knowledge-Bridge Graph Network for adaptive vision-text reasoning in visual dialogue", Proceedings of the 28th ACM International Conference on Multimedia, pp. 1265-1273, 2020.
23.
Aashish Kumar Misraa, Ajinkya Kale, Pranav Aggarwal and Ali Aminian, "Multi-modal retrieval using graph neural networks", CoRR, vol. abs/2010.01666, 2020.
24.
Bokun Wang, Yang Yang, Xing Xu, Alan Hanjalic and Heng Tao Shen, "Adversarial cross-modal retrieval", Proceedings of the 25th ACM international conference on Multimedia, pp. 154-162, 2017.
25.
Li He, Xing Xu, Huimin Lu, Yang Yang, Fumin Shen and Heng Tao Shen, "Unsupervised cross-modal retrieval through adversarial learning", 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 1153-1158, 2017.
26.
Jian Zhang, Yuxin Peng and Mingkuan Yuan, "Unsupervised generative adversarial cross-modal hashing", Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 539-546, 2018.
27.
Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo and Yantao Zheng, "NUS-WIDE: a real-world web image database from National University of Singapore", Proceedings of the ACM international conference on image and video retrieval, pp. 1-9, 2009.
28.
Cyrus Rashtchian, Peter Young, Micah Hodosh and Julia Hockenmaier, "Collecting image annotations using Amazon's Mechanical Turk", Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pp. 139-147, 2010.
29.
Liangli Zhen, Peng Hu, Xu Wang and Dezhong Peng, "Deep supervised cross-modal retrieval", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10394-10403, 2019.
30.
Xin Huang, Yuxin Peng and Mingkuan Yuan, "MHTN: Modal-adversarial hybrid transfer network for cross-modal retrieval", IEEE Transactions on Cybernetics, vol. 50, no. 3, pp. 1047-1059, 2020.
