I. Introduction
Multimodal sentiment analysis (MSA) aims to predict emotional scores from audio, visual, and text features. MSA has become a popular research topic and has been widely applied in areas such as marketing management [1], [2], social media analysis [3], [4], and human-computer interaction [5], [6]. Although humans readily perceive the world by integrating comprehensive information acquired through multiple sensory organs [7], how to endow machines with analogous cognitive capabilities remains an open question. One of the main challenges is the heterogeneity gap in multimodal data [8]: feature vectors extracted from different modalities initially lie in distinct subspaces, so semantically similar elements can receive completely different vector representations. This hinders subsequent machine learning modules from fully exploiting multimodal data [9]. Researchers have made remarkable strides in designing multimodal feature fusion methods [10], [11], [12], [13], [14]. Nevertheless, limited consideration has been given to bridging the heterogeneity among multimodal features. Current MSA approaches largely fall into two categories: the first fuses features through geometric operations on feature vectors, while the second designs more complex fusion schemes based on transformer (attention) mechanisms.
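For illustration, the sketch below contrasts these two families of fusion operations under simplified assumptions: a geometric fusion that concatenates, or takes the outer product of, unimodal feature vectors, and an attention-based fusion in which one modality attends to another via scaled dot-product attention. All dimensions, tensor shapes, and variable names are hypothetical and do not correspond to any specific MSA model.

```python
# Illustrative sketch (not from the paper): two common styles of multimodal fusion.
# All feature sizes and names below are hypothetical placeholders.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_t, d_a, d_v = 8, 6, 4                                   # hypothetical text / audio / visual sizes
h_t, h_a, h_v = torch.randn(d_t), torch.randn(d_a), torch.randn(d_v)

# (1) Geometric fusion: combine unimodal vectors with simple vector operations,
#     e.g., concatenation or an outer product across modalities.
fused_concat = torch.cat([h_t, h_a, h_v])                 # shape: (d_t + d_a + d_v,)
fused_outer = torch.einsum('i,j,k->ijk', h_t, h_a, h_v)   # shape: (d_t, d_a, d_v)

# (2) Attention-based fusion: one modality queries another with
#     scaled dot-product attention over sequences of feature vectors.
T, d = 5, 16                                              # hypothetical sequence length / model size
text_seq = torch.randn(T, d)                              # queries from the text modality
audio_seq = torch.randn(T, d)                             # keys/values from the audio modality
scores = text_seq @ audio_seq.T / d ** 0.5                # (T, T) cross-modal affinities
attn = F.softmax(scores, dim=-1)
fused_attn = attn @ audio_seq                             # audio features re-weighted by text

print(fused_concat.shape, fused_outer.shape, fused_attn.shape)
```

The geometric route keeps the fusion operator fixed and parameter-free, whereas the attention route learns how strongly each cross-modal pair should contribute; in both cases the fusion acts directly on raw unimodal embeddings and therefore remains sensitive to the heterogeneity gap described above.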