
MERSA: Multimodal Emotion Recognition with Self-Align Embedding



Abstract:

Emotions are an integral part of human communication and interaction, significantly shaping our social connections, decision-making, and overall well-being. Understanding and analyzing emotions has become essential in various fields, including psychology, human-computer interaction, marketing, and healthcare. Previous approaches have made significant strides in improving the accuracy of emotion prediction from speech. However, current models still fall short in real-life applications. This limitation stems from several factors, including lack of context and ambiguity in both speech and its meaning. To reduce the ambiguity of emotions within speech, this paper leverages multiple data modalities, specifically textual and acoustic information. To analyze these modalities, we propose a novel approach called MERSA, which uses the self-align method to extract context features from both textual and acoustic information. With this technique, MERSA creates fused feature vectors from the multiple inputs, enabling a more accurate and holistic analysis of emotions in speech. In addition, MERSA incorporates a cross-attention module into its network architecture, which allows the model to capture and exploit the interdependencies between the textual and acoustic modalities.
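
The abstract does not describe MERSA's internals, so the following is only a rough illustration of the general idea of cross-modal attention followed by fusion, not the authors' implementation. It shows generic scaled dot-product cross-attention in which text-token embeddings attend over acoustic-frame embeddings and vice versa, with the pooled results concatenated into one fusion vector. Assumptions: both modalities are already embedded to the same dimension, the learned query/key/value projections and the self-align step are omitted, and the names `cross_attention`, `mean_pool`, and `fuse` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m).
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: queries come from one modality,
    keys/values from the other. Hypothetical sketch with no learned
    projections (an assumption made for brevity)."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # Each output row is a convex combination of the value rows.
    return matmul(weights, values)


def mean_pool(x: Matrix) -> List[float]:
    # Average over the sequence axis to get one vector per modality.
    n = len(x)
    return [sum(col) / n for col in zip(*x)]


def fuse(text: Matrix, audio: Matrix) -> List[float]:
    # Assumption: text and audio embeddings share the same dimension.
    text_to_audio = cross_attention(text, audio, audio)  # text attends to audio
    audio_to_text = cross_attention(audio, text, text)   # audio attends to text
    # Simple concatenation as the (hypothetical) fusion step.
    return mean_pool(text_to_audio) + mean_pool(audio_to_text)


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, dim 2
    audio = [[0.5, 0.5], [1.0, -1.0]]            # 2 frames, dim 2
    print(fuse(text, audio))                     # 4-dimensional fusion vector
```

Attending in both directions lets each modality weight the other's elements by relevance before fusion; a real model would add learned projections, multiple heads, and a trained classifier on top of the fused vector.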

Published in: 2024 International Conference on Information Networking (ICOIN)

Date of Conference: 17-19 January 2024

Date Added to IEEE Xplore: 03 July 2024


Print on Demand (PoD) ISSN: 1976-7684

DOI: 10.1109/ICOIN59985.2024.10572116

Conference Location: Ho Chi Minh City, Vietnam


I. Introduction

Traditional studies of emotion analysis primarily relied on verbal expressions, facial cues, and physiological responses as indicators of emotional states. However, humans convey emotions through multiple channels simultaneously, including speech acoustics, the content of speech, facial expressions, and body language. This multifaceted nature of emotional expression has led to the emergence of Speech Emotion Recognition (SER) and Multimodal Emotion Recognition as promising fields of research.
