Improve Accuracy of Speech Emotion Recognition with Attention Head Fusion


Abstract:

Speech Emotion Recognition (SER) refers to the use of machines to recognize a speaker's emotions from their speech. SER has broad application prospects in fields such as criminal investigation and medical care. However, the complexity of emotion makes it hard to recognize, and current SER models still do not recognize human emotions accurately. In this paper, we propose a multi-head self-attention based method, which we call head fusion, to improve the recognition accuracy of SER. With this method, an attention layer can generate attention maps with multiple attention points instead of conventional attention maps with a single attention point. We implemented an attention-based convolutional neural network (ACNN) model with this method and conducted experiments and evaluations on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus, obtaining 76.18% weighted accuracy (WA) and 76.36% unweighted accuracy (UA) on improvised data, an improvement of about 6% over the previous state-of-the-art SER model.
Date of Conference: 06-08 January 2020
Date Added to IEEE Xplore: 12 March 2020
Conference Location: Las Vegas, NV, USA

I. Introduction

Today, machine speech recognition services such as Automatic Speech Recognition (ASR) are widely used in society, and machines can readily recognize what humans are saying. As shown in Fig. 1, similar to ASR, Speech Emotion Recognition (SER) uses machines to recognize humans' emotions as they speak.
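As a rough illustration of the head-fusion idea summarized in the abstract, the sketch below computes multi-head self-attention over a sequence of speech feature frames and then fuses the per-head attention maps into a single map that can carry multiple attention points. This is a minimal NumPy sketch under our own assumptions (fusion by averaging the per-head maps; the function name `head_fusion_attention` and all parameter names are illustrative), not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def head_fusion_attention(x, wq, wk, wv, n_heads):
    """Multi-head self-attention whose per-head attention maps are
    fused (here: averaged) into one map with multiple attention points.

    x: (T, d) array of T feature frames of dimension d.
    wq, wk, wv: (d, d) projection matrices.
    Returns (output, fused_map) with shapes (T, d) and (T, T).
    """
    T, d = x.shape
    dh = d // n_heads
    q, k, v = x @ wq, x @ wk, x @ wv
    # Split queries/keys into heads: (n_heads, T, dh).
    q = q.reshape(T, n_heads, dh).transpose(1, 0, 2)
    k = k.reshape(T, n_heads, dh).transpose(1, 0, 2)
    # Per-head attention maps: (n_heads, T, T), each row a distribution.
    maps = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)
    # Head fusion: average the maps; each head can attend to a different
    # frame, so the fused map keeps several attention peaks at once.
    fused = maps.mean(axis=0)
    return fused @ v, fused

# Usage: five 8-dimensional frames, two heads.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
out, fused = head_fusion_attention(x, wq, wk, wv, n_heads=2)
```

Averaging is only one possible fusion rule; taking an element-wise maximum over heads would likewise preserve multiple attention points in a single map.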
