Speech emotion recognition using convolutional long short-term memory neural network and support vector machines

Abstract:

In this paper, we propose a speech emotion recognition technique that uses a convolutional long short-term memory (LSTM) recurrent neural network (ConvLSTM-RNN) as a phoneme-based feature extractor operating on the raw input speech signal. In the proposed technique, the ConvLSTM-RNN outputs phoneme-based emotion probabilities for every frame of an input utterance. These probabilities are then converted into statistical features of the utterance, which serve as the input to a support vector machine (SVM) or linear discriminant analysis (LDA) system that classifies the utterance-level emotion. To assess the effectiveness of the proposed technique, we conducted experiments on the classification of four emotions (anger, happiness, sadness, and neutral) using the IEMOCAP database. The results showed that the proposed technique, with either the SVM or the LDA classifier, outperforms the conventional ConvLSTM-based one.
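
The step that turns frame-level probabilities into utterance-level statistical features can be illustrated with a minimal sketch. The abstract does not say which statistics the authors use, so the choice of mean, standard deviation, minimum, and maximum per emotion is an assumption made here for illustration, as are the function name `utterance_features` and the example probabilities.

```python
from statistics import mean, pstdev

# The four emotion classes named in the abstract.
EMOTIONS = ("anger", "happiness", "sadness", "neutral")


def utterance_features(frame_probs):
    """Collapse per-frame emotion probabilities into one fixed-length vector.

    frame_probs: list of frames, each holding len(EMOTIONS) probabilities.
    Returns [mean, std, min, max] for each emotion, concatenated.
    The set of statistics is an assumption, not taken from the paper.
    """
    if not frame_probs:
        raise ValueError("utterance has no frames")
    features = []
    for k in range(len(EMOTIONS)):
        # Probability track of emotion k across all frames of the utterance.
        track = [frame[k] for frame in frame_probs]
        features.extend([mean(track), pstdev(track), min(track), max(track)])
    return features


# Hypothetical frame-level outputs for a three-frame utterance.
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.5, 0.2, 0.2, 0.1],
    [0.6, 0.1, 0.2, 0.1],
]
feats = utterance_features(probs)
print(len(feats))  # 4 emotions x 4 statistics = 16
```

Because the resulting vector has the same length for utterances of any duration, it could be passed to a fixed-input classifier such as an SVM or LDA, which is the role those classifiers play in the abstract.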

Published in: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Date of Conference: 12-15 December 2017

Date Added to IEEE Xplore: 08 February 2018

DOI: 10.1109/APSIPA.2017.8282315

Conference Location: Kuala Lumpur, Malaysia

I. Introduction

In human-machine interaction (HMI), we are still far from being able to communicate fully with machines, because it is difficult for machines to interpret paralinguistic information in spoken language, such as emotions. Speech emotion recognition (SER), which aims to classify a speaker's emotional state from speech signals, is one of the essential tasks for making HMI more natural and realistic. Although SER has been widely studied and continues to attract researchers' attention, the performance of the SER systems developed so far remains relatively low, especially for spontaneous conversational speech. Consequently, improving SER performance is a crucial problem in the HMI research area.
