Speech emotion recognition using convolutional long short-term memory neural network and support vector machines


Abstract:

In this paper, we propose a speech emotion recognition technique that uses a convolutional long short-term memory (LSTM) recurrent neural network (ConvLSTM-RNN) as a phoneme-based feature extractor operating on the raw input speech signal. In the proposed technique, the ConvLSTM-RNN outputs phoneme-based emotion probabilities for every frame of an input utterance. These probabilities are then converted into statistical features of the utterance and used as the input features of a support vector machine (SVM) or linear discriminant analysis (LDA) system that classifies the utterance-level emotion. To assess the effectiveness of the proposed technique, we conducted experiments on the classification of four emotions (anger, happiness, sadness, and neutral) on the IEMOCAP database. The results showed that the proposed technique, with either an SVM or an LDA classifier, outperforms the conventional ConvLSTM-based one.
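
The step from frame-level probabilities to utterance-level statistical features can be illustrated with a minimal sketch. The abstract does not say which statistics are used, so the choice of mean, standard deviation, minimum, and maximum per emotion is an assumption made here for illustration only. The function name `utterance_features`, the ordering of `EMOTIONS`, and the example values are likewise hypothetical.

```python
from statistics import mean, pstdev

# The four emotion classes named in the abstract; this ordering is an assumption.
EMOTIONS = ("anger", "happiness", "sadness", "neutral")


def utterance_features(frame_probs):
    """Collapse per-frame emotion probabilities into one fixed-length vector.

    frame_probs: list of frames, each a sequence of len(EMOTIONS) probabilities.
    Returns [mean, std, min, max] for each emotion, concatenated.
    The choice of statistics is illustrative, not taken from the paper.
    """
    if not frame_probs:
        raise ValueError("utterance has no frames")
    features = []
    for k in range(len(EMOTIONS)):
        # Probability track of emotion k across all frames of the utterance.
        column = [frame[k] for frame in frame_probs]
        features.extend([mean(column), pstdev(column), min(column), max(column)])
    return features


# Hypothetical frame-level outputs for a three-frame utterance.
example = [
    [0.5, 0.25, 0.125, 0.125],
    [0.75, 0.125, 0.0625, 0.0625],
    [0.25, 0.25, 0.25, 0.25],
]
vector = utterance_features(example)
```

Because the vector has the same length for every utterance regardless of its number of frames, it could be passed to a standard SVM or LDA classifier as the abstract describes.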
Date of Conference: 12-15 December 2017
Date Added to IEEE Xplore: 08 February 2018
Conference Location: Kuala Lumpur, Malaysia

I. Introduction

In human-machine interaction (HMI), we are still far from being able to communicate fully with machines, because it is difficult for machines to interpret paralinguistic information in spoken language, such as emotions. Speech emotion recognition (SER), which aims to classify a speaker's emotional state from the speech signal, is one of the essential tasks for making HMI more natural and realistic. Although SER has been widely studied and has attracted considerable attention from researchers, the performance of the SER systems developed so far remains relatively low, especially for spontaneous conversational speech. Consequently, improving SER performance is a crucial problem in the HMI research area.
