Hidden Markov model-based speech emotion recognition

Abstract:

We introduce speech emotion recognition by use of continuous hidden Markov models. Two methods are proposed and compared. In the first method, global statistics of an utterance are classified by Gaussian mixture models using features derived from the raw pitch and energy contours of the speech signal. The second method introduces increased temporal complexity, applying continuous hidden Markov models with several states to low-level instantaneous features instead of global statistics. The paper addresses the design of working recognition engines and reports results for both alternatives. A speech corpus consisting of acted and spontaneous emotion samples in German and English is described in detail. Both engines were trained and tested on this same speech corpus. Recognition of seven discrete emotions exceeded an 86% recognition rate. For comparison, the judgment of human deciders classifying the same corpus, at a 79.8% recognition rate, was analyzed.

Published in: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

Date of Conference: 06-10 April 2003

Date Added to IEEE Xplore: 05 June 2003

Print ISBN: 0-7803-7663-3

Print ISSN: 1520-6149

DOI: 10.1109/ICASSP.2003.1202279

Conference Location: Hong Kong, China

1. INTRODUCTION

Speech emotion recognition is one of the latest challenges in speech processing. Besides human facial expressions, speech has proven to be one of the most promising modalities for the automatic recognition of human emotions. Especially in the field of security systems, a growing interest can be observed over the last year. In addition, lie detection, video games, and psychiatric aid are often cited as further application scenarios for emotion recognition [1]. From a practical point of view, a technical approach to classification can only rely on pragmatic decisions about the kind, extent, and number of emotions that suit the situation. It seems reasonable to adapt and limit the number and kind of recognizable emotions to the requirements of the application in order to ensure robust classification. As yet, no standard exists for the classification of emotions in technical recognition. A frequently favored approach is to distinguish between a defined set of discrete emotions. However, as mentioned, there is no consensus on their number and naming. A recent approach can be found in the MPEG-4 standard, which names the six emotions anger, disgust, fear, joy, sadness, and surprise. Adding a neutral state seems reasonable to represent the absence of any of these emotions. This classification is used as the basis for comparison throughout this work, also in anticipation of further comparisons. Most current approaches to speech emotion recognition use global statistics of a phrase as their basis [2]. However, first efforts toward recognition based on instantaneous features also exist [3] [4]. We present two working engines covering both of these alternatives by use of continuous hidden Markov models, which have evolved into a widespread standard technique in speech processing.
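The difference between the two alternatives named above can be illustrated with a minimal sketch. It is not the engine described in the paper: the function names, the choice of statistics, the use of first-order differences, and the single diagonal Gaussian per state (rather than a mixture) are all assumptions made for illustration, and no training step is shown. The first function collapses an utterance into one fixed-length vector of global statistics; the second keeps one feature vector per frame, which a continuous HMM can then score with the forward algorithm.

```python
import math
from statistics import mean, pstdev

def global_statistics(pitch, energy):
    """Collapse whole-utterance contours into one fixed-length vector.
    The statistics chosen here are illustrative assumptions."""
    feats = []
    for contour in (pitch, energy):
        lo, hi = min(contour), max(contour)
        feats += [mean(contour), pstdev(contour), lo, hi, hi - lo]
    return feats

def instantaneous_features(pitch, energy):
    """Keep one vector per frame: raw value plus first-order difference."""
    frames = []
    for t in range(len(pitch)):
        d_pitch = pitch[t] - pitch[t - 1] if t > 0 else 0.0
        d_energy = energy[t] - energy[t - 1] if t > 0 else 0.0
        frames.append([pitch[t], energy[t], d_pitch, d_energy])
    return frames

def log_gauss(x, means, variances):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def hmm_log_likelihood(frames, log_start, log_trans, state_means, state_vars):
    """Forward algorithm in the log domain, one Gaussian per state
    (an assumption; a mixture per state would replace log_gauss)."""
    n = len(log_start)
    alpha = [log_start[j] + log_gauss(frames[0], state_means[j], state_vars[j])
             for j in range(n)]
    for x in frames[1:]:
        alpha = [logsumexp([alpha[i] + log_trans[i][j] for i in range(n)])
                 + log_gauss(x, state_means[j], state_vars[j])
                 for j in range(n)]
    return logsumexp(alpha)

# Toy contours (made-up values, one entry per frame).
pitch = [120.0, 125.0, 130.0, 128.0]
energy = [0.2, 0.4, 0.5, 0.3]

utterance_vector = global_statistics(pitch, energy)    # one vector per utterance
frame_sequence = instantaneous_features(pitch, energy)  # one vector per frame
```

Under these assumptions, classification would train one model per emotion and pick the emotion whose model assigns the highest log-likelihood, either to `utterance_vector` (global statistics) or to `frame_sequence` via `hmm_log_likelihood` (instantaneous features).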
