
A Two-Stage Hierarchical Multilingual Emotion Recognition System Using Hidden Markov Models and Neural Networks


Abstract:

Speech emotion recognition continues to attract considerable research interest, especially for mixed-language speech. Here, we show that emotion is culture/language dependent. In this paper, we propose a two-stage emotion recognition system that first identifies the language and then uses a dedicated language-dependent recognition system to identify the type of emotion. The system is able to accurately recognize the four main emotion types, namely neutral, happy, angry, and sad, which are widely used in practical setups. To keep the computational complexity low, we identify the language using a feature vector consisting of energies from a basic wavelet decomposition of the speech signal. A Hidden Markov Model then tracks the changes of this energy feature vector to identify the language, achieving recognition accuracy close to 100%. Once the language is identified, a set of traditional speech-processing features, including pitch, formants, and MFCCs, is used with a basic Neural Network architecture to identify the type of emotion. The results show that identifying the language first can substantially improve the overall accuracy of emotion recognition. The overall accuracy achieved with the proposed hierarchical system was above 93%. The work shows the strong correlation between language/culture and type of emotion, and can be extended to other scenarios such as gender-based, facial-expression-based, and age-based recognition.
Date of Conference: 08-11 May 2017
Date Added to IEEE Xplore: 30 August 2018
Electronic ISSN: 2473-9391
Conference Location: Manama, Bahrain
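
The paper does not include an implementation, but the two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the choice of pywt, hmmlearn, and scikit-learn, the db4 wavelet at decomposition level 3, the function names, and the one-GaussianHMM-per-language / one-MLP-per-language layout are all illustrative, not the authors' configuration.

import numpy as np
import pywt                                       # basic wavelet decomposition
from hmmlearn import hmm                          # Gaussian HMMs (stage 1)
from sklearn.neural_network import MLPClassifier  # basic neural net (stage 2)

EMOTIONS = ["neutral", "happy", "angry", "sad"]

def wavelet_energy_features(frames, wavelet="db4", level=3):
    # Stage-1 feature vector: the energy of each sub-band of a basic
    # wavelet decomposition, computed frame by frame.
    feats = []
    for frame in frames:
        coeffs = pywt.wavedec(frame, wavelet, level=level)
        feats.append([float(np.sum(c ** 2)) for c in coeffs])
    return np.asarray(feats)

def identify_language(frames, language_hmms):
    # Stage 1: score the energy trajectory under one trained
    # hmm.GaussianHMM per language and pick the most likely model.
    X = wavelet_energy_features(frames)
    return max(language_hmms, key=lambda lang: language_hmms[lang].score(X))

def recognize_emotion(frames, emotion_features, language_hmms, emotion_nets):
    # Stage 2: route the utterance to the language-specific MLPClassifier,
    # assumed trained on traditional features (pitch, formants, MFCCs, ...)
    # with integer labels 0..3 matching EMOTIONS.
    lang = identify_language(frames, language_hmms)
    label = emotion_nets[lang].predict(emotion_features.reshape(1, -1))[0]
    return lang, EMOTIONS[label]

Training (omitted above) would fit one GaussianHMM per language on wavelet-energy sequences and one MLPClassifier per language on labelled emotion features.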

I. Introduction

Research in developing human-computer interaction tools has attracted a lot of attention among scientists and engineers in recent years. One important aspect of human-computer interaction is enabling computers to understand human emotions through voice (or facial expressions), so that appropriate actions can follow. Substantial advances have been achieved for different voice recognition applications to date. People can now use their voice to issue commands to cars, cellphones, computers, TVs, and many other devices. However, understanding emotions from voice is still a challenge, and success in this area is expected to provide a better quality of experience for the user. Several research efforts have been devoted to developing emotion-based systems. In [1], for example, the authors investigated an application of speech emotion recognition for avoiding traffic accidents. The work in [2], on the other hand, proposed a machine-learning algorithm linked to voice-messaging systems that assigns priority to messages based on the relationship between energy, speaking rate, and pitch as a representation of the emotional state: when energy is high and speech rate is low, for example, the message is classified as urgent.
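
As a hedged illustration of the rule attributed to [2] (the thresholds, the normalized feature ranges, and the function name below are placeholders for this sketch, not values from that work):

def message_priority(energy, speaking_rate, energy_hi=0.7, rate_lo=0.3):
    # Toy version of the quoted rule: high energy combined with a low
    # speaking rate marks the message as urgent; otherwise it is normal.
    # Inputs are assumed normalized to [0, 1]; thresholds are illustrative.
    if energy > energy_hi and speaking_rate < rate_lo:
        return "urgent"
    return "normal"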
