Enhancing Speech Emotion Recognition Through Advanced Feature Extraction and Deep Learning: A Study Using the RAVDESS Dataset


Abstract:

This study focuses on the significance of speech emotion recognition (SER) in enabling machines to recognize and comprehend human emotions expressed through speech. It introduces a method for SER that combines sophisticated feature extraction techniques with deep learning. The main objectives are to improve recognition accuracy, reduce processing requirements, and enable real-time application. The proposed methodology uses the RAVDESS dataset and extracts Mel-frequency cepstral coefficient (MFCC) features, which are processed by a convolutional neural network (CNN) model. Extensive experimentation and analysis show that the methodology outperforms baseline methods. The CNN model successfully generates feature maps from the time-series data, improving the extraction and interpretation of MFCC features. The effectiveness and robustness of the approach are demonstrated by evaluation measures such as recognition accuracy, computing time, and real-time applicability. Comparative assessments against existing approaches further validate its generality, accuracy, and dependability across varied SER scenarios. The study's ROC diagrams offer visual evidence of the model's efficacy in classifying emotions across diverse emotion categories. This study not only contributes to the progress of SER technology but also establishes a foundation for future investigations in this domain. The findings provide a basis for developing more sophisticated and efficient SER systems, with wide-ranging implications for fields such as human-computer interaction, sentiment analysis, and mental health diagnostics.
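The MFCC features described above can be computed from a raw waveform by framing the signal, taking the power spectrum, applying a mel-scale filterbank, and decorrelating the log energies with a DCT. The sketch below illustrates that pipeline in plain NumPy; the parameter choices (16 kHz audio, 25 ms frames, 26 mel bands, 13 coefficients) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal MFCC extraction sketch in NumPy (illustrative parameters,
# not the paper's exact configuration).
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Slice the signal into overlapping, Hamming-windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel filterbank energies.
    fb = mel_filterbank(n_filters, n_fft, sr)
    log_mel = np.log(power @ fb.T + 1e-10)
    # DCT-II over the filterbank axis yields the cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return log_mel @ basis.T            # shape: (n_frames, n_ceps)

# Example: one second of synthetic 440 Hz audio stands in for a RAVDESS clip.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (98, 13)
```

The resulting (frames × coefficients) matrix is the kind of time-series input the paper's CNN consumes; in practice a library such as librosa would typically replace this hand-rolled version.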
Date of Conference: 28-29 June 2024
Date Added to IEEE Xplore: 22 August 2024
Conference Location: Bangalore, India

I. Introduction

A subdiscipline of affective computing, speech emotion recognition (SER) is concerned with the research and development of systems that can identify and interpret human emotions expressed through speech and other modalities [1]. The objective of SER is to autonomously identify and categorize emotions expressed via verbal communication, thereby facilitating a wide array of uses, including clinical diagnostics, psychological research, and human-computer interaction [2]. In addition to the transmission of information, human communication encompasses the manifestation of emotions, which are vital for deciphering the speaker's intentions, disposition, and emotional condition. The range of emotions expressed verbally is extensive, encompassing fear, surprise, happiness, sorrow, anger, and more intricate emotional states [3]. Accurately discerning these emotions from speech signals is a formidable task owing to the considerable diversity in linguistic content, speaker attributes, cultural impacts, and environmental variables.
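To make the classification task concrete: a CNN for SER slides learned kernels over the MFCC time series to produce feature maps, pools them over time, and maps the pooled vector to emotion posteriors. The NumPy sketch below shows that forward pass with random weights purely for illustration; the layer sizes, kernel width, and the 8-class output (matching RAVDESS's eight emotion labels) are assumptions, not the paper's architecture.

```python
# Illustrative forward pass of a 1-D CNN over an MFCC matrix
# (random weights; layer sizes are assumptions, not the paper's model).
import numpy as np

def conv1d_relu(x, kernels, bias):
    # x: (frames, n_ceps); kernels: (n_out, width, n_ceps).
    n_out, width, _ = kernels.shape
    out_frames = x.shape[0] - width + 1
    out = np.empty((out_frames, n_out))
    for t in range(out_frames):
        patch = x[t:t + width]          # (width, n_ceps) window of frames
        out[t] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)         # ReLU activation

rng = np.random.default_rng(0)
mfcc_feats = rng.standard_normal((98, 13))   # stand-in MFCC matrix
kernels = rng.standard_normal((16, 5, 13)) * 0.1
bias = np.zeros(16)

fmap = conv1d_relu(mfcc_feats, kernels, bias)  # (94, 16) feature maps
pooled = fmap.mean(axis=0)                     # global average pooling
w = rng.standard_normal((8, 16)) * 0.1         # 8 RAVDESS emotion classes
logits = w @ pooled
probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # softmax posteriors
print(fmap.shape, probs.shape)  # (94, 16) (8,)
```

A trained model would learn the kernels and output weights by minimizing cross-entropy over labeled utterances; the structure of the computation is the same.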

