Abstract:
Speech emotion recognition plays a vital role in enhancing human-computer interaction and improving user experience across applications. This paper investigates the use of spatio-temporal patterns in speech emotion recognition, in contrast to conventional methods that rely solely on either spatial or temporal information. The approach uses a parallel architecture that couples Convolutional Neural Networks (CNNs) with Transformers as an encoder block network. This design combines the spatial feature extraction capabilities of CNNs with the temporal modeling strengths of Transformers, allowing it to capture intricate patterns and contextual relationships within speech data. We present a comprehensive experimental analysis on three benchmark datasets that examines the impact of spatio-temporal patterns on speech emotion recognition.
Published in: 2023 International Conference on Advanced Mechatronics, Intelligent Manufacture and Industrial Automation (ICAMIMIA)
Date of Conference: 14-15 November 2023
Date Added to IEEE Xplore: 13 February 2024