I. Introduction
Traditional studies of emotion analysis have relied primarily on verbal expressions, facial cues, and physiological responses as indicators of emotional states. However, humans convey emotions through multiple channels simultaneously, including speech acoustics, the content of speech, facial expressions, and body language. This multifaceted nature of emotional expression has led to the emergence of Speech Emotion Recognition (SER) and Multimodal Emotion Recognition as promising fields of research.