I. Introduction
Today, machine speech recognition technologies such as Automatic Speech Recognition (ASR) are widely deployed, enabling machines to recognize what people are saying. As shown in Fig. 1, Speech Emotion Recognition (SER) is a related task: rather than recognizing the words a speaker utters, SER uses machines to recognize the speaker's emotional state from their speech.