I. Introduction
Speech Emotion Recognition (SER) was first proposed by Picard in 1997 [1] and has since attracted widespread attention. Language is the primary means by which people communicate with one another in daily life, and speech is the first form that human language takes; speech therefore plays a fundamental role in language. Human speech carries not only important semantic information but also rich emotional information [2]. The aim of SER is to infer a user's emotional state from their speech [3], thereby enabling harmonious communication between humans, or between humans and machines. In this article, the machine in question is a smart home assistant.