I. Introduction
With the rapid development of various technologies, artificial intelligence (AI) has become a focus of academic research [1]. More and more AI products appear in daily life [2], and people increasingly expect robots to exhibit emotional ability. However, current machines cannot communicate emotionally with humans in an intuitive way [3]. Among the various channels of emotional communication, facial expressions and gestures [4] are estimated to convey about 70% of the information exchanged. In human–robot interaction, recognizing emotion from facial expressions and body gestures is therefore of great significance. Interpersonal human–human interaction is a dynamic exchange and coordination of social signals, feelings, and emotions, usually performed through and across multiple modalities such as facial expressions, gestures, and language [5], from which observers infer the current emotional state. Facial expressions and body gestures thus provide richer cues to inner emotional states and support a more accurate understanding of human emotions.