1 Introduction
Facial expressions are among the most direct and natural ways for human beings to convey their intentions and internal states in daily life. The ability to automatically recognize facial expressions enables intelligent machines to better understand human behaviors [1], [2]. With the rapid development of deep learning techniques over the past decade, great efforts have been made to learn discriminative representations from facial images with deep neural networks for expression recognition, achieving promising performance in real-world applications [3], [4], [5], [6]. Recently, as more video data is collected in multimedia communications, extensive attention has been directed toward exploiting facial dynamics for emotion prediction [7], [8], [9]. However, most of these video-based facial expression recognition (FER) algorithms focus on appearance-based feature learning, while only limited effort has been devoted to investigating the geometric knowledge behind facial sequences for FER. It has been demonstrated in the literature [10], [11], [12], [13], [14] that geometric information, i.e., the structural deformation and relative shift of facial components (e.g., mouth, eyes, nose), is also sensitive to facial expressions. Therefore, it is of great value to explore a geometry-guided method to promote FER in practical tasks.