
Spatial-Temporal Graphs Plus Transformers for Geometry-Guided Facial Expression Recognition


Abstract:

Facial expression recognition (FER) is of great interest to current studies of human-computer interaction. In this paper, we propose a novel geometry-guided facial expression recognition framework, based on graph convolutional networks and transformers, to perform effective emotion recognition from videos. Specifically, we detect facial landmarks and use them to construct a spatial-temporal graph, based on both the landmark coordinates and the local appearance, to represent a facial expression sequence. Graph convolutional blocks and transformer modules are employed to produce high-level emotion-related representations from the structured facial graphs, enabling the framework to establish both local and non-local dependencies between the vertices. Moreover, spatial and temporal attention mechanisms are introduced into the graph-based learning to promote FER reasoning by emphasizing the most informative facial components and frames. Extensive experiments demonstrate that the proposed framework achieves promising performance for geometry-based FER and shows strong generalization and robustness in real-world applications.
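To make the pipeline in the abstract concrete, the following is a minimal, hypothetical sketch of a geometry-guided model of this kind: per-frame facial landmarks are processed by graph convolutional blocks over the landmark vertices, and a transformer encoder then models temporal dependencies across frames. The module names (GeometryGuidedFER, GraphConvBlock), the learnable identity-initialized adjacency, the tensor shapes, and the simple mean pooling are illustrative assumptions; this is not the authors' implementation, and it omits the local-appearance vertex features and the spatial/temporal attention described in the abstract.

```python
import torch
import torch.nn as nn

class GraphConvBlock(nn.Module):
    """One graph convolution over landmark vertices: X' = ReLU(A_hat X W)."""
    def __init__(self, in_dim, out_dim, num_vertices):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # Learnable adjacency, initialized to identity; a placeholder for a
        # landmark-connectivity graph (e.g., edges within facial components).
        self.adj = nn.Parameter(torch.eye(num_vertices))

    def forward(self, x):                         # x: (batch, frames, vertices, in_dim)
        a_hat = torch.softmax(self.adj, dim=-1)   # row-normalized adjacency
        x = torch.einsum("vu,bfud->bfvd", a_hat, x)
        return torch.relu(self.linear(x))

class GeometryGuidedFER(nn.Module):
    """Spatial GCN per frame, then a transformer over frames, then emotion logits."""
    def __init__(self, num_vertices=68, in_dim=2, hidden=64, num_classes=7):
        super().__init__()
        self.gcn = nn.Sequential(
            GraphConvBlock(in_dim, hidden, num_vertices),
            GraphConvBlock(hidden, hidden, num_vertices),
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, landmarks):                 # (batch, frames, vertices, in_dim)
        x = self.gcn(landmarks)                   # per-frame vertex features
        x = x.mean(dim=2)                         # pool vertices -> (batch, frames, hidden)
        x = self.temporal(x)                      # non-local dependency across frames
        return self.head(x.mean(dim=1))           # clip-level emotion logits

# Example: one 16-frame clip with 68 two-dimensional landmarks per frame.
logits = GeometryGuidedFER()(torch.randn(1, 16, 68, 2))
print(logits.shape)  # torch.Size([1, 7])
```

In practice, the adjacency would be fixed or initialized from the known landmark topology rather than learned from identity, and the vertex features would concatenate landmark coordinates with local appearance descriptors, as the abstract indicates.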
Published in: IEEE Transactions on Affective Computing ( Volume: 14, Issue: 4, 01 Oct.-Dec. 2023)
Page(s): 2751 - 2767
Date of Publication: 10 June 2022


1 Introduction

Facial expressions are among the most straightforward and natural ways for human beings to convey their intentions and internal states in daily life. Automatically recognizing facial expressions enables intelligent machines to better understand human behaviors [1], [2]. With the rapid development of deep learning techniques over the past decade, great efforts have been made to learn discriminative representations from facial images using deep neural networks for expression recognition, achieving promising performance in real-world applications [3], [4], [5], [6]. Recently, as more video data are collected in multimedia communications, extensive attention has been directed toward exploiting facial dynamics for emotion prediction [7], [8], [9]. However, most of these video-based facial expression recognition (FER) algorithms focus on appearance-based feature learning, while only limited effort has been devoted to investigating the geometric knowledge behind facial sequences for FER. It has been demonstrated in the literature [10], [11], [12], [13], [14] that geometric information, i.e., the structural deformation and relative displacement of facial components (e.g., mouth, eyes, nose), is also sensitive to facial expressions. Therefore, it is of great value to explore a geometry-guided method to promote FER in practical tasks.

