I. Introduction
Facial expressions are among the most significant manifestations of human emotion. Facial expression recognition has therefore attracted increasing attention, as it can provide an additional input modality for human-computer interfaces. Existing approaches can be categorized as geometry-based, Action Unit (AU)-based, or appearance-based, and most previously proposed methods rely on hand-crafted features. Geometry-based approaches extract features by tracking facial landmark points and modeling the geometric relationships between them [1]. AU-based methods are grounded in the Facial Action Coding System (FACS) proposed by Ekman et al. [2], which encodes facial expressions in terms of localized facial muscle movements called Action Units. These methods first train individual AU detectors and then analyze combinations of AUs to classify expressions according to the FACS [3]–[5]. Appearance-based methods typically use texture features extracted from local image patches, such as local binary pattern (LBP) [6], Gabor [7], [8], Haar [9], or scale-invariant feature transform (SIFT) [10] features. In all of the above approaches, the extracted features are passed to a classifier, e.g., a support vector machine, a neural network, or a Bayesian classifier, for the final recognition. Such methods have achieved good accuracy on a number of databases, including the extended Cohn-Kanade (CK+) database [11], the Japanese Female Facial Expression (JAFFE) database [12], and the MMI database [13].
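To make the appearance-based pipeline concrete, the following is a minimal sketch, assuming scikit-image and scikit-learn, of extracting uniform-LBP histogram features from grayscale face crops and classifying them with a linear SVM. The helper names and parameters (e.g., `lbp_histogram`, `points`, `radius`) are illustrative assumptions, not the specific configuration of any cited work.

```python
# Sketch of an appearance-based pipeline: uniform-LBP histogram features
# from grayscale face crops, classified with a linear SVM.
# Assumes scikit-image and scikit-learn are available; the face crops and
# expression labels are supplied by the caller (e.g., loaded from a dataset).
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC


def lbp_histogram(gray_face, points=8, radius=1):
    """Normalized uniform-LBP histogram for one grayscale face crop."""
    lbp = local_binary_pattern(gray_face, points, radius, method="uniform")
    n_bins = points + 2  # uniform patterns plus one bin for non-uniform codes
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
    return hist


def train_expression_classifier(faces, labels):
    """Fit an SVM on LBP histograms; `faces` is a list of 2-D grayscale arrays."""
    features = np.stack([lbp_histogram(f) for f in faces])
    clf = SVC(kernel="linear")
    clf.fit(features, labels)
    return clf
```

In practice, appearance-based systems such as [6] usually partition the face into a grid of regions and concatenate the per-region histograms, so that the feature vector retains spatial information about where texture changes occur.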