I. Introduction
Facial expression is one of the most powerful signals of human emotional and mental states [1]. As a research hotspot in computer vision and human-computer interaction, facial expression recognition has attracted increasing attention from academia and industry in recent years. Its applications are wide-ranging and include medical monitoring, educational evaluation, criminal investigation, affective computing, and driver fatigue monitoring.

Facial feature extraction methods fall into two main categories: 1) methods based on handcrafted features and 2) methods based on deep learning network models. Handcrafted features include the scale-invariant feature transform (SIFT) [2], the histogram of oriented gradients (HOG) [3], the active appearance model (AAM) [4], and local binary patterns (LBP) [5]. These methods require manual intervention, may lose part of the original expression information, and have difficulty extracting high-order statistical features from the facial image. In computer vision, deep convolutional neural networks (CNNs) such as AlexNet [6], VGG [7], and ResNet [8] have played an important role in feature extraction for facial expression recognition. In real-world facial expression recognition, however, the image background is often complex and contains much irrelevant feature information. The features extracted by a traditional convolutional neural network are therefore relatively homogeneous, which results in low recognition accuracy.

In this paper, we propose a facial expression recognition method that uses a lightweight multi-scale convolutional neural network with an attention mechanism for facial feature extraction. The model is trained, validated, and tested on the FER2013 and KDEF facial expression databases. Experimental results show that the method enhances the representation ability of the convolutional neural network and improves the accuracy of facial expression recognition.
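
To make concrete what a manually designed feature looks like, the following is a minimal sketch of the basic 8-neighbor local binary pattern (LBP) [5], one of the descriptors listed above. It is an illustration only and is not part of the proposed method. The function names `lbp_code` and `lbp_histogram`, the clockwise bit ordering, and the `>=` comparison are assumptions chosen for this sketch; published LBP variants differ in these conventions.

```python
def lbp_code(patch):
    """Return the basic 8-neighbor LBP code of a 3x3 grayscale patch.

    `patch` is a 3x3 nested list of intensities (hypothetical input format).
    """
    center = patch[1][1]
    # Neighbors are visited clockwise, starting at the top-left pixel.
    # This ordering is a convention assumed for this sketch.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbor at least as bright as the center sets its bit.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return a 256-bin histogram of LBP codes over the interior pixels.

    `image` is a 2-D nested list; border pixels are skipped because they
    lack a full 3x3 neighborhood.
    """
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


example = [[6, 5, 2],
           [7, 6, 1],
           [9, 8, 7]]
example_code = lbp_code(example)
example_hist = lbp_histogram(example)
```

The thresholding rule, neighborhood size, and bit ordering in such a descriptor are all fixed by hand, which is the manual intervention noted above; a convolutional network instead learns its filters from the training data.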