I. Introduction
Deep learning approaches based on neural networks have achieved many inspiring results in machine learning and pattern recognition [1], especially in face recognition [2]–[6]. Google's FaceNet [3] achieved 99.63% accuracy on the Labeled Faces in the Wild (LFW) benchmark [7] using the Inception architecture and the triplet loss, which is close to perfect performance. Impressive as convolutional neural networks (CNNs) are, training a robust and reliable network requires a large amount of data. Harvesting and labeling large datasets has therefore become an effective way to boost CNN performance. For instance, DeepFace, proposed by Facebook [2], VGG-Face [4], and Face++'s Megvii system [5] were trained on 4.4 million, 2.6 million, and 5 million faces, respectively. Moreover, the more complex a neural network is, the more data it needs to avoid overfitting. The state-of-the-art FaceNet [3] used 200 million faces covering eight million unique identities.