Celeb-500K: A Large Training Dataset for Face Recognition


Abstract:

In this paper, we propose a large training dataset named Celeb-500K for face recognition, which contains 50M images from 500K persons. To better facilitate academic research, we clean Celeb-500K to obtain Celeb-500K-2R, which contains 25M aligned face images from 365K persons. Based on the developed dataset, we achieve state-of-the-art face recognition performance and report two important observations about face recognition research. First, metric learning methods yield limited performance gains when the training dataset contains a large number of identities. Second, when building an efficient training dataset, the number of identities matters more for face recognition performance than the average number of images per identity. Extensive experimental results show the superiority of Celeb-500K and provide strong support for the two observations.
Date of Conference: 07-10 October 2018
Date Added to IEEE Xplore: 06 September 2018
ISBN Information:
Electronic ISSN: 2381-8549
Conference Location: Athens, Greece

1. Introduction

In this paper, we propose a large training dataset named Celeb-500K for deep learning [1]–[6] based large-scale face recognition [7]–[11]. The training dataset consists of 50M images from 500K persons. Our paper focuses on addressing the following two issues in face recognition. First, as Table 1 shows, there is a large gap in scale between publicly available datasets and private datasets. For example, CelebFaces [12] has only about 1/800 of the identities and 1/500 of the images of the Google dataset. Compared with industry, the academic research community can therefore only resort to smaller datasets, which typically leads to biased conclusions, so the efficacy of proposed methods on larger training datasets needs further verification. For example, many metric learning methods, including Contrastive Loss [12], Center Loss [13] and Triplet Loss [14], have greatly improved the face recognition performance of models trained on smaller public datasets such as CelebFaces and CASIA-WebFace, but their effectiveness on larger-scale datasets needs further investigation.

Table 1. Recent face recognition training datasets.

Dataset             Availability   #People   #Images
YFD [15]            public         1595      3425 videos
VGGFace [16]        public         2600      2.6M
VGGFace2 [10]       public         9131      3.3M
CelebFaces [12]     public         10K       202K
CASIA-WebFace [17]  public         10K       500K
MS-Celeb-1M [18]    public         100K      10M
Celeb-500K          public         500K      50M
Facebook [18]       private        4K        4.4M
Google [18]         private        8M        100–200M
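
The scale gaps quoted in the introduction can be checked directly against Table 1. The following is a minimal sketch, not part of the original paper: the names `TABLE_1`, `scale_gap` and `images_per_identity` are hypothetical, the counts are the rounded values from the table, and the lower bound (100M) is assumed for the Google image count.

```python
# Rounded (people, images) counts copied from Table 1.
# Assumption: the Google image count uses the lower bound of "100-200M".
TABLE_1 = {
    "CelebFaces": (10_000, 202_000),
    "CASIA-WebFace": (10_000, 500_000),
    "MS-Celeb-1M": (100_000, 10_000_000),
    "Celeb-500K": (500_000, 50_000_000),
    "Google": (8_000_000, 100_000_000),
}


def scale_gap(small, large, table=TABLE_1):
    """Return (identity_ratio, image_ratio) of `large` relative to `small`."""
    small_people, small_images = table[small]
    large_people, large_images = table[large]
    return large_people / small_people, large_images / small_images


def images_per_identity(name, table=TABLE_1):
    """Average number of images per identity for one dataset."""
    people, images = table[name]
    return images / people


if __name__ == "__main__":
    id_ratio, img_ratio = scale_gap("CelebFaces", "Google")
    print(f"Google vs. CelebFaces: {id_ratio:.0f}x identities, {img_ratio:.0f}x images")
    for name in TABLE_1:
        print(f"{name}: {images_per_identity(name):.1f} images per identity")
```

Under these assumptions the identity ratio is 800 and the image ratio is roughly 495, consistent with the "1/800 identities and 1/500 images" figures in the text. The per-identity average is the quantity the abstract contrasts with the number of identities.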

References

1. Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks", Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
2. Karen Simonyan and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition", 2014.
3. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, et al., "Going deeper with convolutions", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
4. Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, "Deep residual learning for image recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
5. Ross Girshick, "Fast R-CNN", Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448, 2015.
6. Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks", Advances in Neural Information Processing Systems, pp. 91-99, 2015.
7. Yi Sun, Xiaogang Wang and Xiaoou Tang, "Deep learning face representation from predicting 10000 classes", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891-1898, 2014.
8. Yi Sun, Ding Liang, Xiaogang Wang and Xiaoou Tang, "DeepID3: Face recognition with very deep neural networks", 2015.
9. Weihua Chen, Xiaotang Chen, Jianguo Zhang and Kaiqi Huang, "Beyond triplet loss: a deep quadruplet network for person re-identification", CoRR, vol. abs/1704.01719, 2017.
10. Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi and Andrew Zisserman, "VGGFace2: A dataset for recognising faces across pose and age", 2017.
11. Emily M. Hand and Rama Chellappa, "Attributes for improved attributes: A multi-task network for attribute classification", 2016.
12. Yi Sun, Xiaogang Wang and Xiaoou Tang, "Deep learning face representation by joint identification-verification", Advances in Neural Information Processing Systems, pp. 1988-1996, 2014.
13. Yandong Wen, Kaipeng Zhang, Zhifeng Li and Yu Qiao, "A discriminative feature learning approach for deep face recognition", European Conference on Computer Vision, pp. 499-515, 2016.
14. Florian Schroff, Dmitry Kalenichenko and James Philbin, "FaceNet: A unified embedding for face recognition and clustering", CoRR, vol. abs/1503.03832, 2015.
15. Lior Wolf, Tal Hassner and Itay Maoz, "Face recognition in unconstrained videos with matched background similarity", 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 529-534, 2011.
16. Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman, et al., "Deep face recognition", BMVC, vol. 1, pp. 6, 2015.
17. Dong Yi, Zhen Lei, Shengcai Liao and Stan Z. Li, "Learning face representation from scratch", 2014.
18. Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He and Jianfeng Gao, "MS-Celeb-1M: A dataset and benchmark for large-scale face recognition", August 2016.
19. K. Zhang, Z. Zhang, Z. Li and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks", IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499-1503, Oct. 2016.
20. Xiang Wu, Ran He, Zhenan Sun and Tieniu Tan, "A light CNN for deep face representation with noisy labels", 2015.
21. Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, et al., "Caffe: Convolutional architecture for fast feature embedding", Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675-678, 2014.
22. Gary B. Huang, Manu Ramesh, Tamara Berg and Erik Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments", 2007.
