
Celeb-500K: A Large Training Dataset for Face Recognition

Abstract:

In this paper, we propose a large training dataset named Celeb-500K for face recognition, which contains 50M images from 500K persons. To better facilitate academic research, we clean Celeb-500K to obtain Celeb-500K-2R, which contains 25M aligned face images from 365K persons. Based on the developed dataset, we achieve state-of-the-art face recognition performance and reveal two important observations for face recognition research. First, metric learning methods yield limited performance gains when the training dataset contains a large number of identities. Second, when building an efficient training dataset, the number of identities matters more to face recognition performance than the average number of images per identity. Extensive experimental results show the superiority of Celeb-500K and provide strong support for the two observations.

Published in: 2018 25th IEEE International Conference on Image Processing (ICIP)

Date of Conference: 07-10 October 2018

Date Added to IEEE Xplore: 06 September 2018

ISBN Information:

Electronic ISSN: 2381-8549

DOI: 10.1109/ICIP.2018.8451704

Conference Location: Athens, Greece

1. Introduction

In this paper, we propose a large training dataset named Celeb-500K for deep learning [1]–[6] based large-scale face recognition [7]–[11]. The training dataset consists of 50M images from 500K persons. Our paper focuses on addressing the following two issues in face recognition. First, as Table 1 shows, there are large gaps in dataset scale between publicly available datasets and private datasets. For example, CelebFaces [12] has only about 1/800 of the identities and 1/500 of the images of the Google dataset. Compared with industrial practitioners, the academic research community can therefore only resort to smaller datasets, which typically leads to biased conclusions. The efficacy of proposed methods on larger training datasets thus needs further verification. For example, many metric learning methods, including Contrastive Loss [12], Center Loss [13] and Triplet Loss [14], have greatly improved the face recognition performance of models trained on smaller public datasets such as CelebFaces and CASIA-WebFace, but their effectiveness on larger-scale datasets needs further investigation.

Table 1. Recent face recognition training datasets.

Dataset	Available	#People	#Images
YFD [15]	public	1595	3425 videos
VGGFace [16]	public	2600	2.6M
VGGFace2 [10]	public	9131	3.3M
CelebFaces [12]	public	10K	202K
CASIA-WebFace [17]	public	10K	500K
MS-Celeb-1M [18]	public	100K	10M
Celeb-500K	public	500K	50M
Facebook [18]	private	4K	4.4M
Google [18]	private	8M	100–200M
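
The scale gaps quoted in the introduction can be checked directly against Table 1. The following is a minimal sketch, not code from the paper: the dictionary layout and the helper names `scale_ratio` and `images_per_identity` are illustrative assumptions, and the Google image count uses the lower bound of the 100–200M range given in the table.

```python
# Figures copied from Table 1. The Google image count is the lower bound
# of the 100-200M range (an assumption made for this illustration).
datasets = {
    "CelebFaces": {"people": 10_000, "images": 202_000},
    "CASIA-WebFace": {"people": 10_000, "images": 500_000},
    "MS-Celeb-1M": {"people": 100_000, "images": 10_000_000},
    "Celeb-500K": {"people": 500_000, "images": 50_000_000},
    "Google": {"people": 8_000_000, "images": 100_000_000},
}


def scale_ratio(small, large, key):
    """Return how many times larger `large` is than `small` for `key`."""
    return datasets[large][key] / datasets[small][key]


def images_per_identity(name):
    """Return the average number of images per person in a dataset."""
    return datasets[name]["images"] / datasets[name]["people"]


# Gap between CelebFaces and the Google dataset.
identity_gap = scale_ratio("CelebFaces", "Google", "people")
image_gap = scale_ratio("CelebFaces", "Google", "images")

if __name__ == "__main__":
    print(f"identities: 1/{identity_gap:.0f}, images: 1/{image_gap:.0f}")
    for name in datasets:
        print(f"{name}: {images_per_identity(name):.1f} images per identity")
```

With the lower bound for Google, the identity gap is exactly 800 and the image gap is roughly 500, which matches the "1/800 identities and 1/500 images" figures in the text. The per-identity averages give a quick view of the two axes the paper contrasts: number of identities versus images per identity.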
