I. Introduction
Convolutional Neural Networks (CNNs) have obtained state-of-the-art results in many applications, including face recognition [1]–[3]. A face recognition system based on CNN models usually begins with the creation of a large-scale dataset from videos or still images [2], [4]. This step is extremely important because the performance of the whole system depends on the availability of a large quantity of data and on its quality [1]. Typically, large-scale datasets created in a semi-supervised way from search engines are prone to noise [5]. Even though deep CNNs can withstand a certain amount of noise, a significant presence of it can deteriorate the performance of recognition systems.

To tackle these challenges, we present a generic pipeline for face recognition systems based on learning embeddings with a deep CNN, similar to FaceNet [1], where the use of the triplet loss enhanced the discriminative power of the deeply learned face features. Our pipeline is capable of creating a dataset from scratch, from either videos or still images. In addition, it can be used to remove noise from existing datasets such as MS-Celeb-1M [5]. The proposed pipeline is general and can be applied to various problems that arise, for example, when organizations want to measure the recurrent presence of a specific set of individuals, e.g., detecting students in attendance at a lecture, identifying members at a fitness club, or monitoring people in an airport.