I. Introduction
With the rise of big data, many computer vision tasks require enormous training datasets to train deep learning models, a phenomenon often described as data hunger [1]. To meet this demand, model trainers commonly collect training data from the Internet for classification and other visual recognition research [2], [3], [4], [5]. In a typical Internet-image-driven application, one queries a web search engine such as Google or Bing, or a photo-sharing site, downloads a large number of images that correlate with a text query, and then models the target object or concept associated with the image collection. Images gathered by search engines or site crawlers, however, usually contain noisy samples, which can degrade the learned model. As a result, the outliers, i.e., irrelevant images, in the collected training data need to be removed.
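As a minimal sketch of the collection step described above, the following Python snippet downloads a list of candidate image URLs into a local folder; the URL list is assumed to have already been returned by a search engine for the text query, and the function name and output directory are illustrative only. Note that this only filters unreachable or non-image responses, not the semantic outliers that are the focus of this work.

import os
import requests

def download_images(url_list, out_dir="web_images", timeout=10):
    """Download candidate image URLs (e.g., results of an image search
    for a text query) into a local folder, skipping broken responses."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for i, url in enumerate(url_list):
        try:
            resp = requests.get(url, timeout=timeout)
            resp.raise_for_status()
            # Keep only responses that actually carry image data.
            if not resp.headers.get("Content-Type", "").startswith("image/"):
                continue
            path = os.path.join(out_dir, f"img_{i:06d}.jpg")
            with open(path, "wb") as f:
                f.write(resp.content)
            saved.append(path)
        except requests.RequestException:
            # Broken links and timeouts are common in web-crawled data; skip them.
            continue
    return saved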