1. Introduction
Building a new dataset usually involves manually labeling every sample for the task at hand. This process is cumbersome and limits the creation of large datasets, which deep neural networks (DNNs) usually need in order to reach the required performance. Conversely, automatic data annotation based on web search and user tags [29], [22] makes much larger data collections available at the expense of introducing some incorrect labels. This label noise degrades DNN performance [3], [52], a challenge that has recently attracted considerable interest in the research community [45], [41], [23], [50], [12], [1], [28], [55], [13], [31].