1. Introduction
Deep learning has achieved remarkable performance in various computer vision tasks such as classification [18], [13], object detection [27], [26], and semantic segmentation [19], [4] due to massive datasets with annotations such as ImageNet for image classification [7] and PASCAL VOC for classification, detection, segmentation [9]. Obtaining good annotations is challenging and has often been a large-scale project. Moreover, there are often cases where labeling massive amount of data is even more challenging or infeasible due to high labeling cost such as labeling by experts [8] or long labeling time per large-scale sample such as videos [1] or pathology images [3]. Labeling cost seems to be a factor to limit the scope of applicability of deep learning to more research areas and more institutes with less labeling budget.