1. Introduction
Data, the basic unit of machine learning, has tremendous impact on the success of learning-based applications. Much of the recent A.I. revolution can be attributed to the creation of the ImageNet dataset [12], which showed that image classification with deep learning at scale [25] can result in learning strong feature extractors that transfer to domains and tasks beyond the original dataset. Using citations as a proxy, ImageNet has supported at least 40,000 research projects to date. It has been unmatched as a pre-training dataset to downstream tasks, due to its size, diversity and the quality of labels. Since its conception, interest in creating large datasets serving diverse tasks and domains has skyrocketed. Examples include object detection [47], action-recognition- [10], and 3D reconstruction [32], [6], in domains such as self-driving [15], [3], and medical imaging [44].