1. Introduction
With modern supervised learning methods, machines can recognize thousands of visual categories with high reliability; in fact, machines can outperform individual humans when performance depends on extensive domain-specific knowledge, as required, for example, to recognize hundreds of species of dogs in ImageNet [11]. However, it is also clear that machines are still far behind human intelligence in some fundamental ways. A prime example is the fact that good recognition performance can only be obtained if computer vision algorithms are manually supervised. Modern machine learning methods have little to offer in an open-world setting, in which image categories are not defined a priori or for which no labelled data is available. In other words, machines lack the ability to structure data automatically, understanding concepts such as object categories without external supervision.