1. Introduction
Image classification is an important challenge in computer vision, and has a variety of applications such as automating content-based retrieval, analyzing medical imagery, or recognizing locations in photos. Much progress over the last decade shows that supervised learning algorithms coupled with effective image descriptors can yield very good scene, object, and attribute predictions, e.g.,[3], [13], [14]. The standard training process entails gathering category-labeled image exemplars, essentially asking human annotators to say “what” is present (and possibly “where” in the image it is). In this respect, current approaches give a rather restricted channel of input to the human viewer, who undoubtedly has a much richer understanding than a simple label can convey.