1. Introduction
Representation, detection, and learning are the main issues that must be tackled in designing a visual system for recognizing object categories. The first challenge is coming up with models that capture the ‘essence’ of a category, i.e., what is common to the objects that belong to it, and yet are flexible enough to accommodate object variability (e.g., presence or absence of distinctive parts such as a mustache or glasses, variability in overall shape, and changes in appearance due to lighting conditions, viewpoint, etc.). The challenge of detection is defining metrics and inventing algorithms that are suitable for matching models to images efficiently in the presence of occlusion and clutter. Learning is the ultimate challenge. If we wish to design visual systems that can recognize, say, 10,000 object categories, then effortless learning is a crucial step. This means that training sets should be small and that the operator-assisted steps that are required (e.g., elimination of clutter in the background of the object, scale normalization of the training sample) should be reduced to a minimum or eliminated.