1. Introduction
The task of classifying (or recognizing), detecting, and clustering general objects in natural scenes is extremely challenging. The difficulty arises from many sources: large intra-class variation and inter-class similarity, articulation and motion, varying lighting conditions, different orientations and viewing directions, and the complex configurations of different objects. The first row of Fig. 1 displays some face images. The second row shows some typical images from the Caltech-101 object categories [5]. Some of these objects are highly non-rigid, and some objects in the same category bear little resemblance to one another. Categorization therefore requires very high-level knowledge to assign different instances of a class to the same category.