1. Introduction
In the early years of content-based image retrieval (CBIR), low-level features such as color and texture [1], [2], [3], [4] were used either in narrow domains, to classify images into one of two categories, or in broad domains, where images were grouped into multiple categories using clustering techniques. Low-level features have been successful in specific applications such as face, fingerprint, and object recognition, but researchers have identified their limitations for querying and browsing huge image collections.

Many approaches have also been developed to apply information-retrieval techniques in the context of image retrieval. These techniques have been applied both to textual annotations alone and in conjunction with the low-level features of image objects. The biggest problem with this approach is obtaining good-quality annotations, a process that is time-consuming and highly subjective. To extend the annotation list, several systems have used WordNet [5], which further improves the results. Annotations have also been used to group images into a given number of concept bins, but the mapping from image space to concept space is not one-to-one, since a single image can be described by a large number of concepts. No generic, direct transformation exists to map a low-level representation into high-level concepts; this gap between low-level features and high-level semantic concepts has been identified as the "semantic gap".
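To make the annotation-expansion idea concrete, the following sketch enriches an image's keyword list with WordNet synonyms via NLTK's WordNet interface. This is a minimal illustration under our own assumptions (the function name and the use of NLTK are ours), not a reproduction of any of the cited systems:

```python
# Illustrative sketch: expanding image annotations with WordNet synonyms.
# Assumes NLTK and its WordNet corpus are installed:
#   pip install nltk
#   python -c "import nltk; nltk.download('wordnet')"
from nltk.corpus import wordnet as wn

def expand_annotations(keywords):
    """Return the original keywords plus their WordNet synonyms.

    A hypothetical helper illustrating how an annotation list might
    be extended; the cited systems may differ in detail.
    """
    expanded = set(keywords)
    for word in keywords:
        for synset in wn.synsets(word):       # every sense of the word
            for lemma in synset.lemmas():     # synonyms within that sense
                expanded.add(lemma.name().replace('_', ' '))
    return sorted(expanded)

# Example: an image annotated with "car" also becomes retrievable
# under "auto", "automobile", "motorcar", etc.
print(expand_annotations(["car", "beach"]))
```

Such expansion increases recall for keyword queries, but it also illustrates why the image-to-concept mapping is not one-to-one: every added synonym, and every sense of a polysemous keyword, enlarges the set of concepts attached to the same image.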