I. Introduction
With the recent development of Internet technology and the popularization of consumer electronic devices such as mobile phones and digital cameras, enormous numbers of digital images are produced every day. However, these images carry very little metadata by which they can be indexed and searched, so retrieving images rapidly and efficiently from large multimedia image databases is a challenging task. Current commercial image search engines, such as Google [1] and Baidu [2], search images not by their content but by the text tags embedded in Web pages, which leads to poor retrieval accuracy. Content-based image retrieval (CBIR) is an important technique for retrieving useful information from enormous collections of digital images [3]–[6].

The classical paradigm for CBIR is query by example (QBE), which matches and ranks database images by their similarity to a user-provided query image. The system extracts a signature from the query image, compares this signature to those previously computed for the images in the database, and returns the closest matches [3]–[6].

Much of the previous work on image retrieval has used global features such as color [7] and texture [8] to describe image content. However, these global features are insufficient to describe the image content accurately when different parts of the image have different characteristics. For example, when looking at a picture, human beings are attracted to salient regions that help them recognize a semantic object in the picture, such as a car or a rose. A salient region, the most informative part of the image, is composed of salient (or interest) points, which represent the image's local properties. Detecting salient regions and extracting salient points is referred to as image saliency analysis. Visual features based on these salient regions or points, which may be called salient features, can be extracted for image retrieval.
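The QBE paradigm described above can be sketched in a few lines. The following is a minimal illustration, not the method of any cited system: it uses a global per-channel color histogram as the image signature (one of the global features the text mentions) and ranks database images by Euclidean distance between signatures. The function names and the choice of histogram bins are illustrative assumptions.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color signature: a per-channel intensity histogram.

    `image` is an (H, W, 3) uint8 array. The histogram is L1-normalized
    so that signatures from images of different sizes are comparable.
    """
    hist = np.concatenate(
        [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
         for c in range(3)]
    ).astype(float)
    return hist / hist.sum()

def qbe_rank(query, database, bins=8):
    """Query-by-example: rank database images by signature distance.

    Returns database indices ordered from most to least similar.
    """
    q = color_histogram(query, bins)
    dists = [np.linalg.norm(q - color_histogram(img, bins))
             for img in database]
    return np.argsort(dists)

# Toy usage with synthetic images: the query is a copy of database image 2,
# so it should be ranked first (its signature distance is exactly zero).
rng = np.random.default_rng(0)
database = [rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)
            for _ in range(5)]
query = database[2].copy()
ranking = qbe_rank(query, database)
```

In a real CBIR system the database signatures would be precomputed and indexed offline; only the query signature is extracted at search time.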