I. Introduction
In this digital generation, most people have access to smartphones, digital cameras, and many other devices, which together generate a huge amount of multimedia data. With people around the globe keen to take selfies, images have become an essential fraction of this data. The generated data is stored either locally or on a third-party remote server, and in both cases the owner may need it again in the near future. Retrieving images similar to a query image therefore becomes an important problem. In its initial phase, image retrieval was carried out descriptively: each image was associated with a textual description supplied by the owner of the image or by the service provider. This process is called text-based image retrieval (TBIR) [1]. The practice has become obsolete for two reasons. First, describing the currently available volume of images is next to impossible, and the time such a process would take is unpredictable. Second, each person has his or her own thinking process, so descriptions are subjective. The picture in Figure 1, for example, may be described as a mountain, snowy, Pictionary, and in many other ways.
To overcome these limitations, researchers and academicians started focusing on retrieval systems based on the features of the image itself. This technique is formalised as content-based image retrieval (CBIR) [2], and researchers have been using it for more than two decades. In CBIR, primitive visual image features are extracted from all the images of the designated image database. Initially, only one image feature among color, texture, and shape [3] was considered. Later, researchers started to incorporate several features in different combinations and to combine them into a single feature vector for each image. The same set of features is extracted from every image used in the retrieval scheme. CBIR distinguishes two kinds of primitive image features: global and local.
In global feature extraction, the selected visual feature is extracted from the entire image. Although this type of extraction is fast, the resulting features do not closely capture the image's actual content, so local image features became necessary. Local feature extraction starts by partitioning the input image into a number of non-overlapping parts. The selected feature is then extracted from each part, and finally all the extracted features are combined to form the final feature vector. Since local feature extraction takes more time, there is always a trade-off between the time required and the system's efficiency, which makes the selection of features tricky. The CBIR process can be divided into two phases: feature extraction and similarity measurement. In similarity measurement, the user selects an appropriate similarity measure to retrieve images similar to a query image.
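The difference between global and local features, and the two-phase structure of retrieval, can be illustrated with a minimal sketch. It is not the method of any particular CBIR system: it assumes grayscale images stored as nested lists, uses an intensity histogram as the primitive feature, and uses Euclidean distance as the similarity measure. All function names and the two toy images are hypothetical.

```python
from math import sqrt

def histogram(pixels, bins=4, levels=256):
    """Normalised intensity histogram of a flat list of values in [0, levels)."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // levels] += 1
    return [c / len(pixels) for c in counts]

def global_feature(image, bins=4):
    """Global feature: one histogram computed over the entire image."""
    return histogram([p for row in image for p in row], bins)

def local_feature(image, grid=2, bins=4):
    """Local feature: histograms of grid x grid non-overlapping blocks,
    concatenated into one vector. Assumes (for simplicity) that the image
    height and width are divisible by `grid`."""
    bh, bw = len(image) // grid, len(image[0]) // grid
    vector = []
    for r in range(grid):
        for c in range(grid):
            block = [p for row in image[r * bh:(r + 1) * bh]
                     for p in row[c * bw:(c + 1) * bw]]
            vector.extend(histogram(block, bins))
    return vector

def euclidean(u, v):
    """Similarity measure assumed here: smaller distance means more similar."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve(query, database, extract, top_k=1):
    """Phase 1: extract features. Phase 2: rank database images by distance."""
    q = extract(query)
    dists = [euclidean(q, extract(img)) for img in database]
    return sorted(range(len(database)), key=lambda i: dists[i])[:top_k]

# Two toy 4x4 images with identical pixel counts but different layouts.
left_right = [[0, 0, 255, 255] for _ in range(4)]
top_bottom = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]

same_globally = global_feature(left_right) == global_feature(top_bottom)
same_locally = local_feature(left_right) == local_feature(top_bottom)
best_match = retrieve(left_right, [top_bottom, left_right], local_feature)
```

In this toy case the global histograms of the two images coincide, whereas the block-wise vectors differ, so only the local feature lets the query be matched to its own layout. The local vector is also `grid * grid` times longer and takes more passes to compute, which is the time trade-off described above.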