1. Introduction
With the rapid advancement of imaging and computing technologies, the volume of data generated every day is growing at an unprecedented rate. To cope with this explosion, proper organization is essential for on-demand retrieval of multimedia information. Visual data is unstructured and far bulkier than text, and as a database grows, so does the difficulty of storing it efficiently and retrieving relevant search results. Traditional search engines index visual data using manually annotated metadata such as titles and meta-tags. Research prototypes exist that use deep learning techniques to automatically extract details such as actions, objects, and captions from videos; similarity in this metadata can then be used to retrieve relevant videos during a search. A downside to this approach is that textual descriptors are often inadequate for describing videos, simply because the same video can be described in many different ways. Moreover, retrieval based on such metadata queries yields too many results, making the search inefficient. With massive databases being frequently updated with new multimedia, manual entry of all the attributes is also highly impractical, and manual tagging of industrial videos in particular is still in its infancy. Content-based retrieval, in which a video example serves as the query, addresses many of these issues. It provides greater flexibility and enables querying attributes such as texture or shape that are difficult to express with keywords.
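To make the query-by-example idea concrete, the following is a minimal sketch of content-based ranking: each video is assumed to be represented by a fixed-length feature vector (e.g. from a pretrained network), and database videos are ranked by cosine similarity to the query's vector. The function name and the use of cosine similarity are illustrative assumptions, not a specific system described above.

```python
import numpy as np

def rank_by_similarity(query_vec, database_vecs):
    """Rank database videos by cosine similarity to the query video's
    feature vector. Returns database indices, most similar first.

    query_vec: shape (d,) feature vector of the query video.
    database_vecs: shape (n, d) feature vectors of the n stored videos.
    """
    # Normalize so that the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    db = database_vecs / np.linalg.norm(database_vecs, axis=1, keepdims=True)
    scores = db @ q                # cosine similarity of each stored video
    return np.argsort(-scores)     # descending order of similarity

# Example: a 2-D toy feature space with three stored videos.
query = np.array([1.0, 0.0])
database = np.array([[0.0, 1.0],   # orthogonal to the query
                     [1.0, 0.1],   # nearly aligned
                     [1.0, 0.0]])  # identical direction
order = rank_by_similarity(query, database)
```

In practice the feature vectors would come from a learned video descriptor, and an approximate nearest-neighbor index would replace the exhaustive dot product for large databases.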