I. Introduction
The tremendous amount of video data generated by camera networks installed in industries, offices, and public places meets the criteria of Big Data. For instance, a simple multiview network of two cameras, each acquiring video of a different view at 25 frames per second (fps), generates 180 000 frames in one hour (90 000 per camera). Surveillance networks record video from multiview cameras 24 hours a day, which makes extracting useful information from this Big Data challenging: searching such huge volumes of continuously recorded video for salient information requires significant effort. Thus, automatic techniques are needed to extract the prominent information present in videos without human effort. The video analytics literature offers several key information extraction techniques, such as video abstraction [1], video skimming [2], and video summarization (VS) [3]. VS techniques analyze an input video for salient information and create a summary, in the form of keyframes or short video clips, that represents the lengthy video. The extracted keyframes support many applications, such as action and activity recognition [4], [5], anomaly detection [6], and video retrieval [7].
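The frame count in the two-camera example follows from cameras × fps × seconds (2 × 25 × 3600 = 180 000). The short Python sketch below reproduces that arithmetic and, purely for illustration, pairs it with a toy keyframe selector. The selector (`select_keyframes`) is a hypothetical, deliberately naive threshold rule on made-up per-frame feature vectors. It is not a method proposed in this work or in the cited literature; it only shows, under those assumptions, what "reducing many frames to a few representative keyframes" means in practice.

```python
from typing import List, Sequence


def frames_generated(num_cameras: int, fps: int, hours: float) -> int:
    """Total frames captured by a multiview network over a time span."""
    seconds = hours * 3600
    return int(num_cameras * fps * seconds)


def select_keyframes(features: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Hypothetical, naive keyframe selector (illustration only).

    Keeps a frame when its feature vector differs from the most recently
    kept keyframe by more than `threshold` (Euclidean distance).
    The first frame is always kept.
    """
    if not features:
        return []
    keyframes = [0]
    for i in range(1, len(features)):
        last = features[keyframes[-1]]
        dist = sum((a - b) ** 2 for a, b in zip(features[i], last)) ** 0.5
        if dist > threshold:
            keyframes.append(i)
    return keyframes


if __name__ == "__main__":
    # Two cameras at 25 fps for one hour, matching the example in the text.
    print(frames_generated(num_cameras=2, fps=25, hours=1))  # 180000
    print(frames_generated(num_cameras=1, fps=25, hours=1))  # 90000
    # Toy per-frame features (e.g., mean intensity); values are invented.
    toy = [[0.10], [0.11], [0.12], [0.80], [0.82], [0.30]]
    print(select_keyframes(toy, threshold=0.2))  # [0, 3, 5]
```

Scaled to a full day, the same two-camera network yields 2 × 25 × 86 400 = 4 320 000 frames, which illustrates why manual inspection does not scale and automatic summarization is needed.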