I. Introduction
The analysis of human activities and events in visual surveillance is highly beneficial and has recently evolved in the industrial sector, where hundreds of workers on the job need automatic monitoring. Human activity recognition is in demand in numerous industrial applications, such as video summarization, smart surveillance and monitoring systems, virtual reality, robotics, medical diagnostics, elderly healthcare, and content-based video retrieval [1], [2]. Surveillance cameras run around the clock. However, in many environments shots of interest occur only rarely; activity in front of an ATM, for example, does not happen continuously. For such scenarios, efficient shot segmentation is therefore required for effective activity analysis. Shot segmentation is an important step in video processing that allows only the vital content in the stream to be analyzed, and several attempts have been made to address it [3]. For instance, Mehrnaz et al. [4] presented a hidden-to-observable Markov model for shot boundary detection in automated soccer video analysis and classification. In another technique, shot segmentation is used for background extraction based on the temporal and spatial relationships in the input sequence. Similarly, Baohan et al. [5] and Irfan et al. [6] extracted shots to generate a diverse video summary. These existing methods are generic and not applicable to activity recognition in industrial settings. Our experiments show that state-of-the-art techniques segment shots by considering all sorts of visual content. However, video segmentation for activity recognition requires the presence of humans in each shot. Thus, for better activity recognition, we considered only human saliency features to perform shot segmentation, which suppresses unessential information. Furthermore, processing only the important shots minimizes execution time, avoids extra computation, and improves the accuracy of activity recognition.