1. Introduction
A video descriptor represents video content more effectively than the raw video, without losing important features. Compared with conventional techniques, it therefore captures local video information at more extensive and more detailed scales. An effective video description operator depends on an accurate representation of both the spatial and the temporal information in a video, which gives such operators significant research value for video understanding and analysis. Accordingly, this section reviews numerous works that have made remarkable advances on a variety of problems from diverse angles.
Fast and accurate face recognition from video data is increasingly important and in high demand, as CCTV systems and smart devices such as digital cameras and smartphones capture ever more images and video worldwide [1]. The task is nevertheless both interesting and challenging, because its difficulty depends on many factors related to the quality of the input. Owing to rapid progress in computer vision in recent years, however, it has reached notable milestones. In practice, this complex task is decomposed into smaller subproblems that can each be handled effectively, yielding simpler and more manageable solutions. The main approach is first to extract the frames containing faces from the input video, then to detect the faces in each frame, and finally to recognize them. Related tasks are also performed, such as localizing faces, verifying faces, and extracting additional features and characteristics from them. Many algorithms and methods address these tasks, such as Eigenfaces and Active Shape Models, which have been used extensively because of their strong results. Among the most popular and promising approaches are deep learning (DL) methods, principally Convolutional Neural Networks (CNNs), which currently yield the most accurate results and continue to improve.
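The frame-extraction, detection and recognition decomposition described above can be illustrated with a minimal pipeline sketch. It is not the authors' method: it assumes hypothetical `detect` and `embed` callables (standing in for, e.g., a CNN face detector and a CNN embedding model), and it assumes recognition is done by nearest-neighbour matching of embeddings against a gallery of known identities using cosine similarity with an illustrative `threshold`. The names `process_video`, `recognize`, `gallery` and `stride` are likewise hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]    # (x, y, width, height) of a detected face
Frame = List[List[float]]          # grayscale frame as rows of pixel intensities
Embedding = Sequence[float]


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the face region given by `box` out of a frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def recognize(embedding: Embedding, gallery: Dict[str, Embedding],
              threshold: float) -> Optional[str]:
    """Return the closest gallery identity, or None if no match is close enough."""
    best_name, best_score = None, -1.0
    for name, ref in gallery.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def process_video(frames: Sequence[Frame],
                  detect: Callable[[Frame], List[Box]],
                  embed: Callable[[Frame], Embedding],
                  gallery: Dict[str, Embedding],
                  threshold: float = 0.8,
                  stride: int = 1) -> List[Tuple[int, Box, Optional[str]]]:
    """Sample every `stride`-th frame, detect faces, then recognize each face."""
    results = []
    for idx in range(0, len(frames), stride):        # step 1: frame extraction
        frame = frames[idx]
        for box in detect(frame):                     # step 2: face detection
            face = crop(frame, box)
            name = recognize(embed(face), gallery, threshold)  # step 3: recognition
            results.append((idx, box, name))
    return results


if __name__ == "__main__":
    # Toy data: a fixed "detector" and a row-sum "embedding" replace real models.
    frames = [
        [[1, 0, 5, 5], [0, 1, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]],
        [[2, 0, 5, 5], [0, 0, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]],
    ]
    gallery = {"alice": [1.0, 1.0], "bob": [1.0, -1.0]}
    print(process_video(frames,
                        detect=lambda f: [(0, 0, 2, 2)],
                        embed=lambda face: [float(sum(r)) for r in face],
                        gallery=gallery))
```

Keeping detection and embedding as injected callables mirrors the decomposition into smaller subproblems: each stage can be swapped (for instance, a classical Eigenfaces projection in place of a CNN embedding) without changing the rest of the pipeline, and `stride` trades accuracy for speed by skipping frames.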
Having reviewed the current state of the art and its results, we chose to focus our study on the factors that help improve output quality in terms of both time and accuracy. Additionally, our approach collects face images from high-definition devices such as cameras and applies deep learning techniques to face localization, identification, and recognition in order to improve the performance of the overall system.