
A Comparative Analysis of Deep Learning Based Human Action Recognition Algorithms

Publisher: IEEE


Abstract:

Human Action Recognition (HAR) is a crucial and prominent area in computer vision. It recognizes and predicts human actions from videos and has an ample range of applications, such as surveillance, healthcare, sports analytics, virtual reality, gaming, video searching, and human-computer interaction. However, HAR in videos is associated with several challenges, such as occlusions, viewpoint variation, cluttered backgrounds, camera motion, and execution rate. With the aim of recognizing and predicting diverse actions in videos, numerous models have been developed over the years to address the associated challenges. Deep learning techniques have exhibited significant potential in this regard. Researchers have used datasets such as single viewpoint, multi-viewpoint, and RGB-depth videos to explore HAR using deep learning techniques. This paper surveys the deep learning-based human action recognition algorithms on different datasets, highlighting recent advancements and the growing demand for HAR in videos. Furthermore, we present comparative performance evaluations on widely used benchmark datasets for HAR.
Date of Conference: 06-08 July 2023
Date Added to IEEE Xplore: 23 November 2023
Conference Location: Delhi, India

I. Introduction

Human Action Recognition (HAR) is an essential task in computer vision that involves recognizing and predicting human actions. HAR has numerous applications, such as abnormal action detection, video retrieval, healthcare [1], human-computer/robot interaction, and gaming [2]. HAR goes beyond simply representing the motion patterns of distinct body parts; it also describes a person's intentions, emotions, and thoughts, making it an essential ingredient in recognizing and predicting human behavior. In recent years, substantial work has been done in computer vision [3]–[5], [71]–[73] on classification, segmentation, super-resolution, and related tasks. Traditionally, handcrafted feature-based approaches were used for HAR in videos: visual features describing a region locally were extracted, and combining these local features yielded a fixed-size video-level description. A HAR system analyzes the sequence of video frames to learn the features of a human action in the training phase and uses these learned features to classify the same kind of action in the testing phase [6]. Traditional approaches are limited by handcrafted features [36], [41] and require considerable computational time. Deep learning-based approaches, which analyze, recognize, and then accurately predict the human behaviors depicted in videos, have substantially improved HAR performance. Convolutional Neural Networks (CNNs) are the primary feature extraction method for videos in HAR. However, modeling temporal information in videos is a complex task, as it requires understanding the dynamics of actions over time: unlike images, videos contain temporal information that captures the motion and evolution of the scene. Deep learning-based approaches also require a substantial amount of labeled data; large-scale action video datasets such as UCF101, HMDB51, and Kinetics have aided in developing more accurate and efficient HAR models.
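The idea of collapsing per-frame features into a fixed-size video-level description can be sketched in a few lines. This is an illustrative toy only: `frame_features` uses a fixed random projection as a stand-in for a real CNN backbone, and the mean over frames is the simplest temporal aggregation, which discards frame ordering and thereby loses exactly the temporal information HAR models try to preserve.

```python
import numpy as np

def frame_features(frame, proj):
    """Stand-in for a CNN backbone: project a flattened frame to a
    fixed-size feature vector. A real system would use e.g. a
    pretrained 2D CNN here."""
    return np.tanh(proj @ frame.ravel())

def video_descriptor(frames, proj):
    """Average per-frame features into a fixed-size video-level
    descriptor. Averaging ignores frame order, so motion dynamics
    are lost -- motivating the temporal models surveyed below."""
    feats = np.stack([frame_features(f, proj) for f in frames])
    return feats.mean(axis=0)

# Toy example: a "video" of 16 grayscale 8x8 frames.
rng = np.random.default_rng(0)
frames = rng.random((16, 8, 8))
proj = rng.standard_normal((32, 64)) / 8.0
desc = video_descriptor(frames, proj)
print(desc.shape)  # (32,) -- fixed size regardless of video length
```

Whatever the number of frames, the descriptor has the same dimensionality, which is what makes it usable with a standard classifier.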
This paper focuses on the evolution of HAR in video analysis. We begin by exploring two-stream networks, which were critical to developing more efficient models for HAR, and then analyze the systematic advancements made to HAR over the years. The paper is structured as follows: Section III outlines the popular datasets for HAR, Section IV covers the various advancements in HAR and their limitations, Section V compares the discussed approaches, and Section VI concludes the paper.
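The core of the two-stream design is late fusion: one CNN scores appearance from RGB frames, another scores motion from stacked optical-flow fields, and their class probabilities are averaged. A minimal sketch of that fusion step, using placeholder logits in place of the two CNN outputs (the weight `w` and the toy class scores are illustrative assumptions, not values from any cited model):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def two_stream_predict(spatial_logits, temporal_logits, w=0.5):
    """Late fusion in the two-stream style: combine the spatial
    stream's class scores (appearance, from RGB frames) with the
    temporal stream's (motion, from stacked optical flow) by a
    weighted average of their softmax outputs."""
    fused = w * softmax(spatial_logits) + (1 - w) * softmax(temporal_logits)
    return fused.argmax(axis=-1)

# Toy example with 3 action classes.
spatial = np.array([2.0, 0.5, 0.1])   # appearance mildly favors class 0
temporal = np.array([0.1, 0.3, 2.5])  # motion strongly favors class 2
print(two_stream_predict(spatial, temporal))  # 2: motion evidence dominates
```

The design choice being illustrated is that each stream can be trained independently on its own modality, with the fusion weight deciding how much the final prediction trusts appearance versus motion.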
