
Adaptive Fusion and Category-Level Dictionary Learning Model for Multiview Human Action Recognition


Abstract:

Human actions are often captured by multiple cameras (or sensors) to overcome the significant variations in viewpoints, background clutter, object speed, and motion patterns in video surveillance, and action recognition systems often benefit from fusing multiple types of cameras (sensors). Adaptive fusion of information from multiple domains is therefore essential for multiview human action recognition. Two widely applied fusion schemes are feature-level fusion and score-level fusion. We point out that these schemes still leave tremendous room for improvement: feature fusion and action recognition are computed separately, and the weights assigned to each action and each camera are fixed; previous fusion methods cannot overcome these limitations. In this paper, inspired by nature, we address the above limitations for multiview action recognition by developing a novel adaptive fusion and category-level dictionary learning model (abbreviated as AFCDL). It jointly learns an adaptive weight for each camera and optimizes the reconstruction of samples toward the action recognition task. To guide the dictionary learning and the reconstruction of the query set (or test samples), an induced set is built for each category, and a corresponding induced regularization term is designed for the objective function. Extensive experiments on four public multiview action benchmarks show that AFCDL significantly outperforms state-of-the-art methods, with 3% to 10% improvement in recognition accuracy.
Published in: IEEE Internet of Things Journal ( Volume: 6, Issue: 6, December 2019)
Page(s): 9280 - 9293
Date of Publication: 17 April 2019


I. Introduction

Recognizing human actions in videos is a challenging task. It has received a significant amount of attention from the research community due to its wide range of applications in visual surveillance, human-computer interaction, and other areas [1]–[13]. Although more than a decade of active research has been conducted [1]–[5], many problems remain unsolved for the following reasons. First, the task typically exhibits large intraclass variation due to differences in viewpoints, background clutter, object speed, and motion patterns. Second, complex contextual information, such as unexpected interactions between objects, people, and the scene, can negatively influence action recognition. Third, the diverse and dynamic nature of an action category makes it difficult to model the salient action units. Designing a robust human action recognition model is therefore a pressing need.

