Efficient Temporal Action Localization with Temporal Attention and Gaussian Weight | IEEE Conference Publication | IEEE Xplore


Abstract:

The task of temporal action localization is to recognize action categories while also detecting the start and end times of each action instance. In this paper, we propose a temporal-attention and Gaussian-weighted anchor-free method, named TG-TAL, for temporal action localization. Rather than using anchors, our method regresses action instances directly, with video frames as samples. To better handle the variable length of action instances, we introduce a multi-level prediction framework with temporal attention. An additional Gaussian weight branch is also defined to enhance the classification performance on low-quality temporal segments. Extensive experiments demonstrate that our method is effective on various datasets. In particular, on THUMOS14, our method outperforms one-stage temporal action localization methods and establishes a new state of the art with an mAP of 41.9% at tIoU threshold 0.5. Our method also works with two-stage methods and proposal post-processing methods. Combined with PGCN, our method surpasses the state-of-the-art methods at tIoU threshold 0.7 and achieves a new state-of-the-art mAP of 24.1% on THUMOS14.
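As a rough illustration of two ideas named in the abstract, the sketch below computes the temporal IoU (tIoU) between a predicted and a ground-truth segment, and builds per-frame regression targets in the anchor-free style, where each frame inside an action regresses its distances to the action's start and end. The function names `temporal_iou` and `frame_regression_targets` are hypothetical, and the target definition is an assumption based on common anchor-free formulations; the abstract does not specify the exact form used by TG-TAL.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments in the same unit."""
    # Overlap length, clamped at zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def frame_regression_targets(num_frames, gt):
    """Assumed anchor-free targets: for every frame index t inside the
    ground-truth segment, the pair (t - start, end - t)."""
    start, end = gt
    targets = {}
    for t in range(num_frames):
        if start <= t <= end:
            targets[t] = (t - start, end - t)
    return targets


if __name__ == "__main__":
    # Two 10-unit segments overlapping by 5 units.
    print(temporal_iou((10, 20), (15, 25)))
    # Frames 3..6 of a 10-frame clip fall inside the action.
    print(frame_regression_targets(10, (3, 6)))
```

Under the usual evaluation convention, a prediction counts as a match at tIoU threshold 0.5 only if `temporal_iou` is at least 0.5 against a ground-truth instance of the same class, and mAP is then averaged over classes.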
Date of Conference: 18-23 June 2023
Date Added to IEEE Xplore: 02 August 2023
Conference Location: Gold Coast, Australia

I. Introduction

Video analysis and understanding have received extensive attention from academia and industry because of their broad applications in many fields. One important task is action recognition, which identifies the action classes in trimmed action segments. However, most real-world videos are long and untrimmed, so applying existing recognition algorithms requires first knowing when each action happens. Recently, researchers have paid more attention to a related task termed temporal action localization, in which the category of each action instance in an untrimmed video is recognized together with its start and end times.
