Weakly-Supervised Action Segmentation and Unseen Error Detection in Anomalous Instructional Videos


Abstract:

We present a novel method for weakly-supervised action segmentation and unseen error detection in anomalous instructional videos. In the absence of an appropriate dataset for this task, we introduce the Anomalous Toy Assembly (ATA) dataset, which comprises 1152 untrimmed videos of 32 participants assembling three different toys, recorded from four different viewpoints. The training set comprises 27 participants who assemble toys in an expected and consistent manner, while the test and validation sets comprise 5 participants who display sequential anomalies in their task. We introduce a weakly-labeled segmentation algorithm that generalizes the constrained Viterbi algorithm and identifies potential anomalous moments based on the difference between future anticipation and current recognition results. The proposed method is not restricted to the training transcripts during testing, allowing for the inference of anomalous action sequences while maintaining real-time performance. Based on these segmentation results, we also introduce a baseline for detecting pre-defined human errors and benchmark results on the ATA dataset. Experiments on the ATA and CSV datasets show that our method outperforms the state of the art in segmenting anomalous videos under both online and offline conditions.
Date of Conference: 01-06 October 2023
Date Added to IEEE Xplore: 15 January 2024
Conference Location: Paris, France


1. Introduction

One of the challenges in human-machine interaction is the automatic vision-based understanding of human actions in instructional videos. These videos depict a series of low-level actions that collectively accomplish a top-level task, such as preparing a meal or assembling an object. However, labeling every frame of these videos is arduous, requiring significant manual effort to annotate the start and end times of each action segment. Consequently, there has been a surge of research interest in developing weakly-supervised methods for learning actions. In particular, such methods aim to overcome the challenge of weakly-labeled instructional videos, where only the ordered sequence of action labels (the transcript) is provided, without any information on the duration of each action.
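To make the weakly-labeled setting concrete, the standard transcript-constrained Viterbi alignment (which the paper's method generalizes) can be sketched as follows. This is a minimal illustrative implementation, not the authors' algorithm: `constrained_viterbi`, its arguments, and the toy scores below are all hypothetical. Given frame-wise class log-scores and an ordered transcript, it finds the monotonic frame-to-transcript assignment with the highest total score, yielding per-frame labels without any ground-truth segment boundaries.

```python
import numpy as np

def constrained_viterbi(log_probs, transcript):
    """Align T frames to an ordered transcript of K action labels.

    log_probs: (T, C) array of frame-wise class log-scores (hypothetical input).
    transcript: ordered list of K class indices, K <= T.
    Returns a length-T list of per-frame class labels consistent with the
    transcript order.
    """
    T, _ = log_probs.shape
    K = len(transcript)
    NEG = -np.inf
    dp = np.full((T, K), NEG)          # dp[t, k]: best score with frame t on step k
    back = np.zeros((T, K), dtype=int)  # 0 = stayed on step k, 1 = advanced from k-1
    dp[0, 0] = log_probs[0, transcript[0]]
    for t in range(1, T):
        for k in range(min(t + 1, K)):  # step k needs at least k+1 frames
            stay = dp[t - 1, k]
            adv = dp[t - 1, k - 1] if k > 0 else NEG
            if adv > stay:
                dp[t, k], back[t, k] = adv, 1
            else:
                dp[t, k], back[t, k] = stay, 0
            dp[t, k] += log_probs[t, transcript[k]]
    # Backtrack from the last frame, which must sit on the last transcript step.
    labels = [0] * T
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = transcript[k]
        k -= back[t, k]
    return labels

# Toy example: 4 frames, 2 classes, transcript [action 0, action 1].
scores = np.log(np.array([[0.9, 0.1],
                          [0.8, 0.2],
                          [0.2, 0.8],
                          [0.1, 0.9]]))
print(constrained_viterbi(scores, [0, 1]))  # → [0, 0, 1, 1]
```

Because the alignment is forced to follow the training transcript, this baseline cannot represent out-of-order (anomalous) action sequences at test time, which is precisely the restriction the paper's generalization removes.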

