Set-Constrained Viterbi for Set-Supervised Action Segmentation | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper is about weakly supervised action segmentation, where the ground truth specifies only the set of actions present in a training video, but not their true temporal ordering. Prior work typically uses a classifier that independently labels video frames to generate the pseudo ground truth, and multiple instance learning to train the classifier. We extend this framework by specifying an HMM that accounts for co-occurrences of action classes and their temporal lengths, and by explicitly training the HMM on a Viterbi-based loss. Our first contribution is the formulation of a new set-constrained Viterbi algorithm (SCV). Given a video, the SCV generates the maximum a posteriori (MAP) action segmentation that satisfies the ground truth. This prediction is used as a framewise pseudo ground truth in our HMM training. Our second contribution is a new training regularization of feature affinities between training videos that share the same action classes. Evaluation on action segmentation and alignment on the Breakfast, MPII Cooking2, and Hollywood Extended datasets demonstrates significant performance improvements over prior work on both tasks.
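The abstract describes the SCV only at a high level. To illustrate what "MAP segmentation that satisfies the ground truth" means, the following is a minimal brute-force sketch, not the authors' SCV. It is a Viterbi recursion whose state is augmented with the subset of ground-truth actions visited so far, so only labelings that use every action in the set (and no other action) can win. The inputs `log_emit` (framewise log-scores), `log_trans` (first-order log transition scores) and `allowed` (the ground-truth set) are hypothetical names. The sketch assumes a uniform initial distribution and omits the action-length model that the paper's HMM includes. Its cost grows as 2^K in the set size K, so it is only practical for small sets.

```python
def set_constrained_viterbi(log_emit, log_trans, allowed):
    """Exact MAP labeling under a set constraint (brute-force sketch).

    log_emit:  T x C nested list of framewise log-scores, log_emit[t][c]
    log_trans: C x C nested list of log transition scores, log_trans[a][b]
    allowed:   ground-truth action set (class indices)

    Every class in `allowed` must occur at least once and no other class may
    occur. A uniform initial distribution is assumed and no length model is used.
    Cost is O(T * K^2 * 2^K) for K = len(allowed), so only small sets are practical.
    """
    classes = sorted(allowed)
    K, T = len(classes), len(log_emit)
    full = (1 << K) - 1  # bitmask with every allowed class visited
    if T < K:
        raise ValueError("fewer frames than actions in the ground-truth set")
    # State (k, mask): current label classes[k]; mask marks classes visited so far.
    score = {(k, 1 << k): log_emit[0][classes[k]] for k in range(K)}
    back = []  # back[t - 1][state at frame t] -> best predecessor at frame t - 1
    for t in range(1, T):
        new_score, ptr = {}, {}
        for (j, mask), s in score.items():
            for k in range(K):
                state = (k, mask | (1 << k))
                cand = s + log_trans[classes[j]][classes[k]] + log_emit[t][classes[k]]
                if state not in new_score or cand > new_score[state]:
                    new_score[state] = cand
                    ptr[state] = (j, mask)
        score = new_score
        back.append(ptr)
    # Only end states that have visited every allowed class satisfy the constraint.
    best_score, state = max((s, st) for st, s in score.items() if st[1] == full)
    path = [state[0]]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state[0])
    return [classes[k] for k in reversed(path)], best_score


# Toy example: class 0 scores best everywhere, but class 2 must appear somewhere,
# and class 1 is excluded because it is not in the ground-truth set.
log_emit = [[0, -1, -5], [0, -1, -5], [0, -1, -5], [-5, -1, -1]]
log_trans = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
labels, score = set_constrained_viterbi(log_emit, log_trans, {0, 2})
print(labels, score)  # [0, 0, 0, 2] -1
```

In this toy run, the constraint forces class 2 onto the frame where it is cheapest to place (the last one). An unconstrained decoder restricted to the set would simply label every frame as class 0.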
Date of Conference: 13-19 June 2020
Date Added to IEEE Xplore: 05 August 2020
Conference Location: Seattle, WA, USA

1. Introduction

This paper addresses action segmentation, i.e., labeling video frames with action classes, under set-level weak supervision in training. Set-supervised training means that the ground truth specifies only the set of actions present in a training video; their temporal ordering and number of occurrences remain unknown. This is an important problem given the proliferation of big video datasets, for which annotating the temporal ordering of actions in detail is prohibitively expensive. One example application is action segmentation of videos that have been retrieved from a dataset based on word captions [6], [20], where the captions do not describe temporal relationships among actions.
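To make the set-level label concrete, the following sketch checks whether a framewise labeling is consistent with a ground-truth action set. The function names and action labels are hypothetical, not taken from the paper. Two labelings with different action orders, and different numbers of "stir" occurrences, both satisfy the same set.

```python
from itertools import groupby


def segments(frame_labels):
    """Collapse a framewise labeling into ordered (action, length) segments."""
    return [(action, len(list(run))) for action, run in groupby(frame_labels)]


def satisfies_set_label(frame_labels, action_set):
    """Set-level supervision: each listed action occurs and no other does.
    Temporal order and number of occurrences are left unconstrained."""
    return set(frame_labels) == set(action_set)


# Hypothetical ground-truth set and two candidate framewise labelings.
gt = {"take_cup", "pour_milk", "stir"}
a = ["take_cup"] * 3 + ["pour_milk"] * 2 + ["stir"] * 4
b = ["stir"] * 2 + ["take_cup"] + ["stir"] * 3 + ["pour_milk"] * 2

print(satisfies_set_label(a, gt), satisfies_set_label(b, gt))  # True True
print(segments(b))  # [('stir', 2), ('take_cup', 1), ('stir', 3), ('pour_milk', 2)]
print(satisfies_set_label(["take_cup"] * 5, gt))  # False
```

Because many labelings satisfy the same set, the set alone cannot serve as framewise training targets. This is why a pseudo ground truth has to be inferred for each training video.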
