
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors



Abstract:

We propose an efficient abnormal event detection model based on a lightweight masked auto-encoder (AE) applied at the video frame level. The novelty of the proposed model is threefold. First, we introduce an approach to weight tokens based on motion gradients, thus shifting the focus from the static background scene to the foreground objects. Second, we integrate a teacher decoder and a student decoder into our architecture, leveraging the discrepancy between the outputs given by the two decoders to improve anomaly detection. Third, we generate synthetic abnormal events to augment the training videos, and task the masked AE model to jointly reconstruct the original frames (without anomalies) and the corresponding pixel-level anomaly maps. Our design leads to an efficient and effective model, as demonstrated by the extensive experiments carried out on four benchmarks: Avenue, ShanghaiTech, UBnormal and UCSD Ped2. The empirical results show that our model achieves an excellent trade-off between speed and accuracy, obtaining competitive AUC scores while processing 1655 FPS. Hence, our model is between 8 and 70 times faster than competing methods. We also conduct an ablation study to justify our design. Our code is freely available at: https://github.com/ristea/aed-mae.
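The first contribution above, weighting tokens by motion gradients so that foreground objects dominate the reconstruction loss, could be sketched as follows. This is a minimal illustration, not the authors' exact formulation: the patch size, the max-pooling of the temporal gradient over each patch, and the sum-to-one normalization are all assumptions made for the example.

```python
import numpy as np

def token_motion_weights(prev_frame, frame, patch=8):
    """Weight each token (patch) by its motion gradient magnitude.

    Hypothetical sketch: the absolute temporal difference between two
    consecutive frames is max-pooled per non-overlapping patch, then
    normalized, so high-motion (foreground) tokens receive larger weights.
    """
    grad = np.abs(frame.astype(float) - prev_frame.astype(float))
    h, w = grad.shape[0] // patch, grad.shape[1] // patch
    # max-pool the gradient magnitude over each patch
    pooled = grad[:h * patch, :w * patch].reshape(h, patch, w, patch).max(axis=(1, 3))
    return pooled / (pooled.sum() + 1e-8)  # weights sum to ~1

def weighted_recon_loss(target, recon, weights, patch=8):
    """Mean squared error per token, re-weighted by the motion weights."""
    err = (target.astype(float) - recon.astype(float)) ** 2
    h, w = err.shape[0] // patch, err.shape[1] // patch
    per_token = err[:h * patch, :w * patch].reshape(h, patch, w, patch).mean(axis=(1, 3))
    return float((weights * per_token).sum())
```

With weights computed this way, a static background patch contributes almost nothing to the loss, while a moving object forces the auto-encoder to reconstruct it accurately.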
Date of Conference: 16-22 June 2024
Date Added to IEEE Xplore: 16 September 2024
Conference Location: Seattle, WA, USA

1. Introduction

In recent years, research on abnormal event detection in video has gained significant traction [1, 10, 17, 18, 26–28, 36, 38, 43, 44, 49, 52, 57, 58, 61, 62, 65, 69, 76, 78, 80, 83, 87, 90, 95, 97–100], due to its importance in video surveillance. Despite the growing interest, video anomaly detection remains a complex task, because abnormal situations are context-dependent and rarely occur. This makes it very difficult to collect a representative set of abnormal events for training state-of-the-art deep learning models in a fully supervised manner. To illustrate the rarity and context dependence of anomalies, consider the vehicle ramming attacks carried out by terrorists against pedestrians. As soon as a car is steered onto the sidewalk, it becomes an abnormal event. Hence, the place where the car is driven (street versus sidewalk) determines the normal or abnormal label of the action, i.e. the label depends on context. Furthermore, fewer than 200 vehicle ramming attacks have been registered to date (https://en.wikipedia.org/wiki/Vehicle-ramming_attack), confirming the scarcity of such events (even fewer are caught on video).

Our masked auto-encoder for abnormal event detection based on self-distillation. At training time, some video frames are augmented with synthetic anomalies. The teacher decoder learns to reconstruct original frames (without anomalies) and predict anomaly maps. The student decoder learns to reproduce the teacher's output. Motion gradients are aggregated at the token level and used as weights for the reconstruction loss. Red dashed lines represent steps executed only during training.
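Since the student decoder only learns to mimic the teacher on normal training data, a large teacher-student output discrepancy at test time signals an anomaly. A minimal sketch of turning this discrepancy into a frame-level score follows; the mean-squared discrepancy and the `alpha`-weighted blending with the teacher's predicted anomaly map are illustrative assumptions, not the exact scoring function used in the paper.

```python
import numpy as np

def anomaly_score(teacher_out, student_out, teacher_map=None, alpha=0.5):
    """Frame-level anomaly score from teacher-student discrepancy.

    Hypothetical sketch: the mean squared difference between the two
    decoder outputs measures how "unfamiliar" the frame is; optionally,
    it is blended with the teacher's predicted pixel-level anomaly map.
    """
    discrepancy = float(np.mean((teacher_out - student_out) ** 2))
    if teacher_map is None:
        return discrepancy
    # blend reconstruction discrepancy with the predicted anomaly map
    return alpha * discrepancy + (1 - alpha) * float(np.mean(teacher_map))
```

On a normal frame the two decoders agree and the score stays near zero; on an anomalous frame the student fails to reproduce the teacher's output, raising the score.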

Performance versus speed trade-offs for our self-distilled masked AE and several state-of-the-art methods [26–28, 47, 49, 60, 61, 69, 84] (with open-sourced code), on the Avenue data set. The running times of all methods are measured on a computer with one Nvidia GeForce RTX 3090 GPU with 24 GB of VRAM. Best viewed in color.

