1. Introduction
Video Anomaly Detection (VAD) refers to the identification of events in a video that do not conform to expected behaviors [3]; one example is shown in Figure 1. This is an open and very challenging task because abnormal events usually occur far less often than normal ones, and the forms that abnormal events can take are unbounded in practical applications [25]. It is therefore infeasible to collect all kinds of abnormal data in advance. A typical solution to video anomaly detection is thus to train an unsupervised learning model on normal data; events or activities that the trained model recognizes as outliers are then deemed anomalies.