1. Introduction
An estimated 180 million security cameras were shipped worldwide in 2019 [9], yet the attention span of a human camera operator has been estimated at only about 20 minutes of continuous manual monitoring [4, 3]. Rapid advances in computer vision, particularly deep-learning-based methods, have narrowed, but not eliminated, the gap between the massive volume of video data available and the scarce capacity of human analysts.