I. Introduction
Perception of the world requires interpreting motion across different time scales. Detecting moving objects in video streams is fundamental to many computer vision applications, from video surveillance and monitoring to self-driving cars. Moving object detection provides focus of attention for follow-up processes such as tracking, classification, recognition, and behavior analysis. The task is challenging because of background clutter, numerous distractors, and changing imaging conditions such as camera motion, degraded environmental conditions, weather, dynamic backgrounds, illumination changes, shadows, and camouflage effects. Many approaches and pipelines have been proposed to perform moving object detection and to address these challenges [1]–[3]. Earlier approaches typically consisted of hand-crafted solutions with little adaptation to difficult scenarios and often relied on complex procedures tailored to specific challenges.

In recent years, advances in deep learning, combined with the increased availability of training data and of affordable high-end computing resources such as GPUs, have led to impressive results in various computer vision tasks. Transfer learning with state-of-the-art CNN models (e.g., VGG-16, ResNet-18) pretrained on large benchmark datasets allows feature extraction modules to be built for new tasks with only minor modifications and limited training. Autoencoders are a popular deep learning architecture for segmentation tasks: the features extracted by the encoder module, through a series of convolution and pooling layers, are upsampled by the decoder module to recover the original spatial resolution of the input image.
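The encoder–decoder idea above can be illustrated with a minimal NumPy sketch; this is not the network proposed here, only a toy showing how pooling halves the spatial resolution at each encoder stage and how upsampling in the decoder restores the input resolution (the helper names `max_pool2x2` and `upsample2x` are illustrative):

```python
import numpy as np

def max_pool2x2(x):
    # 2x2 max pooling: halves spatial resolution, as in an encoder stage
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x(x):
    # nearest-neighbour upsampling: doubles resolution, as in a decoder stage
    return x.repeat(2, axis=0).repeat(2, axis=1)

img = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 "feature map"
enc = max_pool2x2(max_pool2x2(img))              # encoder: 8x8 -> 4x4 -> 2x2
dec = upsample2x(upsample2x(enc))                # decoder: 2x2 -> 4x4 -> 8x8
assert dec.shape == img.shape                    # original resolution recovered
```

In a real segmentation autoencoder the pooling stages are interleaved with learned convolutions, and the decoder typically uses learned transposed convolutions rather than fixed nearest-neighbour upsampling, but the resolution bookkeeping is the same.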
Sample change detection results from CDnet-2014, showing the original image (row 1), the ground-truth mask (row 2), and the proposed MU-net2 mask (row 3), for three sample videos: highway (baseline), frame 1100 (col. 1); canoe (dynamic background), frame 960 (col. 2); peopleInShade (shadow), frame 1100 (col. 3).