I. Introduction
Shadows are ubiquitous in natural images and offer valuable cues for extracting scene geometry [1], [2], [3], [4], [5], estimating light directions, and determining camera locations and parameters [2]. Shadows can also benefit a diverse range of image understanding tasks, including image segmentation [6], object detection [7], image editing [8], and object tracking [9]. Over the last decade, image shadow detection has attracted growing interest. Early methods detected shadows in a single still image by examining color and illumination priors [10], by developing data-driven approaches with hand-crafted features [11], [12], [13], or by learning deep discriminative features with diverse convolutional neural networks (CNNs) [14], [15], [16], [17], [18], [19], [20]. Although image-based shadow detectors can be applied frame by frame to detect shadow pixels in a video, their performance is often unsatisfactory because they ignore the temporal information available in neighboring video frames.