Learning Motion-Guided Multi-Scale Memory Features for Video Shadow Detection

IEEE Journals & Magazine | IEEE Xplore


Abstract:

Natural images often contain multiple shadow regions, and existing video shadow detection methods tend to fail to identify all of them, since they mainly learn temporal features at a single scale and from a single memory. In this work, we develop a novel convolutional neural network (CNN) that learns motion-guided multi-scale memory features, obtaining multi-scale temporal information from multiple network memories to boost video shadow detection. Our network first constructs three memories (i.e., a global memory, a local memory, and a motion memory) to combine spatial context and object motion for detecting shadows. Based on these three memories, we then devise a multi-scale motion-guided long-short transformer (MMLT) module to learn multi-scale temporal and motion memory features for predicting a shadow detection map of the input video frame. The MMLT module includes a dense-scale long transformer (DLT), a dense-scale short transformer (DST), and a dense-scale motion transformer (DMT) that read the three memories to learn multi-scale transformer features. Each of the DLT, DST, and DMT consists of a set of memory-read pooling attention (MPA) blocks and densely connects the output features of the multiple MPA blocks, whose scales vary, to learn multi-scale transformer features. By doing so, we can more accurately identify multiple shadow regions of different sizes in the input video. Moreover, we devise a self-supervised pretext task to pre-train the feature encoder, enhancing downstream video shadow detection. Experimental results on three benchmark datasets show that our video shadow detection network quantitatively and qualitatively outperforms 26 state-of-the-art methods.
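To make the memory-read idea above concrete, the following is a minimal NumPy sketch of a "pooling attention over a memory, densely concatenated across scales" pattern. It is not the authors' implementation: the function names, the use of average pooling along the token axis, and the pooling strides (1, 2, 4) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool_tokens(mem, stride):
    """Average-pool memory tokens along the token axis by `stride`
    (assumed pooling scheme), producing a coarser-scale memory."""
    t, c = mem.shape
    t_trim = (t // stride) * stride
    return mem[:t_trim].reshape(t_trim // stride, stride, c).mean(axis=1)

def mpa_block(query, memory, stride):
    """Sketch of one memory-read pooling attention (MPA) block:
    frame queries attend to a pooled version of a memory, reading
    back features at that scale."""
    keys = avg_pool_tokens(memory, stride)
    attn = softmax(query @ keys.T / np.sqrt(query.shape[1]))
    return attn @ keys

def dense_scale_transformer(query, memory, strides=(1, 2, 4)):
    """Run MPA blocks at several pooling scales and densely
    concatenate their outputs into one multi-scale feature."""
    outs = [mpa_block(query, memory, s) for s in strides]
    return np.concatenate(outs, axis=-1)
```

For example, with 16 query tokens and a 64-token memory of channel width 8, the three pooled reads concatenate into a 24-channel multi-scale feature per query token; in the paper's design, such reads would be performed per memory (global, local, motion) by the DLT, DST, and DMT respectively.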
Page(s): 12288 - 12300
Date of Publication: 26 July 2024


I. Introduction

Shadows are a ubiquitous feature in natural images, offering valuable cues for extracting scene geometry [1], [2], [3], [4], [5], estimating light directions, and determining camera locations and parameters [2]. Additionally, shadows have the potential to enhance a diverse range of image understanding tasks, including image segmentation [6], object detection [7], image editing [8], and object tracking [9]. The last decade has witnessed a growing interest in image shadow detection. Early methods addressed shadow detection in single still images by examining color and illumination priors [10], by developing data-driven approaches with hand-crafted features [11], [12], [13], or by learning deep discriminative features via diverse convolutional neural networks (CNNs) [14], [15], [16], [17], [18], [19], [20]. While image-based shadow detectors can be applied frame by frame to detect shadow pixels, their performance is often unsatisfactory because they do not consider temporal information from neighboring video frames.

