UNFOLD: 3-D U-Net, 3-D CNN, and 3-D Transformer-Based Hyperspectral Image Denoising


Abstract:

Hyperspectral images (HSIs) encompass data across numerous spectral bands, making them valuable in practical fields such as remote sensing, agriculture, and marine monitoring. Unfortunately, noise inevitably introduced during sensing restricts their applicability, so denoising is needed to use them fully. Existing deep learning (DL)-based denoising methods have several limitations. For instance, convolutional neural networks (CNNs) struggle with long-range dependencies, while vision transformers (ViTs) struggle to capture local details. This article introduces a novel method, UNFOLD, that addresses these limitations by integrating the strengths of 3-D U-Net, 3-D CNN, and 3-D Transformer architectures. Unlike several existing methods that predominantly capture dependencies along either the spatial or the spectral dimension, UNFOLD treats HSI denoising as a 3-D task, combining spatial and spectral information through a 3-D Transformer and a 3-D CNN. It employs the self-attention (SA) mechanism of Transformers to capture global dependencies and model long-range relationships across the spatial and spectral dimensions. To overcome the limitations of the 3-D Transformer in capturing fine-grained local and spatial features, UNFOLD complements it with a 3-D CNN. Moreover, UNFOLD uses a modified 3-D U-Net architecture for HSI denoising, in which a 3-D Transformer-based encoder replaces the conventional 3-D CNN-based encoder. It further capitalizes on the ability of U-Net to integrate features across scales, thereby preserving intricate structural details. Results from extensive experiments demonstrate that UNFOLD outperforms state-of-the-art HSI denoising methods.
Article Sequence Number: 5529710
Date of Publication: 31 October 2023


I. Introduction

A hyperspectral image (HSI) contains information across many spectral bands and is therefore used extensively in real-world domains, including remote sensing [1], classification [2], [3], [4], agriculture [5], and marine monitoring [6]. It is represented as a 3-D array with two spatial dimensions and one spectral dimension. Unfortunately, noise is introduced during HSI sensing by various factors, including limited light, photon effects, and atmospheric interference [7], which degrades HSI quality. HSI denoising mitigates this issue. In computer vision, image denoising is performed by analyzing each pixel's behavior with respect to its local neighborhood or its global context. Several linear filters are widely used in the literature for local neighborhood analysis. Similarly, the nonlocal means filter [8] and Non-local Meets Global (NGMeet) [9] have been used to analyze global context, or long-range dependencies, for denoising [10]. Hence, effective image denoising can draw on both the local neighborhood and the global context.
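To make the 3-D array representation and the idea of local neighborhood analysis concrete, the following is a minimal sketch and not the UNFOLD method. It assumes a synthetic cube of shape (height, width, bands), additive Gaussian noise, and a plain 3 x 3 x 3 mean filter as a stand-in for the linear filters mentioned above. The function names `local_mean_filter_3d` and `mse`, the cube size, and the noise level are all illustrative choices, not taken from the article.

```python
import numpy as np


def local_mean_filter_3d(cube, radius=1):
    """Average each voxel over its (2*radius+1)^3 spatial-spectral neighborhood."""
    k = 2 * radius + 1
    # Reflect-pad all three axes so border voxels also have a full neighborhood.
    padded = np.pad(cube, radius, mode="reflect")
    h, w, b = cube.shape
    out = np.zeros((h, w, b), dtype=np.float64)
    # Sum the k*k*k shifted views of the padded cube, then normalize.
    for dy in range(k):
        for dx in range(k):
            for db in range(k):
                out += padded[dy:dy + h, dx:dx + w, db:db + b]
    return out / k**3


def mse(a, b):
    """Mean squared error between two arrays of equal shape."""
    return float(np.mean((a - b) ** 2))


# Synthetic HSI: two spatial axes and one spectral axis (hypothetical sizes).
rng = np.random.default_rng(0)
H, W, B = 32, 32, 16
y, x, s = np.meshgrid(
    np.linspace(0, 1, H), np.linspace(0, 1, W), np.linspace(0, 1, B),
    indexing="ij",
)
clean = 0.5 + 0.25 * np.sin(2 * np.pi * y) * np.cos(2 * np.pi * x) * (0.5 + 0.5 * s)

# Assumed noise model: zero-mean Gaussian, standard deviation 0.1.
noisy = clean + rng.normal(0.0, 0.1, size=clean.shape)
denoised = local_mean_filter_3d(noisy, radius=1)

print("MSE noisy   :", mse(noisy, clean))
print("MSE denoised:", mse(denoised, clean))
```

Because this filter looks only at immediate neighbors, it cannot exploit similar structures elsewhere in the cube. That is the gap that global-context approaches such as nonlocal means are meant to fill.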
