Abstract:
Remote sensing semantic segmentation aims to extract land cover types automatically by classifying every pixel. However, large-scale hyperspectral remote sensing images possess rich spectral information, complex and diverse spatial distributions, significant scale variations, and a wide variety of land cover types with fine details, all of which pose considerable challenges for segmentation. To overcome these challenges, this study introduces a U-shaped semantic segmentation network, DTSU-Net, that combines global spectral attention and a deformable Transformer for segmenting large-scale hyperspectral remote sensing images. First, convolution and global spectral attention are used to emphasize the most spectrally informative features, effectively extracting spectral characteristics. Second, deformable self-attention captures global and local information, addressing the complex scales and distributions of objects. Finally, deformable cross-attention aggregates deep and shallow features, enabling comprehensive mining of semantic information. Experiments on a large-scale hyperspectral remote sensing dataset (WHU-OHS) demonstrate that: first, across different cities (Changchun, Shanghai, Guangzhou, and Karamay), DTSU-Net achieves the highest mIoU among the compared methods, reaching 56.19%, 37.89%, 52.90%, and 63.54%, respectively, with average improvements of 7.57% to 34.13% over the baselines; second, module ablation experiments confirm the effectiveness of the proposed modules, and the deformable Transformer significantly reduces training cost compared with a conventional Transformer; third, the approach achieves the highest mIoU of 57.22% on the full dataset with a balanced trade-off between accuracy and parameter count, an improvement of 1.65% to 56.58% over the baseline methods.
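To make the spectral-attention idea concrete, below is a minimal sketch of a global spectral attention block in PyTorch. It is a hypothetical squeeze-and-excitation-style design that reweights spectral bands using global average and max pooling (the keywords mention global max pooling); the class name, the reduction ratio, and the 32-band input are illustrative assumptions, and the paper's exact module may differ.

```python
# Hypothetical sketch of a global spectral attention block for hyperspectral input.
# Bands are reweighted by a small MLP driven by global average and max pooling;
# this is an assumed squeeze-and-excitation-style design, not the paper's exact module.
import torch
import torch.nn as nn


class GlobalSpectralAttention(nn.Module):
    def __init__(self, num_bands: int, reduction: int = 4):
        super().__init__()
        # Shared bottleneck MLP applied to both pooled spectral descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(num_bands, num_bands // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(num_bands // reduction, num_bands),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, bands, height, width) hyperspectral feature cube.
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling per band
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling per band
        weights = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * weights                   # emphasize spectrally informative bands


if __name__ == "__main__":
    # Usage on a 32-band patch (WHU-OHS imagery has 32 spectral bands).
    gsa = GlobalSpectralAttention(num_bands=32)
    patch = torch.randn(2, 32, 64, 64)
    print(gsa(patch).shape)  # torch.Size([2, 32, 64, 64])
```

In this sketch the attention weights are purely spectral (one scalar per band), which is what lets the block highlight the most informative bands before the spatial encoder; spatial context is then handled by the deformable self- and cross-attention stages described in the abstract.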
Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Volume: 17)
Index Terms:
- Semantic Segmentation
- Large-scale Image
- Deformation Transformation
- Large-scale Hyperspectral Image
- Rich Information
- Semantic Information
- Deep Features
- Semantic Network
- Shallow Features
- Object Scale
- Semantic Segmentation Network
- Distribution of Objects
- Rich Spectral Information
- Spectral Transformation
- Convolutional Neural Network
- Computational Complexity
- Local Information
- Spatial Information
- Feature Maps
- Receptive Field
- Multi-head Self-attention
- Global Max Pooling
- Global Information
- Semantic Segmentation Methods
- Multi-scale Features
- Attention Weights
- Self-attention Mechanism
- Multi-scale Feature Maps
- Parameter Count
- Floating-point Operations