Abstract:
Owing to the long-range modeling ability of the self-attention mechanism, the Transformer has taken various vision tasks by storm, including image super-resolution (SR). In this study, we reveal that a convolutional neural network (CNN) with proper visual attention is a simpler and more effective paradigm than the Transformer for image SR tasks. We re-examine successful SR models and identify several key characteristics that contribute to accurate image reconstruction. Building on this recipe, we propose a pure CNN-based SR network using efficient visual attention, dubbed EvaSR. Benefiting from the carefully designed visual attention, EvaSR captures both local structure and long-range dependencies and achieves adaptivity in the spatial and channel dimensions, while retaining the simplicity and efficiency of CNNs. The experimental results demonstrate that EvaSR achieves state-of-the-art performance among existing efficient SR methods. In particular, the tiny version of EvaSR needs only 21.4% and 15.2% of the parameters of IMDN and SMSR, respectively, while delivering better performance.
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
IEEE Keywords (Index Terms):
- Super-resolution
- Visual Attention
- Neural Network
- Convolutional Neural Network
- Local Structure
- Image Reconstruction
- Vision Tasks
- Channel Dimension
- Self-attention Mechanism
- Long-range Dependencies
- Super-resolution Network
- Super-resolution Task
- Super-resolution Model
- Local Information
- Large Field
- Attention Mechanism
- Receptive Field
- Feed-forward Network
- Spatial Attention
- Fewer Parameters
- Depthwise Convolution
- Local Structure Information
- Edge Devices
- Large Receptive Field
- Deep Feature Extraction
- Element-wise Multiplication
- Shallow Features
- Single Image Super-resolution
- Computational Overhead
- Pointwise Convolution