1. Introduction
The goal of single image super-resolution (SR) is to reconstruct high-resolution (HR) images from low-resolution (LR) inputs. Many deep learning-based methods have been proposed for this task. In particular, several image restoration studies [16], [27], [55], [61], [63], [66] have adopted the window self-attention (WSA) introduced by the Swin Transformer (Swin) [32], as it combines the long-range dependency modeling of the Vision Transformer [14] with the locality of conventional convolution. However, two critical problems remain in these works. First, the receptive field of plain WSA is limited to a small local window [52], [56], [58]. This prevents the models from exploiting the textures and patterns of neighboring windows to recover degraded pixels, producing distorted images. Second, recent state-of-the-art SR [9], [27], [61], [66] and lightweight SR [6], [15], [35], [63] networks require intensive computation. If the parameter count is kept around a certain level (e.g., 1M parameters, 4MB model size), reducing operations is essential for real-world applications, because the primary energy (and hence time) consumption of neural networks on semiconductors comes from Mult-Adds operations [17], [47].
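The receptive-field limitation of plain WSA can be seen directly from how Swin partitions the feature map before attention. The following is a minimal NumPy sketch (not the paper's or Swin's actual implementation) of Swin-style window partitioning; the function name and toy shapes are illustrative. Because attention is computed independently inside each window, a pixel can never attend to tokens in a neighboring window.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Self-attention is then computed independently inside each
    (window_size x window_size) window, so a pixel's receptive
    field is confined to its own window (illustrative sketch).
    """
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size,
                  W // window_size, window_size, C)
    # -> (num_windows, window_size * window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size * window_size, C)

# Toy example: an 8x8 single-channel map with 4x4 windows
# yields 4 windows of 16 tokens each; each attention matrix
# is only 16x16, covering one window and nothing outside it.
feat = np.arange(8 * 8 * 1, dtype=np.float32).reshape(8, 8, 1)
windows = window_partition(feat, 4)
print(windows.shape)  # (4, 16, 1)
```

Shifting windows between consecutive blocks (as in Swin) mitigates but does not remove this locality, which motivates injecting cross-window (N-Gram) context.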
Figure 1. Two tracks of this paper using the N-Gram context. (Left) NGswin outperforms previous leading SR methods with an efficient structure. (Right) Our proposed N-Gram context improves different Swin Transformer-based SR models.