1. Introduction
Single Image Super-Resolution (SISR) is a well-established task in low-level computer vision that aims to reconstruct a high-resolution image from a single low-resolution input. It has broad applicability in enhancing image quality across various domains [16], [37], [43], [44], [48], [49], [57], and the advent of deep learning has driven significant progress in the field [2], [10], [12], [19], [24], [32], [34], [36], [50], [59].

Much of the recent progress in super-resolution has been driven by the attention mechanism. Numerous state-of-the-art super-resolution networks incorporate attention modules or even adopt large vision transformers (ViTs) as their backbone [6], [8], [20], [27], [32], [35], [42], [53], [60]. Through attention maps, these networks emphasize key features and model long-range dependencies between patches, capturing a wider range of contextual information to preserve detail continuity and edge-texture accuracy. However, the attention mechanism is computationally demanding: it involves complex network structures and a substantial number of additional parameters, leading to large model sizes and slow inference. These costs limit the applicability of such models in efficient, high-speed computing scenarios, such as SISR on resource-constrained mobile devices.
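To make the cost argument concrete, the sketch below (a generic single-head scaled dot-product self-attention in NumPy, not the implementation of any cited network; all names are illustrative) shows where the expense comes from: for n patches, the attention map is an explicit n×n matrix, so compute and memory grow quadratically with the number of patches — the long-range dependencies and the cost come from the same term.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over patch embeddings.

    x: (n, d) array of n patch embeddings of dimension d.
    The (n, n) attention map lets every patch attend to every other
    patch (long-range dependencies), but it also makes cost and
    memory grow as O(n^2).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (n, n) attention map
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ v                                # context-mixed features

# Illustrative sizes: 64 patches, 32-dim embeddings.
rng = np.random.default_rng(0)
n, d = 64, 32
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
```

Doubling the patch count quadruples the size of the attention map, which is why attention-heavy SISR models struggle on mobile hardware even when the per-patch feature dimension stays small.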