1. INTRODUCTION
Medical image segmentation can help doctors make faster and more accurate diagnoses. Segmentation and detection models are increasingly being integrated into mobile applications to speed up diagnosis and treatment. For example, the handheld ultrasound scanner application [1] and the skin cancer detection application [2] are designed to run on mobile phones. U-Net [3] has been widely adopted across the major medical image segmentation tasks and is now the most common baseline model. In recent years, many strong U-Net variants that incorporate the transformer [4] have emerged, such as TransUNet [5] and Swin-Unet [6]. Transformer-based improvements help the network capture higher-dimensional global information, but they also substantially increase the number of parameters and the computational cost. As a result, these models are poorly suited to real-time scenarios in hospitals.