I. Introduction
In the past few years, numerous deep learning approaches have demonstrated remarkable performance in classification and detection tasks. These rapidly growing network models have been widely adopted in fields such as autonomous driving, industry, and medicine. Object detection algorithms based on deep neural networks can currently be divided into two broad groups: one-stage detection networks and two-stage detection networks. One-stage detection networks are best represented by the YOLO series [1]–[6], while two-stage detection networks are best represented by Faster R-CNN [7]. These networks have gradually been applied in production and daily life, and have played an important role in textile defect detection.
Jin [8] proposed the SE-YOLOv5 model by adding SE attention modules to the YOLOv5 backbone, which improves the accuracy, generalization ability, and robustness of fabric defect detection. However, the network model is still relatively large and may not achieve high real-time performance. Luo [9] introduced depthwise separable convolution and an attention mechanism into the YOLOv4 neck, proposing the attention-based lightweight model YOLO-SCD. This improves the detection speed of the entire network, but the model size is still relatively large. Jia [10] improved the detection accuracy and robustness for small-target defects in fabric defect detection by combining transfer learning with an improved Faster R-CNN, but the detection efficiency is low and cannot meet industrial production needs. Liu [11] combined segmentation networks with generative adversarial networks (GANs) so that the model can adapt to unknown defect types, but a critical limitation is that it cannot take different emergency measures for different defect types.