I. Introduction
The continual proliferation of large-scale datasets and the concurrent improvement in hardware performance have propelled the progress of deep learning in computer vision, yielding significant scientific and technological advancements. Researchers have developed a diverse array of vision models, encompassing tasks such as object detection [1], semantic segmentation [2]–[3], image classification [4] and instance segmentation [5]. These models find applications across various domains to address diverse needs.