I. Introduction
Anomaly detection plays a crucial role in many industrial applications, where it helps ensure product quality and system reliability [1] [2] [3] [4] [5]. Recently, zero-shot anomaly detection methods, such as those built on CLIP (Contrastive Language-Image Pre-training) [6] [7], have attracted significant attention because they can detect anomalies without requiring labeled anomalous data for training. However, applying CLIP directly to industrial anomaly detection can be challenging [8] [9]. The model was pre-trained on a large-scale collection of natural image-text pairs, which may not capture the characteristics specific to industrial settings [10].