I. Introduction
Fault diagnosis is critical to the safety of modern engineering systems: by promptly detecting and isolating abnormal components, it prevents faults from propagating and evolving, and thereby reduces the losses incurred when unexpected failures occur [1]. Benefiting from advances in sensing and information technology, data-driven fault diagnosis techniques have been studied extensively and widely adopted in many areas [2], [3], [4], [5]. However, most existing data-driven methods require a sufficient number of fault samples to train the diagnosis model, whereas fault samples are scarce and difficult to obtain in real-world applications. This scarcity poses a great challenge to implementing state-of-the-art data-driven fault diagnosis techniques. In recent years, fault diagnosis with limited fault samples has attracted increasing attention in both academia and industry [6], [7], [8].