I. Introduction
Critical elements such as gearboxes, bearings, and other transmission parts often operate under complex working conditions, including high rotating speeds and heavy loads. Under these conditions they are prone to damage and cracks, which raise the risk of failure for the whole machine and can even cause major accidents. As a result, the demand for higher reliability and safety of these critical elements is growing significantly [1], and timely early fault detection (EFD) for them is important. With the rapid development of computer science and sensor technology, various machine learning methods have been applied to intelligent EFD. These include shallow models, such as the support vector machine (SVM) [2] and the Gaussian process [3], and deep models, such as the convolutional neural network (CNN) [4] and long short-term memory (LSTM) [5]. At present, building a detection model suited to the characteristics of a specific application remains a key challenge of intelligent EFD.