I. Introduction
Rolling bearings are widely used in industrial equipment and play a crucial role in modern rotating machinery; their health condition has a great impact on the health of the machinery and its remaining lifetime [1]-[3]. With considerable effort put into the topic, fault diagnosis of rolling bearings has been a research hotspot for decades [4]. The most popular approach to bearing fault diagnosis is intelligent methods that combine shallow machine learning (ML) with manually designed feature extraction. An intelligent fault diagnosis algorithm usually contains three main steps: feature extraction, feature selection and feature classification. In general, the original vibration signal collected by the sensors contains a lot of noise, so the purpose of feature extraction is to obtain information useful for fault classification from the original vibration data. Widely used feature extraction methods include the fast Fourier transform (FFT) [5], the discrete wavelet transform (DWT) [6], empirical mode decomposition (EMD) [7] and so on. Feature selection is carried out after feature extraction to discard insensitive and useless features and thus reduce the number of features passed to the classifier. Principal component analysis (PCA) [8], independent component analysis (ICA) [9] and manifold learning [10] are widely applied to select features. In the last step, the selected features are fed to classifiers such as the multi-layer perceptron (MLP) [11], hidden Markov model (HMM) [12], support vector machine (SVM) [13] and so on, which are trained to perform fault classification. The conventional methods mentioned above have been widely used to diagnose faults of rolling bearings. Lin et al. [14] introduced a novel feature extraction approach based on the Morlet wavelet and applied it to mechanical fault diagnosis. Cai et al. [15] combined EMD with genetic neural network adaptive boosting (GNN-AdaBoost) to realize fault feature extraction and classification. Kankar et al. [16] extracted features from the original bearing data by statistical methods and then used an artificial neural network and an SVM, respectively, for feature classification to complete the diagnosis.
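To make the three-step structure of such conventional methods concrete, the following is a minimal sketch of a feature extraction, feature reduction and classification chain, assuming only NumPy is available. It is not taken from any of the cited works. The synthetic two-tone signals, the sampling rate, the tone frequencies and all function names are hypothetical choices made for illustration, and a nearest-centroid rule is used as a simple stand-in for the MLP, HMM or SVM classifiers mentioned above.

```python
import numpy as np


def make_signals(n_per_class, n_samples=1024, fs=12000.0, seed=0):
    """Synthetic stand-in for vibration data (assumption, not real bearing data).

    Each class is a noisy sinusoid with its own dominant frequency.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    signals, labels = [], []
    for label, freq in enumerate((100.0, 160.0)):  # hypothetical frequencies in Hz
        for _ in range(n_per_class):
            phase = rng.uniform(0.0, 2.0 * np.pi)
            noise = 0.5 * rng.standard_normal(n_samples)
            signals.append(np.sin(2.0 * np.pi * freq * t + phase) + noise)
            labels.append(label)
    return np.array(signals), np.array(labels)


def fft_features(signals):
    """Step 1, feature extraction: one-sided FFT magnitude, scaled by signal length."""
    return np.abs(np.fft.rfft(signals, axis=1)) / signals.shape[1]


def fit_pca(features, n_components):
    """Step 2, feature reduction: PCA computed via SVD of the centered features."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(features, mean, components):
    """Project features onto the retained principal components."""
    return (features - mean) @ components.T


def fit_centroids(z, labels):
    """Step 3, classification: store one mean vector per class (stand-in classifier)."""
    classes = np.unique(labels)
    centroids = np.array([z[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(z, classes, centroids):
    """Assign each sample to the class with the nearest centroid."""
    dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def run_pipeline(seed=0):
    """Train on one synthetic set, evaluate on another, and return the accuracy."""
    x_train, y_train = make_signals(40, seed=seed)
    x_test, y_test = make_signals(20, seed=seed + 1)

    f_train, f_test = fft_features(x_train), fft_features(x_test)
    mean, components = fit_pca(f_train, n_components=2)
    z_train = apply_pca(f_train, mean, components)
    z_test = apply_pca(f_test, mean, components)

    classes, centroids = fit_centroids(z_train, y_train)
    return float(np.mean(predict(z_test, classes, centroids) == y_test))


if __name__ == "__main__":
    print(f"test accuracy on synthetic data: {run_pipeline():.2f}")
```

Calling `run_pipeline()` returns the fraction of held-out synthetic signals that are classified correctly. The PCA projection is fitted on the training features only and then reused for the test features, so no information from the test set leaks into the reduction step.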