Introduction
Machines are widely used in fields such as aerospace, electric energy, and machinery manufacturing [1]. During the operation of machines, defects may occur on their key components, leading to the appearance of a fault. Such faults may cause unforeseen breakdowns and economic loss. Therefore, intelligent fault diagnosis plays an irreplaceable role in ensuring the reliable operation of machines [2]. Since deep learning allows deep neural networks to accomplish the tasks of feature extraction and fault classification, research on the intelligent fault diagnosis of machines using deep learning has received much attention [3], [4].
Nowadays, deep-learning-based intelligent fault diagnosis methods are widely researched [5]–[8]. For instance, Zhao et al. [9] presented a multi-scale convolutional transfer learning network for intelligent fault diagnosis of rolling bearings under variable working conditions. Tang et al. [10] developed a method using an adaptive learning rate deep belief network combined with Nesterov momentum for rotating machinery fault diagnosis. Fuan et al. [11] presented an adaptive deep convolutional neural network for rolling bearing fault diagnosis. Wang et al. [6] proposed a batch-normalized deep neural network for feature learning and fault recognition in the intelligent fault diagnosis of machines. Analysis of these studies shows that although deep learning has been applied to intelligent fault diagnosis, these methods have not considered the imbalanced distributions of mechanical health conditions. In real applications, machines operate normally most of the time, and faults seldom happen during operation. Consequently, the data samples of slight faults are harder to collect than those of the normal condition, and the data samples of serious faults are harder to collect than those of slight faults. Thus, the data samples of the normal condition are abundant while the data samples of fault conditions are relatively scarce, making the distribution of the mechanical data imbalanced. Such a distribution makes the intelligent fault diagnosis of machines suffer from the imbalanced classification problem [12]. As shown in Figure 1, if the dataset is balanced, the samples of the normal condition, slight fault, and serious fault can be classified correctly. However, if the dataset is imbalanced, the imbalanced problem forces deep learning models to be biased towards the data of the majority health conditions such as the normal condition, while the data of the minority health conditions such as slight and serious faults are not adequately learned, leading to the misclassification of the data of these faults.
Simulation of the intelligent fault diagnosis of a machine: (a) a balanced classification case, and (b) an imbalanced classification case.
To deal with the imbalanced classification problem in the intelligent diagnosis of machines, researchers have proposed several methods. Jia et al. [13] presented a framework called the deep normalized convolutional neural network for intelligent fault diagnosis, which uses normalized layers and a weighted loss to overcome the imbalanced classification problem. Zhou et al. [14] designed a global optimization generative adversarial network to solve the imbalanced classification problem in intelligent fault diagnosis. Dong et al. [15] proposed a deep cost-adaptive convolutional network for imbalanced mechanical data classification. Wang et al. [16] used a Wasserstein generative adversarial network to generate simulated signals and trained stacked autoencoders to classify the mechanical health conditions under imbalanced data. However, these studies use an individual deep network to extract features and recognize the health conditions under an imbalanced dataset, which may suffer from the following two weaknesses. (1) The individual deep network may easily over-fit the training data and be biased by random factors, affecting the accuracy and stability of the intelligent fault diagnosis. (2) The hyper-parameters of the deep networks are manually selected using human empirical knowledge.
To overcome the aforementioned weaknesses, this paper takes advantage of ensemble learning and proposes an ensemble convolutional neural network (EnCNN) for the intelligent fault diagnosis of machines under imbalanced data. In the proposed method, a convolutional neural network that takes multi-sensor data as its input is used as the base classifier of the ensemble. Firstly, an under-sampling strategy is used to split the imbalanced dataset into several balanced sub-datasets. Then, the hyper-parameters of the base classifiers are initialized by a random selection strategy, and these classifiers are trained using the balanced sub-datasets. Finally, the trained base classifiers are integrated into EnCNN by a weighted voting strategy and anomalous classifier selection. The proposed EnCNN is validated on an imbalanced dataset collected from a machinery fault test bench. By comparing with related methods, the superiority of EnCNN in the intelligent diagnosis of machines under imbalanced data is verified.
The contributions of this paper are summarized as follows.
EnCNN uses a hyper-parameter random selection strategy to set the hyper-parameters of each base classifier automatically and to improve the diversity of these classifiers, which facilitates an effective ensemble in EnCNN.
The G-mean score is used as the ensemble weight in the voting strategy, and a boxplot-based classifier selection is developed to screen out anomalous base classifiers, resulting in better classification accuracy in the intelligent fault diagnosis of machines under imbalanced data.
The rest of this paper is organized as follows. Section II details the concept of ensemble learning. In Section III, the proposed EnCNN is described in detail. In Section IV, the diagnosis cases of a machinery fault test bench with imbalanced data are studied using EnCNN. Finally, conclusions are drawn in Section V.
Ensemble Learning
This section introduces the concept of ensemble learning. An individual classifier may over-fit the training data and be affected by random factors, leading to biased classification accuracy. Thus, ensemble learning combines multiple individual classifiers, called base classifiers, into an ensemble classifier [17]. The ensemble reduces the accuracy deviations of an individual classifier, tends to alleviate overfitting, and yields better results.
The advantages of ensemble learning can be illustrated using Hoeffding’s inequality. Assume that each base classifier $h_{t}$ ($t=1,2,\ldots,T$) classifies a sample ${\boldsymbol {x}}^{m}$ whose true label is $\varphi ({\boldsymbol {x}}^{m})$ with the classification error $\varepsilon$, i.e., \begin{equation*} P(h_{t} ({\boldsymbol {x}}^{m})\ne \varphi ({\boldsymbol {x}}^{m}))=\varepsilon\tag{1}\end{equation*}
By combining these base classifiers with the voting method, the ensemble classifier $H$ is obtained as \begin{equation*} H({\boldsymbol {x}}^{m})=\textrm {sgn}\left ({{\sum \limits _{t=1}^{T} {h_{t} ({\boldsymbol {x}}^{m})}} }\right)\tag{2}\end{equation*}
If the base classifiers are independent of each other, the upper limit of the classification error of the ensemble classifier is \begin{equation*} P(H({\boldsymbol {x}}^{m})\ne \varphi ({\boldsymbol {x}}^{m}))\le \exp \left ({{-\frac {1}{2}T\left ({{1-2\varepsilon } }\right)^{2}} }\right).\tag{3}\end{equation*}
It can be seen that the upper limit of the ensemble classifier error decreases exponentially as the number of base classifiers increases, and the final error tends to be zero [18]. Therefore, the ensemble classifier will achieve higher accuracy than the base classifier alone.
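As a quick numerical illustration of (3) (a sketch with an assumed base-classifier error of $\varepsilon =0.3$, not a value from this paper), the bound can be evaluated as follows:

```python
import math

def ensemble_error_bound(T, eps):
    """Upper bound on the ensemble classification error from Eq. (3)."""
    return math.exp(-0.5 * T * (1.0 - 2.0 * eps) ** 2)

# Illustrative base-classifier error; any eps < 0.5 makes the bound shrink with T.
eps = 0.3
for T in (1, 5, 15, 25):
    print(T, round(ensemble_error_bound(T, eps), 3))
# Prints roughly 0.923, 0.670, 0.301, 0.135: the bound decays exponentially in T.
```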
In ensemble learning, the base classifiers are required to be both independent of each other and complementary to each other, which is referred to as the diversity of the base classifiers. Krogh and Vedelsby [19] developed the concept of “error-ambiguity decomposition” to analyze the effect of the diversity, which shows its significance for ensemble learning. The error-ambiguity decomposition can be represented as \begin{equation*} E=\overline E -\bar {A}\tag{4}\end{equation*} where $E$ is the generalization error of the ensemble, $\overline E$ is the average generalization error of the base classifiers, and $\bar {A}$ is the average ambiguity (diversity) of the base classifiers. The larger the ambiguity is, the lower the ensemble error becomes.
The Proposed Method
In this section, we introduce the proposed ensemble convolutional neural network (EnCNN) for the intelligent fault diagnosis of machines under imbalanced data. In the proposed EnCNN, a convolutional neural network is used as the base classifier, as shown in Figure 2(a). The flowchart of EnCNN is displayed in Figure 2(b). The imbalanced dataset is first split into balanced training subsets through an under-sampling strategy, and each subset is then used to train a base classifier. The weight coefficient of each trained base classifier is then calculated by the G-mean score, and anomalous base classifiers are screened out by classifier selection. Finally, the base classifiers are integrated into EnCNN through a weighted voting strategy.
(a) The convolutional neural network used as the base classifier, and (b) the flowchart of the proposed EnCNN.
A. The Base Classifier
Since CNNs are able to deal with shift-variant signals such as vibration signals, we use a CNN as the base classifier in this paper. As shown in Figure 2(a), the CNN has four kinds of layers: the input layer, convolutional layers, pooling layers, and fully connected layers. The input layer receives the multi-sensor signals, the convolutional and pooling layers are combined as building blocks for feature learning, and the fully connected layers are used for fault classification [21].
Given a training set $\left \{{{\boldsymbol {x}}^{m}, y^{m}} \right \}_{m=1}^{M}$, the convolutional layer convolves the feature maps of the previous layer with trainable kernels to learn features. The $j$-th feature map of sample $m$ at layer $l$ is \begin{equation*} {\boldsymbol {u}}_{j}^{m,l} =\sigma \left ({{\sum \limits _{d} {{\boldsymbol {k}}_{j,d}^{l} \ast {\boldsymbol {x}}_{d}^{m,l-1} +b_{j}^{l}}} }\right)\tag{5}\end{equation*} where ${\boldsymbol {k}}_{j,d}^{l}$ is the convolutional kernel, $b_{j}^{l}$ is the bias, ${\boldsymbol {x}}_{d}^{m,l-1}$ is the $d$-th feature map of the previous layer, $\ast$ denotes the convolution operation, and $\sigma (\cdot)$ is the activation function.
After obtaining ${\boldsymbol {u}}_{j}^{m,l}$, the pooling layer conducts max pooling with pooling size $s$ to reduce the dimension of the feature maps: \begin{equation*} {\boldsymbol {x}}_{j}^{m, l} =\textrm {max}({\boldsymbol {u}}_{j}^{m, l},s)\tag{6}\end{equation*}
In the fault classification stage, fully connected layers are combined as the classification block to recognize the faults of the machine. If $p_{c}^{m}$ denotes the predicted probability that sample $m$ belongs to health condition $c$ among the $C$ conditions, the softmax loss function is \begin{equation*} \ell =-\frac {1}{M}\sum \limits _{m=1}^{M} {\sum \limits _{c=1}^{C} {1\left \{{ {y^{m}=c} }\right \}\log \left ({{p_{c}^{m}} }\right)}}\tag{7}\end{equation*} where $1\left \{{\cdot}\right \}$ is the indicator function.
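A minimal PyTorch sketch of such a base classifier, corresponding to (5)–(7), is given below; the channel count, kernel size, and layer widths are illustrative assumptions rather than the exact settings of the paper.

```python
import torch
import torch.nn as nn

class BaseCNN(nn.Module):
    """Illustrative CNN base classifier: stacked convolution/pooling blocks for
    feature learning followed by a fully connected layer for classification."""

    def __init__(self, in_channels=6, num_classes=10, num_blocks=3):
        super().__init__()
        blocks, channels = [], in_channels
        for _ in range(num_blocks):
            blocks += [
                nn.Conv1d(channels, 16, kernel_size=9, padding=4),  # convolution, Eq. (5)
                nn.ReLU(),
                nn.MaxPool1d(kernel_size=2),                        # max pooling, Eq. (6)
            ]
            channels = 16
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(16, num_classes),  # logits; softmax loss of Eq. (7) applied in training
        )

    def forward(self, x):  # x: (batch, sensor_channels, signal_length)
        return self.classifier(self.features(x))

# Training with the standard softmax/cross-entropy loss of Eq. (7):
# loss = nn.CrossEntropyLoss()(BaseCNN()(signals), labels)
```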
In (7), the softmax loss function measures the classification error of the CNN during training, and this error is used to adjust the CNN parameters. However, this loss function assumes that the contribution of each sample to the classification error is the same. When the numbers of samples of different health conditions differ, the contributions of the different conditions to the classification error also differ, prompting the CNN to correctly classify the majority class and misclassify the minority classes. Thus, a traditional CNN cannot handle the imbalanced classification in the intelligent fault diagnosis of machines.
B. The Balanced Subsets for Training
To deal with the training of the CNNs using imbalanced data, we first use an under-sampling strategy [22] to randomly obtain multiple balanced subsets of samples, and these subsets are then applied to train the base classifiers. This random data selection strategy improves the data diversity in the training process of EnCNN and is described as follows.
For the training set $\left \{{{\boldsymbol {x}}^{m}, y^{m}} \right \}_{m=1}^{M}$, the number of samples of health condition $c$ is \begin{equation*} n_{c} =\sum \limits _{m=1}^{M} {1\left \{{{y^{m}=c} }\right \}}.\tag{8}\end{equation*}
The number of samples for each health condition is then reduced to the minimum sample number among all the conditions \begin{equation*} n_{\min } =\textrm {min}\left ({{\{n_{c} \}_{c=1}^{C}} }\right)\tag{9}\end{equation*}
For each health condition, $n_{\min }$ samples are randomly selected, and the selected samples of all the health conditions compose a balanced subset \begin{equation*} S_{t} =\left \{{{{\boldsymbol {x}}^{m_{t}}, y^{m_{t}}} }\right \}_{m_{t} =1}^{M_{t}}\tag{10}\end{equation*} where $M_{t} =C\cdot n_{\min }$ is the number of samples in the subset. Repeating this random selection $T$ times yields $T$ balanced subsets, one for each base classifier.
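A sketch of this under-sampling procedure, (8)–(10), using NumPy (array and function names are illustrative):

```python
import numpy as np

def balanced_subset(x, y, rng):
    """Draw one balanced subset S_t by under-sampling every health condition
    down to the size of the rarest one, following Eqs. (8)-(10)."""
    classes, counts = np.unique(y, return_counts=True)          # n_c, Eq. (8)
    n_min = counts.min()                                         # Eq. (9)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(idx)
    return x[idx], y[idx]                                        # S_t, Eq. (10)

# One subset per base classifier, each drawn with a different random seed:
# subsets = [balanced_subset(x_train, y_train, np.random.default_rng(t)) for t in range(T)]
```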
The random data selection strategy has two advantages. (a) It reduces the number of samples when balancing the health conditions, which shortens the training time of the base classifiers. (b) The strategy forms several different subsets for training the base classifiers and increases the differences between the base classifiers during training, which improves the diversity of the base classifiers in the ensemble learning.
C. Random Hyper-Parameter Selection and Training of Base Classifiers
When training a base classifier, we should determine its hyper-parameters, which include the number of feature learning blocks and other architecture and training settings. Instead of being tuned manually, these hyper-parameters are set by a random selection strategy: each hyper-parameter of each base classifier is drawn randomly from its candidate range.
The random selection strategy covers a large range of candidate configurations and is computationally simple and efficient compared with traditional grid search. If $Q$ is the number of candidate hyper-parameter configurations and $q$ of them are near-optimal, the probability that $\Gamma$ random selections obtain at least one near-optimal configuration is \begin{equation*} P_{q} =1-\left ({{1-q/Q} }\right)^{\Gamma }\tag{11}\end{equation*}
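For example (with purely illustrative numbers, not taken from the paper), if $q=5$ of $Q=100$ candidate configurations are near-optimal, the probability in (11) can be evaluated as:

```python
def prob_near_optimal(q, Q, num_trials):
    """Probability that at least one of `num_trials` random draws falls into the
    q near-optimal configurations among Q candidates, Eq. (11)."""
    return 1.0 - (1.0 - q / Q) ** num_trials

for trials in (1, 8, 16, 32):
    print(trials, round(prob_near_optimal(5, 100, trials), 3))
# Prints roughly 0.05, 0.337, 0.56, 0.806: more base classifiers raise the chance quickly.
```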
Each base classifier is then trained with its balanced subset $S_{t}$ by minimizing the softmax loss over the subset \begin{equation*} \textrm {min}-\frac {1}{M}\sum \limits _{m_{t} =1}^{M_{t}} {\sum \limits _{c=1}^{C} {1\left \{{{y^{m_{t}}=c} }\right \}\log \left ({{p_{c}^{m_{t}}} }\right)}}.\tag{12}\end{equation*}
The advantages of random hyper-parameter selection are as follows. (a) Manual hyper-parameter selection for each base classifier is avoided, and in ensemble learning the probability of obtaining near-optimal hyper-parameters gradually increases with the number of base classifiers. (b) The independence of the randomly selected hyper-parameters contributes to the differences between the base classifiers, which enhances the diversity of the base classifiers.
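A simple way to realize this strategy is to draw each base classifier's configuration independently from candidate ranges; the search space below is a hypothetical example, not the one used in the paper:

```python
import random

# Hypothetical candidate values; the paper's actual search space is not reproduced here.
SEARCH_SPACE = {
    "num_blocks":    [2, 3, 4],
    "kernel_size":   [3, 5, 9, 15],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size":    [32, 64, 128],
}

def sample_hyper_parameters(rng=random):
    """Randomly pick one value per hyper-parameter for a single base classifier."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

# Each base classifier receives its own independent configuration, which also
# contributes to the diversity of the ensemble:
# configs = [sample_hyper_parameters() for _ in range(T)]
```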
D. Base Classifier Selection and Ensemble
After each base classifier has been trained, these classifiers are combined into the ensemble classifier. In EnCNN, the ensemble of the individual base classifiers is based on a weighted voting strategy. We calculate the ensemble weight of each base classifier using the G-mean score. Since only a portion of the samples is under-sampled from the original training dataset to train a base classifier, the performance of the classifier can be tested on the original training dataset. The G-mean score of this test is used as the ensemble weight of the classifier.
The G-mean [24] takes into account the imbalanced distribution of the samples and is a common metric for measuring classification performance under imbalanced data. It considers both the recall and the specificity of the classification results, where recall is a function of true positives (TP) and false negatives (FN), and specificity is a function of true negatives (TN) and false positives (FP): \begin{equation*} recall=\frac {TP}{TP+FN},\quad specificity=\frac {TN}{TN+FP}\tag{13}\end{equation*}
The G-mean is the geometric mean of the two: \begin{equation*} g=\sqrt {recall\times specificity}.\tag{14}\end{equation*}
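A sketch of the G-mean computation from the binary confusion counts in (13)–(14):

```python
import math

def g_mean(tp, fn, tn, fp):
    """G-mean of recall and specificity, Eqs. (13)-(14)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)

# Example: 90 of 100 faulty samples and 950 of 1000 normal samples classified correctly.
print(round(g_mean(tp=90, fn=10, tn=950, fp=50), 3))   # ~0.925
```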
The G-mean score reaches its best value at 1 and its worst value at 0, and we use this score as the ensemble weight of the base classifiers. Due to random factors, however, a base classifier may be trained abnormally and result in low classification performance. To keep such anomalous base classifiers from degrading the classification performance of EnCNN, they are detected by classifier selection and removed before the ensemble [25]. Here, we use a boxplot-based classifier selection method to find the anomalous base classifiers. Compared with other methods, the boxplot does not need to assume that the data obey a particular distribution and provides an objective criterion for identifying outliers [26]. Thus, we apply it to the selection of the base classifiers. The ensemble weights of anomalous base classifiers are set to zero while the ensemble weights of the other base classifiers are reserved, as shown in (15):\begin{align*} {g}'_{t} =\begin{cases} g_{t}, &\textrm {if}~g_{t} \ge Q_{1} -1.5I \\ 0,& \textrm {if}~g_{t} < Q_{1} -1.5I \\ \end{cases}\tag{15}\end{align*} where $g_{t}$ is the G-mean score of the $t$-th base classifier, $Q_{1}$ is the first quartile of all the G-mean scores, and $I$ is the interquartile range.
The ensemble weights are then obtained by normalizing the adjusted G-mean scores: \begin{equation*} w_{t} ={g'_{t}}\big /{\sum \limits _{t=1}^{T} {g'_{t}}}\tag{16}\end{equation*}
Finally, the selected base classifiers and their ensemble weights are combined to form EnCNN: \begin{equation*} H({\boldsymbol {x}}^{m})=\sum \limits _{t=1}^{T} {w_{t} \cdot h_{t} ({\boldsymbol {x}}^{m})}\tag{17}\end{equation*}
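The selection, weighting, and voting of (15)–(17) can be sketched as follows (NumPy; `predict_proba` is a hypothetical per-classifier method returning class probabilities):

```python
import numpy as np

def ensemble_weights(g_scores):
    """Zero out anomalous base classifiers via the boxplot rule of Eq. (15) and
    normalize the remaining G-mean scores into ensemble weights, Eq. (16)."""
    g = np.asarray(g_scores, dtype=float)
    q1, q3 = np.percentile(g, [25, 75])
    g_adj = np.where(g >= q1 - 1.5 * (q3 - q1), g, 0.0)
    return g_adj / g_adj.sum()

def ensemble_predict(classifiers, weights, x):
    """Weighted voting over the base-classifier outputs, Eq. (17)."""
    scores = sum(w * clf.predict_proba(x) for clf, w in zip(classifiers, weights))
    return scores.argmax(axis=1)
```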
E. Performance Measurement of EnCNN
In the testing stage of EnCNN, a balanced accuracy function is used to measure the performance of EnCNN, which avoids inflated performance estimates on imbalanced data. This function is the macro-average of the recall scores per class. For the test sample ${\boldsymbol {x}}^{m}$ with label $y^{m}$ and sample weight $w^{m}$, the normalized sample weight is \begin{equation*} \hat {w}^{m}=\frac {w^{m}}{\sum \nolimits _{n=1}^{M} {1\left \{{{y^{n}=y^{m}} }\right \}w^{n}}}.\tag{18}\end{equation*}
Given the predicted label $\hat {y}^{m}$ of each test sample, the balanced accuracy is \begin{equation*} \text {Acc}\left ({{y,\hat {y},w} }\right)=\frac {1}{\sum {\hat {w}^{m}} }\sum \limits _{m=1}^{M} {1\left \{{{\hat {y}^{m}=y^{m}} }\right \}} \hat {w}^{m}.\tag{19}\end{equation*}
This balanced accuracy is used to test the classification performance of EnCNN in intelligent diagnosis of machines.
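A sketch of the balanced accuracy of (18)–(19); when every raw sample weight is one, it reduces to the macro-average of per-class recalls (array names are illustrative):

```python
import numpy as np

def balanced_accuracy(y_true, y_pred, sample_weight=None):
    """Balanced accuracy, Eqs. (18)-(19): each sample weight is normalized by the
    total weight of its true class, so every class contributes equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    w = np.ones(len(y_true)) if sample_weight is None else np.asarray(sample_weight, float)
    w_hat = np.empty_like(w)
    for c in np.unique(y_true):
        mask = y_true == c
        w_hat[mask] = w[mask] / w[mask].sum()                        # Eq. (18)
    return float((w_hat * (y_pred == y_true)).sum() / w_hat.sum())   # Eq. (19)

# A classifier that predicts only the majority class scores 1/C rather than a
# misleadingly high plain accuracy:
print(balanced_accuracy([0, 0, 0, 0, 1, 2], [0, 0, 0, 0, 0, 0]))   # ~0.333
```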
Intelligent Fault Diagnosis of Machines Using Imbalanced Data
In this section, we describe the experimental data used in this paper and use the data to verify the proposed method.
A. Data Description
The imbalanced dataset used in this paper is composed of the multivariate time series acquired from a SpectraQuest machinery fault test bench [27], [28]. The test bench emulates the dynamics of motors with two shaft-supporting bearings and allows the study of multiple faults, as shown in Figure 3. Accelerometers were mounted on the overhang and underhang bearing housings, respectively, and data in the axial, radial, and tangential directions were collected to represent the health conditions of the test bench. Thus, a sample contains the vibration signals from six channels. As displayed in Table 1, a variety of faults were imposed on the test bench, involving normal operation, horizontal misalignment, and vertical misalignment. The misalignment fault was induced by shifting the motor shaft horizontally or vertically by different degrees. In the experiment, different numbers of samples were collected for the different faults, and these samples compose the imbalanced dataset used in this paper. The total number of samples is 5400.
The proposed EnCNN method is used to identify the health conditions of the test bench under the imbalanced dataset. 50% of the samples of each health condition are randomly selected to compose the training dataset, and the remaining samples compose the testing dataset.
B. Classification Results
We use the training dataset to train the proposed EnCNN and use the testing dataset to verify the classification performance of EnCNN under imbalanced data. Since the hyper-parameters of the base classifiers of EnCNN are randomly selected, only one hyper-parameter needs to be preset manually, i.e., the ensemble number. This number indicates how many base classifiers are integrated in the proposed EnCNN. The effects of this parameter on the classification accuracy and the computing time are analyzed on a computer with an Intel i5 CPU, a GTX 1060 GPU, and 8 GB RAM. The results are shown in Figure 4. It can be seen that the training accuracy and testing accuracy of EnCNN increase as the ensemble number increases. When the ensemble number increases from 1 to 12, the training accuracy and testing accuracy increase rapidly and their standard deviations decrease rapidly: the training accuracy increases from 78.49% to 95.35% with the standard deviation decreasing from 15.01% to 1.08%, and the testing accuracy increases from 75.69% to 91.17% with the standard deviation decreasing from 15.76% to 1.78%. When the ensemble number increases from 16 to 24, the training accuracy and testing accuracy become stable: the training accuracy ranges from 95.95% to 96.11% with standard deviations between 0.74% and 0.89%, and the testing accuracy ranges from 92.05% to 92.59% with standard deviations between 1.15% and 1.17%. In addition, the computing time ranges from 8.4 s to 175.7 s, so the larger the ensemble number is, the more time the method spends. Selecting the ensemble number of EnCNN is therefore a trade-off between accuracy and training time. Generally, the larger the ensemble number of base classifiers is, the better the diversity of the base classifiers and the higher the accuracy of EnCNN. However, if the ensemble number is too large, it not only increases the training time of EnCNN but also increases the possibility of redundant base classifiers. Therefore, considering the results in Figure 4, we choose 16 as the ensemble number.
In order to fully inspect the health condition of machines, a sample contains vibration signals from six channels. To verify the necessity of multi-channel data for the intelligent fault diagnosis of machines, the proposed EnCNN is compared with diagnosis methods based on single-channel data. The classification results are shown in Table 2. The training accuracy using Channels 1 to 6 ranges from 51.69% to 80.24% with standard deviations between 1.57% and 3.12%, and the testing accuracy ranges from 36.03% to 63.48% with standard deviations between 3.72% and 6.57%. The results indicate that single-channel data cannot fully represent the health conditions of the machines or effectively train the network, resulting in overfitting during training. In contrast, the proposed method makes full use of all the channel data to recognize the health conditions of the machines under imbalanced data and obtains a training accuracy of 96.99% and a testing accuracy of 91.58%, which illustrates the advantages of using multi-channel data.
To illustrate the advantages of the classifier selection used in EnCNN, classification results are obtained by EnCNN with and without classifier selection, as shown in Figure 5. The G-mean scores of the base classifiers of EnCNN are plotted in Figure 5(a). It can be seen that the scores of these base classifiers range from 18.55% to 89.91%, where the scores of the 6th and 10th base classifiers are only 42.99% and 18.55%, respectively. From the boxplot in Figure 5(b), the scores of these two base classifiers are outliers and should be removed in the ensemble learning of EnCNN. Figure 5(c) shows the ensemble weights of EnCNN without classifier selection, and the corresponding testing accuracy is 92.98% in Figure 5(d). Figure 5(e) shows the ensemble weights of the proposed EnCNN with classifier selection, where the weights of the 6th and 10th base classifiers are set to zero. The testing accuracy of EnCNN is thereby improved to 93.34%, as shown in Figure 5(f). These results verify the effectiveness of the anomalous classifier selection used in EnCNN.
The classification results of EnCNN: (a) G-mean scores of each base classifier, (b) boxplot of G-mean scores, (c) ensemble weights of base classifiers not using classifier selection, (d) testing accuracy not using classifier selection, (e) ensemble weights of base classifiers using classifier selection, and (f) testing accuracy using classifier selection.
In EnCNN, a weighted voting strategy based on the G-mean score is used for the ensemble of base classifiers. To illustrate the effectiveness of this strategy, we compare EnCNN using the G-mean-based weighted voting strategy with EnCNN using the traditional weighted voting strategy based on accuracy scores. The confusion matrices of the testing accuracy are shown in Figure 6. The accuracy of each health condition using the traditional weighted voting strategy ranges from 76.71% to 98.2% in Figure 6(a). In Figure 6(b), the accuracies of the vertical misalignment conditions are improved by the proposed EnCNN from 76.71%, 84.97%, and 89.67% to 82.81%, 87.21%, and 93.03%, respectively. These results verify that the weighted voting strategy used in the proposed EnCNN is superior to the traditional weighted voting strategy in the intelligent diagnosis of machines under imbalanced data.
The confusion matrix: (a) EnCNN using traditional weighted voting strategy based on accuracy scores, and (b) the proposed EnCNN using weighted voting strategy based on G-mean score.
C. Comparisons
To verify the effectiveness of the proposed method, several well-known methods are used to classify the health conditions in the imbalanced data, including CNN, the RUSBoost method, the SMOTEBoost method, and random forest. The input of the CNN is the vibration signals, and the input of the other three methods is the manual features used in [29]. The RUSBoost method [30] first uses an under-sampling strategy to balance the sample number of each health condition and then uses an AdaBoost classifier to recognize the health conditions of machines. The SMOTEBoost method [31] uses an oversampling strategy to create synthetic samples so that the samples of each health condition are balanced, and then applies ensemble classifiers for intelligent fault diagnosis. Random forest [32] draws random samples with replacement from the training set to train multiple decision trees and combines them into an ensemble classifier using the bagging strategy. The classification results of the above methods are shown in Table 3.
The training accuracy of the CNN, the SMOTEBoost method, and random forest ranges from 99.73% to 100% with standard deviations between 0 and 0.24%. Although the training accuracy of these methods is above 99%, their testing accuracy ranges from 81.77% to 88.96%, indicating severe overfitting in the training stage. The training accuracy and testing accuracy of the RUSBoost method are 73.16% and 71.03%, respectively; although overfitting is slight in its training stage, its accuracy is the lowest among the compared methods. In the proposed EnCNN, a CNN is used as the base classifier, and the under-sampling strategy and random hyper-parameter selection are applied to increase the diversity of the base classifiers. Thus, overfitting in the training stage of EnCNN is avoided to some extent, and its testing accuracy is higher than that of the other compared methods. These results verify the effectiveness of the proposed method.
Conclusions
This paper proposes EnCNN for the intelligent fault diagnosis of machines under imbalanced data. In EnCNN, a convolutional neural network with multi-sensor signal input is used as the base classifier. The imbalanced mechanical dataset is split into balanced training subsets through an under-sampling strategy, and each subset is then used to train a base classifier. The weight coefficient of each trained base classifier is calculated by the G-mean score, anomalous base classifiers are screened out by classifier selection, and the remaining classifiers form EnCNN through a weighted voting strategy. Finally, EnCNN is applied to the classification of mechanical health conditions. An imbalanced dataset collected from a machinery fault test bench is used to validate the proposed EnCNN. The classification results show that EnCNN selects the hyper-parameters of the deep networks automatically and achieves stable diagnosis accuracy under imbalanced data. By comparing with related methods for imbalanced classification, the effectiveness of the proposed method is verified. Moreover, the proposed method needs to preset only one hyper-parameter, i.e., the number of base classifiers. Generally, the larger the number of base classifiers is, the better the diversity of the base classifiers and the generalization ability of EnCNN are, but the more computing time EnCNN spends.
Although the effectiveness of the proposed EnCNN is verified, it cannot deal with the situation in which new faults appear after the training stage, namely unseen fault classification. The authors will focus on this problem in future work.