
Bearing Fault Diagnosis Based on Natural Adaptive Moment Estimation Algorithm and Improved Octave Convolution



Abstract:

Fault diagnosis of rolling bearings has been a focus of research. Bearing signals are often accompanied by similar information, resulting in redundancy between data. Moreover, rolling bearings are often used in situations with large background noise, so extracting the characteristic values of the rolling bearing signal and removing noise from the signal are of great significance. This paper presents a fault diagnosis model combining the NAdam (Natural Adaptive Moment Estimation) algorithm and improved octave convolution. First, a natural exponential decay function is proposed to replace the exponential decay function in the parameter update of Adam (Adaptive Moment Estimation). Compared with the exponential decay function, the natural exponential decay function accelerates the convergence of the model. The internal structure of octave convolution is then improved; the improved structure enhances feature extraction and eliminates data redundancy. Finally, dilated gate convolution layers are used to filter and classify the data. In simulation tests on the Case Western Reserve University data set and a laboratory power equipment data set, the accuracy reaches more than 98%. Experiments with variable load and different signal-to-noise ratios are carried out to verify the noise resistance and generalization performance of the proposed method.
Published in: IEEE Access ( Volume: 8)
Page(s): 196790 - 196803
Date of Publication: 27 October 2020
Electronic ISSN: 2169-3536


SECTION I.

Introduction

Rolling bearings are widely used as important components of mechanical equipment in the industrial field. A rolling bearing must bear the weight of the rotating machine and ensure the normal operation of both the bearing and the machine [1]. However, the bearing inevitably vibrates during operation. When wear, corrosion, or fatigue occurs, the vibration of the bearing is exacerbated, reducing its working efficiency and even causing casualties [2]–[4].

The working environment of a bearing is complicated and often accompanied by background noise [5], which greatly affects bearing signal analysis. Therefore, in the fault diagnosis of rolling bearings, effectively extracting the characteristic information of the bearing signal and filtering out the noise are important tasks to be studied.

In traditional learning methods, features are often extracted by hand on the basis of expert experience, and classifiers are then used to classify faults. Hand-crafted features tend to limit the accuracy of feature recognition and have great limitations. Wavelet transform [6] and Fourier transform [7] are therefore used to replace manual experience in feature extraction, and SVM [8], Bayes [9], and other classifiers are used for classification. The wavelet transform can observe the local characteristics of the signal and has a certain filtering effect on noise, but it produces redundancy. The Fourier transform computes a weighted average of the signal over the time domain [10], so it cannot provide sufficient time-domain information, and its sensitivity to abrupt signals is poor. Wu et al. [11] proposed a method combining the Fourier transform and the wavelet transform to filter noise in the signal, but its accuracy is not very high. To obtain a network with high precision and good generalization performance, both effective feature extraction and a high-performance classifier are needed. Deng et al. [12] proposed a motor bearing fault diagnosis method integrating the empirical wavelet transform, fuzzy entropy, and SVM, and achieved high accuracy. Mbo’O and Hameyer [13] used linear discriminant analysis to evaluate features and employed Bayesian classifiers to perform fault diagnosis; the proposed method can distinguish damaged bearings from normal bearings. Li et al. [14] proposed a feature extraction and classification method combining the wavelet scattering transform and twin support vector machines, using the scattering transform for time-domain analysis and SVM to classify the training data.

With the continuous development of artificial intelligence, deep learning has gradually been applied to text analysis [15], speech synthesis [16], and image classification [17]. In contrast to traditional methods, deep learning integrates feature extraction and classification into one model. CNNs are effective at feature extraction and can also act as classifiers. Sadoughi and Hu [18] introduced physical knowledge into the neural network by encoding physical information about the bearing and its fault characteristics into the network for analysis and testing. Wen et al. [19] proposed a method for automatically extracting features, performing a dimensional transformation of the data on the LeNet-5 model structure and extracting the feature values in the data. A simple network structure can avoid overfitting but also limits the achievable accuracy. With the introduction of deep network structures, an increasing number of deep neural networks are used in fault diagnosis. Zhou et al. [20] proposed a bearing fault diagnosis model based on an improved stacked recurrent neural network that uses a gating unit to solve the vanishing-gradient problem. Zhuang et al. [21] proposed a network model consisting of dilated convolution, gate convolution, and a residual network; the dilated convolution enlarges the local receptive field, thereby increasing the receiving domain of the convolution kernel. Fan et al. [22] analyzed the structure of octave convolution and proposed the same multi-frequency method for octave transposed convolution.

Deep learning is widely used in various fields. However, the working environment of bearings is complex, and background noise affects the training of the network model. At the same time, factors such as parameter settings and the choice of optimization algorithm affect the training accuracy of the network model.

When training deep learning networks, many optimization strategies are commonly used, such as learning rate decay [23] and gradient optimization [24]. Bello et al. [25] proposed adding noise to linear cosine decay to increase the randomness and exploration of the process to a certain extent. An et al. [26] proposed an exponentially decayed sine-wave learning rate to learn the parameters in the network; only a small number of iterations are required to achieve high accuracy and speed up training. In learning rate decay methods, selecting an appropriate learning rate is critical: if the learning rate is too large, the network will not converge, and if it is too small, the network will converge too slowly. In gradient optimization, the batch gradient descent method [27] and the momentum method [28] are often used. Li et al. [29] proposed a small-batch gradient separation algorithm, which solves the problem of minimizing data reconstruction errors. Tang et al. [30] used Nesterov momentum instead of traditional momentum and combined it with a deep belief network; this method speeds up training and improves accuracy. In recent years, optimization algorithms have been improved to better fit networks to data. However, noise interference and information redundancy often occur in the data. To obtain an excellent deep learning model, not only an appropriate algorithm for optimizing the network parameters but also a well-designed network structure is required.

In view of the problems in the above model structures, the NAdam (Natural Adaptive Moment Estimation) algorithm is proposed and combined with an improved octave convolution for bearing fault diagnosis. The contributions of the proposed method are as follows:

  1. The optimization algorithm Adam uses an exponential decay moving average to update the gradient values. The natural exponential decay function is simpler than the exponential decay function and converges faster. Therefore, the NAdam algorithm is proposed to optimize the network parameters, reduce memory usage, and shorten calculation time.

  2. Octave convolution eliminates data redundancy by using down-sampling, up-sampling, and convolution operations for feature extraction and dimensionality reduction. However, some data are lost during down-sampling and up-sampling. Dilated convolution is therefore used instead of the up-sampling, down-sampling, and convolution operations to prevent data loss.

  3. The working environment of rolling bearings is complex and the background noise is large, which affects feature extraction. Gate convolution is added to the dilated convolutional layers of the network structure to form dilated gate convolutional layers and eliminate noise interference.

The remaining parts of this paper are organized as follows. Section 2 explains the NAdam algorithm in detail. Section 3 introduces the structure and improvement of octave convolution. Section 4 introduces the proposed network model and provides further explanation. In Section 5, two different data sets are used to verify the noise resistance and generalization of the proposed method, and visualization is performed. Section 6 presents the conclusion and future work.

SECTION II.

NAdam Algorithm

A. Adam Algorithm

In building a deep learning network model for bearing fault diagnosis, selecting an appropriate learning rate is important when training model parameters because the size of the learning rate directly affects the convergence rate of the network.

Adam (Adaptive Moment Estimation) [31] is an optimization algorithm proposed by Kingma and Ba in 2015. Adam is a very efficient stochastic optimization algorithm that requires only a small amount of memory, leaving memory available for the calculation of other parameters. The algorithm adaptively adjusts the learning rate of each parameter and updates the parameters using the 1^{st} and 2^{nd} moment estimates of the gradient. Adam, a combination of Momentum and RMSprop (Root Mean Square prop) [32], uses momentum as the parameter update direction and adaptively adjusts the learning rate.

In the Adam algorithm, the gradient at iteration t is computed first. A mini-batch of m samples \left \{{x^{\left ({1 }\right)},x^{\left ({2 }\right)},\cdots,x^{\left ({m }\right)} }\right \} is drawn randomly from the training set, where y^{\left ({i }\right)} is the true value for x^{\left ({i }\right)} . The gradient g_{t} is:\begin{equation*} g_{t}=\frac {1}{m}\nabla _{\theta _{t-1}}\sum \nolimits _{i} {L\left ({f\left ({x^{\left ({i }\right)};\theta _{t-1} }\right),y^{\left ({i }\right)} }\right)}\tag{1}\end{equation*}

where \nabla _{\theta _{t-1}} denotes the gradient with respect to \theta _{t-1} , and f\left ({x^{\left ({i }\right)};\theta _{t-1} }\right) is a stochastic scalar function at time t-1 . The randomness may originate from the evaluation of random samples of data points or from function noise.

On the one hand, the exponentially weighted moving average M_{t} (the 1^{st} moment estimate) of the gradient g_{t} is calculated. On the other hand, the exponentially weighted moving average G_{t} (the 2^{nd} moment estimate) of the squared gradient g_{t}^{2} is calculated:\begin{align*} M_{t}=&\beta _{1}M_{t-1}+\left ({1-\beta _{1} }\right)g_{t} \\ G_{t}=&\beta _{2}G_{t-1}+\left ({1-\beta _{2} }\right)g_{t}\odot g_{t}\tag{2}\end{align*}


\beta _{1} and \beta _{2} are the decay rates of the two moving averages, set to \beta _{1}=0.9 and \beta _{2}=0.99 .

The initial values of M_{t} and G_{t} are usually set to M_{0}=0 and G_{0}=0 , which biases M_{t} and G_{t} toward 0 in the early stage of training. The bias of the 1^{st} and 2^{nd} moment estimates must therefore be corrected:\begin{align*} \widehat {M_{t}}=&M_{t}/\left ({1-\beta _{1}^{t} }\right) \\ \widehat {G_{t}}=&G_{t}/\left ({1-\beta _{2}^{t} }\right)\tag{3}\end{align*}


Finally, the parameter update value is:\begin{align*} \Delta \theta _{t}=&-\frac {\gamma }{\sqrt {\widehat {G_{t}}+\delta }}\widehat {M_{t}} \\ \theta _{t}=&\theta _{t-1}+\Delta \theta _{t}\tag{4}\end{align*}


Here, the learning rate is \gamma =0.001 , and the constant \delta =10^{-6} is added to the denominator to prevent division by zero.
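For concreteness, the Adam update of Eqs. (1)–(4) can be sketched in a few lines of NumPy. This is a minimal illustration; the toy objective and variable names are placeholders, not part of the paper.

```python
import numpy as np

def adam_step(theta, grad, M, G, t, gamma=0.001,
              beta1=0.9, beta2=0.99, delta=1e-6):
    """One Adam update following Eqs. (2)-(4); the step counter t starts at 1."""
    M = beta1 * M + (1 - beta1) * grad           # 1st moment estimate, Eq. (2)
    G = beta2 * G + (1 - beta2) * grad * grad    # 2nd moment estimate, Eq. (2)
    M_hat = M / (1 - beta1 ** t)                 # bias correction, Eq. (3)
    G_hat = G / (1 - beta2 ** t)
    theta = theta - gamma * M_hat / np.sqrt(G_hat + delta)   # update, Eq. (4)
    return theta, M, G

# toy usage: minimise f(theta) = theta^2, whose gradient is 2*theta
theta, M, G = 1.0, 0.0, 0.0
for t in range(1, 201):
    theta, M, G = adam_step(theta, 2 * theta, M, G, t)
```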

B. Improved Adam Algorithm

The calculation of the Adam algorithm involves exponential decay [33]. Exponential decay is commonly used to adjust the learning rate in neural networks, and the learning rate determines the convergence rate toward the optimal solution. With an exponential decay schedule, a large learning rate is set first so that the network approaches the optimal solution quickly. As the number of iterations increases, the learning rate gradually decreases, which makes the model more stable during the iterations and the training of the optimal solution. The exponential decay function is:\begin{equation*} \alpha _{t}=\alpha _{0}\beta ^{t}\tag{5}\end{equation*}


where t is the iteration number, \alpha _{0} is the initial learning rate, \alpha _{t} is the learning rate at iteration t , and \beta =0.96 is the decay rate.

Most popular methods for adaptively adjusting the learning rate, such as the RMSprop and AdaDelta algorithms, use exponential decay to obtain the learning rate. Besides the exponential decay function, adjusting the learning rate with the natural exponential decay function also works very well in practice.

The natural exponential decay [34] function trains the network faster than the exponential decay function. In contrast to exponential decay, natural exponential decay is based on e , which makes the calculation faster and allows the network to reach convergence sooner. The natural exponential decay function is:\begin{equation*} \alpha _{t}=\alpha _{0}\exp \left ({-\beta t }\right)\tag{6}\end{equation*}


where \beta =0.96 is the decay rate and t is the iteration number.

The learning rates produced by the exponential decay function and the natural exponential decay function are visualized in a simple experiment to compare their convergence rates intuitively.
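A minimal sketch of that comparison, assuming an initial learning rate of 0.001 and β = 0.96 as in Eqs. (5) and (6); the iteration range is illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

alpha0, beta = 0.001, 0.96
t = np.arange(0, 100)

exp_decay = alpha0 * beta ** t           # Eq. (5): exponential decay
nat_decay = alpha0 * np.exp(-beta * t)   # Eq. (6): natural exponential decay

plt.plot(t, exp_decay, label="exponential decay")
plt.plot(t, nat_decay, label="natural exponential decay")
plt.xlabel("iteration t")
plt.ylabel("learning rate")
plt.legend()
plt.show()
```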

As shown in Fig. 1, the learning rate under natural exponential decay decreases faster than under exponential decay, which speeds up the convergence of the model during training. In the Adam algorithm, the exponential decay moving averages in the bias correction of the first and second moments are replaced with natural exponential decay moving averages, yielding a new optimization algorithm, NAdam. The algorithm saves parameter calculations and improves the convergence speed of the network. In the calculation, the update values of \widehat {M_{t}} and \widehat {G_{t}} are smaller than those obtained with the exponential function but more accurate, which helps avoid overfitting. The specific algorithm flow is as follows:

FIGURE 1. Comparison of attenuation functions.

Algorithm 1 NAdam. Good Default Settings for the Tested Machine Learning Problems Are \alpha=0.001,\beta_{1}=0.9,\beta_{2}=0.99 and \delta={10}^{-6}

Require:

\beta _{1},\beta _{2}\in [0,1): Exponential decay rates for the moment estimates

Require:

\mathrm {f}\left ({\theta _{t-1} }\right) : Stochastic objective function with parameters \theta _{t-1}

Require:

\theta : Initialize parameter vector

M_{0}\leftarrow 0 (Initialize 1^{st} moment vector)

G_{0}\leftarrow 0 (Initialize 2^{nd} moment vector)

\mathrm {t}\leftarrow 0 (Initialize time step)

While \theta _{t} not converged do

t\leftarrow t+1 (Update time)

g_{t}\leftarrow \frac {1}{m}\nabla \theta _{t-1}\sum \nolimits _{i} {L\left ({f\left ({x^{\left ({i }\right)};\theta _{t-1} }\right),y^{\left ({i }\right)} }\right)} (Update gradient)

M_{t}\leftarrow \beta _{1} M_{t-1}+\left ({1-\beta _{1} }\right)g_{t} (Update biased first moment estimate)

G_{t}\leftarrow \beta _{2} G_{t-1}+\left ({1-\beta _{2} }\right)g_{t}\odot g_{t} (Update biased second raw moment estimate)

\widehat {M_{t}}=\frac {M_{t}}{1-e^{-\beta _{1}t}} (Bias-corrected first moment estimate)

\widehat {G_{t}}=\frac {G_{t}}{1-e^{-\beta _{2}t}} (Bias-corrected second moment estimate)

\Delta \theta _{t}\leftarrow -\frac {\gamma }{\sqrt {\widehat {G_{t}}+\delta }}\widehat {M_{t}} (Update parameters)

\theta _{t}\leftarrow \theta _{t-1}+\Delta \theta _{t} (Update parameters)

end while

return \theta _{t} (Resulting parameters)
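A sketch of one NAdam step in NumPy, following Algorithm 1. The natural-exponential bias correction shown here is our reading of the algorithm, and the variable names are illustrative only.

```python
import numpy as np

def nadam_step(theta, grad, M, G, t, gamma=0.001,
               beta1=0.9, beta2=0.99, delta=1e-6):
    """One NAdam update as in Algorithm 1; the step counter t starts at 1."""
    M = beta1 * M + (1 - beta1) * grad            # biased 1st moment estimate
    G = beta2 * G + (1 - beta2) * grad * grad     # biased 2nd moment estimate
    M_hat = M / (1 - np.exp(-beta1 * t))          # natural-exponential bias correction
    G_hat = G / (1 - np.exp(-beta2 * t))
    theta = theta - gamma * M_hat / np.sqrt(G_hat + delta)
    return theta, M, G
```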

SECTION III.

Improved Octave Convolution

In network training, the NAdam algorithm adjusts the learning rate to adapt to the update of the network parameters, so the learning rate is largest at the start of training. Besides updating the hyperparameters, the network structure also needs to be improved so that the characteristic information of the bearing data can be recognized well in a noisy environment.

For noisy one-dimensional bearing fault data, it is difficult to extract rich feature information with the dot-product operation of one-dimensional convolution. To extract more feature information, the dimensionality of the space is usually changed: mapping low-dimensional data into a higher-dimensional space displays the features better, which is conducive to feature extraction and improves classification accuracy [35]. To reduce the amount of calculation, this paper converts the one-dimensional data into a two-dimensional space for computation.

Convolutional neural networks are generally used for model training, and the convolution operation is the core of the network structure. Convolution sums the product of two variables over a certain range. Commonly used forms are one-dimensional and two-dimensional convolution:

one-dimensional:\begin{equation*} \mathrm {y}\left ({t }\right)=g\left ({k }\right)\ast x\left ({k }\right)=\int _{-\infty }^{\infty }{g\left ({k }\right)x\left ({t-k }\right)dk}\tag{7}\end{equation*}

two-dimensional:\begin{align*} \mathrm {y}\left ({x,y }\right)=&g\left ({u,v }\right)\ast x\left ({u,v }\right) \\=&\iint _{-\infty }^{\infty }{g\left ({u,v }\right)x\left ({x-u,y-v }\right)dudv}\tag{8}\end{align*}

Among them, g\left ({\cdot }\right) is the filter, x\left ({\cdot }\right) is the signal sequence.

According to these formulas, convolution relates two functions across two domains, usually the time and frequency domains, which reduces the computational workload. Convolution can also be interpreted as the area of overlap of two curves, which can be used for feature enhancement, and it has a smoothing effect on data, so it is commonly used in data processing. On this basis, a convolution kernel performs the convolution calculation on two-dimensional data and is often used for feature extraction. Different types of convolution kernels, such as edge-detection and blurring kernels, extract different kinds of feature information.

In neural networks, ordinary convolution operations are as follows:\begin{equation*} \mathrm {Y}_{p,q}=\sum \limits _{i,j\in N_{k}} W_{i+\frac {k-1}{2},j+\frac {k-1}{2}}^{T} X_{p+i,q+j}\tag{9}\end{equation*}

where N_{k}=\left \{{\left ({i,j }\right):i\in \left \{{-\frac {k-1}{2},\cdots,\frac {k-1}{2} }\right \},j\in \left \{{-\frac {k-1}{2},\cdots,\frac {k-1}{2} }\right \} }\right \} , W is a convolution kernel of size k\times k (generally k\geq 3 ), and \left ({p,q }\right) denotes the location coordinate.

In deep learning, a certain degree of similarity exists between feature maps; this spatial redundancy can be compressed further. Octave convolution [36] is a convolution structure proposed by Chen et al. in 2019. It differs from ordinary convolution in that it uses four ordinary convolutions operating on data of different frequencies.

Ordinary convolution operates on the entire data set. If the data are divided into high-frequency and low-frequency parts, an ordinary convolution must first bring the low-frequency data to the same resolution as the high-frequency data and then connect them with the high-frequency part. This incurs additional computation and memory and affects the transmission speed of the network. In octave convolution, four different convolution kernels operate between the high- and low-frequency data.

Octave convolution uses Gaussian convolution kernel in scale space theory [37], which divides the data into high- and low-frequency components. The high-frequency component refers to the original channel after Gaussian filtering. The low-frequency component refers to the channel compressed by Gaussian filtering. High-frequency components generally represent the details of the feature, and low-frequency components represent the contour information of the feature [38]. High- and low-frequency components are mapped to different groups and are converted through the corresponding convolution kernel to reduce spatial redundancy.

First, the structural parameters are defined: X^{H} and X^{L} are the high-frequency and low-frequency input components, respectively, and Y^{H} and Y^{L} are the high-frequency and low-frequency components of the convolution output. The high-frequency weight parameter is W^{H}=[W^{H\to H},W^{L\to H}] , and the low-frequency weight parameter is W^{L}=[W^{L\to L},W^{H\to L}] . W^{L\to H} is the weight parameter that converts low-frequency data into high-frequency data; the low-frequency component needs to be up-sampled during this information update. W^{H\to L} is the weight parameter that converts high-frequency data into low-frequency data by down-sampling the high-frequency component. Finally, convolution is performed with the corresponding parameters to obtain the convolution output:\begin{align*} Y_{p,q}^{H\to H}=&\sum \limits _{i,j\in N_{k}} {W_{i+\frac {k-1}{2},j+\frac {k-1}{2}}^{H\to H}}^{T} X_{p+i,q+j}^{H} \tag{10}\\ Y_{p,q}^{L\to H}=&\sum \limits _{i,j\in N_{k}} {W_{i+\frac {k-1}{2},j+\frac {k-1}{2}}^{L\to H}}^{T} X_{\left ({\frac {p}{2}+i }\right),\left ({\frac {q}{2}+j }\right)}^{L} \tag{11}\\ Y_{p,q}^{L\to L}=&\sum \limits _{i,j\in N_{k}} {W_{i+\frac {k-1}{2},j+\frac {k-1}{2}}^{L\to L}}^{T} X_{p+i,q+j}^{L} \tag{12}\\ Y_{p,q}^{H\to L}=&\sum \limits _{i,j\in N_{k}} {W_{i+\frac {k-1}{2},j+\frac {k-1}{2}}^{H\to L}}^{T} X_{\left ({2p+0.5+i }\right),\left ({2q+0.5+j }\right)}^{H} \tag{13}\\ Y_{p,q}^{H}=&Y_{p,q}^{H\to H}+Y_{p,q}^{L\to H} \tag{14}\\ Y_{p,q}^{L}=&Y_{p,q}^{L\to L}+Y_{p,q}^{H\to L}\tag{15}\end{align*}


Y_{p,q}^{H} and Y_{p,q}^{L} are the high-frequency and low-frequency outputs of the convolution operations, and their sum is the final convolution output. After the down-sampling operation, the convolution is equivalent to a strided convolution, which is less precise; therefore, average pooling is used in the W^{H\to L} path so that the final result is more precise. However, during the down-sampling and up-sampling in the network, data loss occurs. This problem is solved by replacing the up-sampling, down-sampling, and pooling operations with dilated convolution, whose dilate rate increases the receptive field of the network.
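As a rough illustration of Eqs. (10)–(15), a minimal PyTorch sketch of the original octave convolution is given below. The channel split ratio, layer names, and nearest-neighbour up-sampling are our assumptions rather than the paper's exact implementation, and the paper's improvement replaces the up/down-sampling shown here with dilated convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    """Minimal octave convolution: four convolutions exchange information
    between a high-frequency branch and a half-resolution low-frequency branch."""
    def __init__(self, in_ch, out_ch, alpha=0.5, k=3):
        super().__init__()
        in_lf, out_lf = int(alpha * in_ch), int(alpha * out_ch)
        in_hf, out_hf = in_ch - in_lf, out_ch - out_lf
        pad = k // 2
        self.h2h = nn.Conv2d(in_hf, out_hf, k, padding=pad)
        self.h2l = nn.Conv2d(in_hf, out_lf, k, padding=pad)
        self.l2h = nn.Conv2d(in_lf, out_hf, k, padding=pad)
        self.l2l = nn.Conv2d(in_lf, out_lf, k, padding=pad)

    def forward(self, x_h, x_l):
        # high-frequency output: H->H plus up-sampled L->H (Eq. 14)
        y_h = self.h2h(x_h) + F.interpolate(self.l2h(x_l), scale_factor=2, mode="nearest")
        # low-frequency output: L->L plus down-sampled H->L (Eq. 15)
        y_l = self.l2l(x_l) + self.h2l(F.avg_pool2d(x_h, 2))
        return y_h, y_l

# usage: a 16-channel high branch at 40x40 and a 16-channel low branch at 20x20
x_h, x_l = torch.randn(1, 16, 40, 40), torch.randn(1, 16, 20, 20)
y_h, y_l = OctaveConv(32, 32)(x_h, x_l)
```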

In Fig. 2, the green arrows indicate information updates, and the red arrows indicate information exchange between the two frequencies. The high- and low-frequency features are output through the four different convolutions. Finally, the resolution of the low-frequency feature is raised by setting the dilate rate, and it is added to the high-frequency feature to obtain the final output information.

FIGURE 2. Octave convolution.

SECTION IV.

Network Model

A bearing fault diagnosis model is constructed by combining NAdam and the improved octave convolution. The improved octave convolution preprocesses the data to reduce redundancy and extract characteristic information. Three dilated gate convolution layers form the main network structure: the dilated convolution increases the convolution receptive field, allowing the network to receive additional information, and the gate convolution filters noise. Finally, the dense layers, combined with the NAdam algorithm, perform the final classification of the data.

The network model structure is shown in Fig. 3. Because the network structure is two-dimensional, the input has shape [3, 30, 40], which is converted from the [1200, 1] time series. The first three layers use dilated gate convolution to deepen the network model and enhance feature extraction. Each layer uses a different number of filters, and the dilated convolution reduces the dimension of the feature map and increases the receptive field. Each layer also uses gate convolution to filter noise during feature extraction. Finally, two dense layers are connected, and the associations among the features are extracted and mapped into the output space.

FIGURE 3. Model structure.

A. Dilated Gate Convolution

Dilated convolution was proposed by Yu and Koltun [39] and was originally used in context analysis. With the development of big data, dilated convolution has been applied to different fields. The calculation process of the dilated convolution is different from the ordinary convolution. Common convolution operations use convolution and pooling to reduce the dimensionality and increase the receptive field. The dilated convolution only needs to adjust the corresponding dilated rate to achieve the same effect, which saves time for network calculations and improves the calculation speed. The specific calculation is as follows:\begin{equation*} \mathrm {Receptive ~field~ size}=2(Dilate ~rate-1) \ast \left ({k-1 }\right)+k\end{equation*}

where the dilate rate is the dilation factor and k is the size of the convolution kernel.

Fig. 4(A) shows a 3\times 3 convolution with a dilate rate of 1; the dilated convolution is then identical to a normal convolution. Fig. 4(B) shows a 3\times 3 convolution with a dilate rate of 2: only the 9 red points in the figure have non-zero weights, the rest are 0, and the receptive field of the convolution increases to 7\times 7 . Fig. 4(C) shows a dilated convolution with a dilate rate of 4, and the receptive field increases to 15\times 15 . Compared with the linear growth of traditional convolution operations, the receptive field of dilated convolution increases exponentially.

FIGURE 4. Dilated convolution.

Gate convolution uses the gate structure from the LSTM network. The gate structure selectively passes information: the output of a gate is a value between 0 and 1 that describes how much information can pass through. Gate convolution uses this property to filter noise in the signal, allowing only the effective information to pass through the gate structure.
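A minimal sketch of such a dilated gate convolution in PyTorch; the GLU-style formulation below (a feature branch multiplied by a sigmoid gate) and the size-preserving padding are our assumptions about how the gate is realised.

```python
import torch
import torch.nn as nn

class DilatedGateConv2d(nn.Module):
    """Dilated convolution whose output is modulated by a sigmoid gate."""
    def __init__(self, in_ch, out_ch, k=3, dilation=2):
        super().__init__()
        pad = dilation * (k - 1) // 2          # keeps the spatial size unchanged
        self.feature = nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=dilation)
        self.gate = nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=dilation)

    def forward(self, x):
        # gate values in (0, 1) decide how much of each feature passes through
        return torch.relu(self.feature(x)) * torch.sigmoid(self.gate(x))

y = DilatedGateConv2d(3, 32, dilation=2)(torch.randn(1, 3, 30, 40))
print(y.shape)  # torch.Size([1, 32, 30, 40])
```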

B. Data Enhancement

Before network training, data augmentation is used to avoid over-fitting caused by the small number of samples. Bearing data are time series, so a sliding window is used for data augmentation [40]. In Fig. 5, a sampling window of fixed sequence length is moved along the time axis with step size S, so that adjacent windows overlap and the signal is resampled.
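A sketch of this sliding-window augmentation; the window length of 1200 matches the samples used later, while the step size and signal length are assumptions for illustration.

```python
import numpy as np

def sliding_window(signal, length=1200, step=300):
    """Cut an overlapping set of fixed-length samples out of one long signal."""
    n = (len(signal) - length) // step + 1
    return np.stack([signal[i * step: i * step + length] for i in range(n)])

samples = sliding_window(np.random.randn(120000))
print(samples.shape)   # (397, 1200) for this step size
```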

FIGURE 5. Sliding window.

SECTION V.

Experimental Analysis

A. Data Introduction

1) Case Western Reserve University Bearing Data

CWRU (Case Western Reserve University) bearing data are widely used in various industries as benchmark data for bearing fault diagnosis. The CWRU test bench [41] is composed of a 2 HP motor, an encoder, and a dynamometer. The test bearings are SKF6205 motor bearings. Test data are collected from acceleration sensors installed at the motor drive end and the fan end. According to the motor load, the data are divided into 0 HP, 1 HP, 2 HP, and 3 HP load data. The sampling frequency is 12 kHz. The experiment selects the drive-end data listed in TABLE 1. In the actual experiment, the three possible locations of bearing failure are the inner ring, the outer ring, and the rolling element. For each location, fault data with damage diameters of 7 mils, 14 mils, and 21 mils are collected at the drive end to verify the reliability of the proposed method. Four hundred samples are collected for each fault type, and every 1200 signal points form one sample. Therefore, the test data contain 10 kinds of signals: the normal signal and nine fault signals.

TABLE 1. CWRU Bearing Failure Data Set.

2) Data of the Driveline Diagnostic Simulator

The DDS (Driveline Diagnostic Simulator) test bench shown in Fig. 6 is used to verify the performance of the model. The bench is a complete power-system device composed of a variable-speed drive motor, an encoder, a torsion sensor, a main gearbox, a parallel gearbox, a programmable magnetic brake, and acceleration sensors [42]. The accuracy of the acceleration sensors is 0.089 g/(m\cdot s^{2}) . The test rig collects data on outer-ring and inner-ring faults of rolling bearings as well as rolling-element data. The data have practical significance for studying the noise and vibration characteristics of gearboxes.

FIGURE 6. DDS test bench.

B. Experimental Verification

1) Model Introduction

The proposed model is compared with CNN, LSTM, and CNN + LSTM models. The CNN and LSTM networks use the NAdam optimization function in the simulation experiments. Because the network structure of the proposed method is relatively simple, the number of network layers in TABLE 2 is kept consistent with the number of NDilted-CNN layers to make the models comparable. The CNN uses a three-layer network model; each convolutional layer is followed by a pooling layer to reduce the dimensionality and increase the receptive field, and the number of filters, convolution kernels, and other parameters are consistent with the model parameters mentioned in this article. The LSTM network adopts a three-layer structure; for bearing data with a time-series character, LSTM exhibits good performance. The CNN + LSTM network uses two convolutional layers and two LSTM layers, with a pooling layer after each convolutional layer. The proposed NDilted-CNN model adopts three dilated gate convolutional layers with dilate rates of 1, 2, and 4. The specific network parameters are shown in TABLE 2.

TABLE 2. Structure of the Model.

The NDilted-CNN network structure is relatively simple. The numbers of filters in the three dilated gate convolutional layers are 32, 24, and 16, respectively, and each layer uses a 3×3 convolution kernel. As the dilate rate increases, the receptive field of the convolution kernel also increases. Dense1 has 100 units and, according to the failure types in the data set, Dense2 has 10 units. The D-Conv layers and the final Dense layer use the relu and softmax activation functions, respectively.
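Based on the layer sizes just listed (filters 32/24/16, 3×3 kernels, dilate rates 1/2/4, Dense layers of 100 and 10 units with relu and softmax), a rough PyTorch sketch of the NDilted-CNN backbone is given below. The gated-convolution form, the size-preserving padding, and the flattening into the dense layers are our assumptions, and the improved octave-convolution preprocessing stage is omitted.

```python
import torch
import torch.nn as nn

def gated_block(in_ch, out_ch, dilation):
    """Dilated convolution paired with a parallel sigmoid gate (assumed form)."""
    pad = dilation  # keeps 3x3 convolutions size-preserving
    conv = nn.Conv2d(in_ch, out_ch, 3, padding=pad, dilation=dilation)
    gate = nn.Conv2d(in_ch, out_ch, 3, padding=pad, dilation=dilation)
    return conv, gate

class NDiltedCNN(nn.Module):
    def __init__(self, num_class=10):
        super().__init__()
        chans, rates = [3, 32, 24, 16], [1, 2, 4]
        self.convs, self.gates = nn.ModuleList(), nn.ModuleList()
        for i, r in enumerate(rates):
            c, g = gated_block(chans[i], chans[i + 1], r)
            self.convs.append(c)
            self.gates.append(g)
        self.dense1 = nn.Linear(16 * 30 * 40, 100)  # input reshaped to [3, 30, 40]
        self.dense2 = nn.Linear(100, num_class)

    def forward(self, x):
        for conv, gate in zip(self.convs, self.gates):
            x = torch.relu(conv(x)) * torch.sigmoid(gate(x))
        x = x.flatten(1)
        x = torch.relu(self.dense1(x))
        return torch.softmax(self.dense2(x), dim=1)

out = NDiltedCNN()(torch.randn(8, 3, 30, 40))
print(out.shape)  # torch.Size([8, 10])
```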

At the same time, the 4 models use the same hyperparameters: length = 1200, batch = 200, Lr = 0.001, train-test_rate = [0.7,0.3], Epochs = 100, num_class = 10, BatchNorm = True.

Selecting different parameters will affect the convergence of the entire network. In TABLE 3, the convergence of the network model is analyzed for different learning rates and whether to use Batch Normalization.

TABLE 3. Model Training Schedule.

It can be seen from TABLE 3 that the learning rate and Batch Normalization have a certain impact on the convergence of the network. When the learning rate is too large or too small, the network becomes unstable, which slows down its convergence. In this paper, a suitable learning rate is obtained from manual experience; in later work, the learning rate could be adjusted adaptively.

Batch Normalization keeps the input distribution of each layer of the neural network stable and at the same time speeds up network training. In deep learning, it is mostly used in deep network structures. In TABLE 3, because the proposed network has only three convolution layers, Batch Normalization has little effect on the model.

Although changing the learning rate and Batch Normalization occasionally makes the results of the network model oscillate, TABLE 3 shows that, despite the different parameter settings, the proposed model quickly reaches convergence. At the same time, the training accuracy reaches more than 98%, which demonstrates that the proposed method has good robustness.

In this experiment, all the tested algorithms are written in Python language and run on a computer with Windows 10 system, Intel Core i7-7700 processor and 16GB RAM.

2) CWRU Data Set

Case Western Reserve University divides the bearing data into four groups: drive-end data sampled at 12,000 and at 48,000 points per second, fan-end data sampled at 12,000 points per second, and normal data. As an example, 1200 sampling points are randomly selected from each group of data, and the signals are shown in Fig. 7.

FIGURE 7. CWRU original signal.

The drive-end and fan-end data in the figure contain large fluctuations, which are abnormal conditions caused by wear or corrosion. The signal amplitude of the normal data hardly fluctuates.

Fig. 8 compares the accuracy and loss curves of the Case Western Reserve University bearing data under the proposed model. Under the NDilted-CNN structure, the accuracy rises and, after dipping five times, reaches a converged state. The loss value declines from the beginning of training and gradually stabilizes once the model converges. The experiments with small batches of data achieve convergence in only a few iterations, demonstrating the reliability of the proposed model; in practical applications this also saves calculation time.

FIGURE 8. Loss value and accuracy curve.

Since the network structure of the proposed method is relatively simple, it is compared with the traditional CNN and LSTM structures. From the model comparison in TABLE 4, the training accuracy of octave convolution combined with the CNN and LSTM networks is not high. The CNN has the shortest testing time, but its accuracy is only 70.6%; the LSTM has the longest training time, yet its accuracy is only 70.9%; and the accuracy of the CNN + LSTM network is only 75.24%. The dilated gate convolution layer increases the receptive field of the convolution kernel and at the same time filters the noise in the signal well, extracting more characteristic information from the signal. In the experimental comparison, the accuracies of Dilted-CNN and NDilted-CNN are significantly higher than those of the other methods, exceeding 98%, and the proposed NDilted-CNN has the shortest training time, with an accuracy of 99.99%.

TABLE 4. Model Training Schedule.

As shown in Fig. 9, the CNN and LSTM networks remain in an oscillating state during training, fluctuating around an accuracy of 0.65, and after 20 iterations they still have not converged. In terms of training time, the CNN has an obvious advantage and takes the shortest time, whereas the LSTM requires the longest training time and occupies most of the memory because of its large number of parameters. The CNN + LSTM network lies between the CNN and the LSTM. Nevertheless, its training accuracy is only 0.75, far from the required high accuracy.

FIGURE 9. Model comparison diagram.

In the Dilted-CNN and NDilted-CNN structures, the accuracy is higher than in the three other models. As shown in Fig. 10, for a more intuitive comparison, the Dilted-CNN and NDilted-CNN models are compared separately. Dilted-CNN is the improved octave convolutional network model with Adam as its optimization algorithm. During training, Dilted-CNN reaches an accuracy of 0.9 within six iterations; the accuracy gradually stabilizes after 14 iterations and reaches a high precision of 0.98. When the NDilted-CNN model is trained, the convergence speed improves significantly and the accuracy of the network reaches 1. Hence, the proposed NAdam optimizer accelerates the convergence of the network.

FIGURE 10. Comparison diagram before and after the improved optimization algorithm.

FIGURE 11. T-SNE visualizes CWRU data.

T-SNE visualization is performed to observe the changing structure of the data intuitively; T-SNE reduces the dimensionality of the data for visualization. The input data are disorganized. After octave convolution, the data are effectively separated and aggregated, although some classes are still not separated. After the features are further extracted by the dilated gate convolutional layers, the data features are completely separated. Hence, the proposed model can effectively learn fault characteristics and perform fault classification.

3) DDS Data Verification

Compared with the Case Western Reserve University data, the DDS data in Fig. 12 have larger background noise. Noise is also added artificially during data collection to test the anti-noise performance of the network model. The comparison of the four models in Fig. 13 shows the effect of noise on the different networks. Compared with the Case Western Reserve University data, the accuracy on the DDS data fluctuates over a larger range, and the convergence speed of all four models decreases. Although the CNN, LSTM, and CNN + LSTM networks remain in an oscillating state, their accuracy values are not very different from those in the Case Western Reserve University experiment.

FIGURE 12. DDS original signal diagram.

FIGURE 13. Model comparison diagram.

The comparison of the accuracy curves of Dilted-CNN and NDilted-CNN in Fig. 14 shows that Dilted-CNN reaches an accuracy of 0.9 within 18 iterations, whereas NDilted-CNN reaches 0.9 within six iterations. Both models keep rising, and their accuracies remain above 0.9. Hence, the proposed dilated gate convolutional layer has a good filtering effect on noise.

FIGURE 14. Comparison diagram before and after the improved optimization algorithm.

T-SNE is used to visualize the DDS data. As shown in Fig. 15, only part of the features are separated after the octave convolution operation, and some data are mixed together and difficult to distinguish. After the dilated gate convolutional layers, the data features are clearly separated. This finding verifies the filtering effect of gate convolution on noise.

FIGURE 15. T-SNE visualizes DDS data.

4) Variable Load Experiment

To verify the generalization of the model, data under different loads are used for experimental verification. The comparison of the four models on CWRU data under different loads is shown in Fig. 16. Group A contains the drive-end and fan-end data at the 12K sampling rate; Group B contains the drive-end data at the 12K and 48K sampling rates; and Group C contains the 12K fan-end and 48K drive-end data. Under these different loads and sampling conditions, the proposed model maintains high accuracy and fast convergence and reaches an accuracy of more than 0.99.

FIGURE 16. Different load verification of CWRU data.

Fig. 17 shows a histogram of training accuracy at different speeds of the DDS data, where DDS-A and DDS-B are data at constant speeds of 1120 and 1350 rpm, respectively, and DDS-AB is the variable-speed data composed of A and B. From the histograms of the four models at different speeds, the accuracy of the CNN and LSTM networks does not change much when the noise and speed of the data set change. The training accuracy of the CNN + LSTM network is improved, and the proposed NDilted-CNN model reaches a high accuracy of 0.97.

FIGURE 17. Different load verification of DDS data.

5) Experiment With Different SNR

A good model must not only generalize but also achieve high accuracy under different SNR (signal-to-noise ratio) conditions. Gaussian noise and mixed noise with different signal-to-noise ratios are added to the original CWRU data to verify the anti-noise performance of the model. Gaussian noise with different SNRs is added to the CWRU data; as shown in Fig. 18, as the SNR increases, the amplitude of the signal also increases.
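A sketch of how Gaussian noise at a prescribed SNR can be added to a signal; the scaling follows the usual dB definition of SNR, and the exact noise-generation procedure of the experiment is an assumption.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db):
    """Add white Gaussian noise so the result has the requested SNR in dB."""
    p_signal = np.mean(signal ** 2)                     # signal power
    p_noise = p_signal / (10 ** (snr_db / 10))          # noise power from SNR
    noise = np.random.randn(len(signal)) * np.sqrt(p_noise)
    return signal + noise

noisy = add_gaussian_noise(np.sin(np.linspace(0, 100, 1200)), snr_db=2)
```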

FIGURE 18. Spectrograms of different SNR.

In Fig. 19, the four models are simulated and verified at different SNRs. As the SNR increases, the training accuracy of the models decreases. However, the accuracy of the proposed model decreases more slowly than that of the three other models and remains above 0.94.

FIGURE 19. Comparison of models with different SNR.

Fig. 20 shows the frequency-domain diagrams of the normal signal and the mixed signal, where Gaussian noise and impulse noise together form the mixed noise.

FIGURE 20. Spectrograms of mixed signal.

Fig. 21 compares the four models under mixed noise with different signal-to-noise ratios. Under the mixed noise signals, the accuracy of the NDilted-CNN model reaches 93%, which is much higher than that of the other three models. Compared with Fig. 19, the proposed model performs well in both the mixed-noise and single-noise experiments. Fig. 22 shows the convergence of the accuracy curves under Gaussian noise and mixed noise when the SNR is 2.

FIGURE 21. Comparison of models with different SNR in mixed signal.

FIGURE 22. Comparison of models with different SNR.

FIGURE 23. T-SNE diagram with SNR of 2.

FIGURE 24. T-SNE diagram with SNR of −4.

FIGURE 25. T-SNE diagram with SNR of −10.

As can be seen from Fig. 22, when the input signal contains noise, the training accuracy of the network model is affected to a certain extent. After 12 iterations, the accuracy of both models reaches 0.9 and gradually stabilizes; after 30 iterations, both networks converge. For different types of background noise, the proposed method therefore achieves rapid convergence and stability.

The T-SNE visualizations show the data feature distributions when the SNR is 2, −4, and −10. As the SNR increases, the data features are separated better.

SECTION VI.

Conclusion

This paper proposes a new optimization algorithm combined with an improved octave convolution network model for rolling bearings that operate under high background noise and whose features are difficult to extract. In the proposed model, the NAdam algorithm improves the convergence speed of the network, so the model reaches the fitting state faster. The improved octave convolution and the dilated gate convolution layers effectively extract data features, with accuracy reaching 0.98 or more in the simulation tests on the Case Western Reserve University data set and the power equipment data set. In the experiments with variable loads and different SNRs, the proposed model still maintains high precision and fast convergence, which verifies its generalization and anti-noise performance.

In this paper, the parameters that affect network convergence are selected from manual experience. Therefore, the automatic search for optimal network parameters needs further research. In addition, how to analyze the convergence of the proposed network from a theoretical perspective is a problem worth studying.
