Introduction
As the core of the whole industrial system, rotating machines play a role of great importance in modern industry. Moreover, the safe and reliable operation of bearings determines, to a large degree, the operation of mechanical systems [1], [2]. However, bearing failures are likely to occur under high-load and strong-impact conditions, leading to the aging of the entire machine and even serious performance or safety losses. With the development of Industry 4.0 and Industrial Internet of Things (IIoT) technology, continuous monitoring and real-time fault diagnosis are indispensable for detecting faults before damage occurs, as well as for providing important support for maintenance [3]–[6].
Traditional mechanical fault diagnosis mainly includes three steps: i) constructing characteristic parameters that represent bearing faults by using advanced signal processing methods, such as wavelet decomposition, wavelet packet decomposition, empirical mode decomposition, variational mode decomposition, spectral kurtosis, and improved variants of these methods [7]–[9]; ii) selecting key feature parameters via dimensionality reduction methods such as principal component analysis and auto-encoders [10]; iii) realizing fault classification through pattern recognition methods, including support vector machines (SVM), decision trees, random forests, and artificial neural networks [11], [12]; a minimal sketch of such a pipeline is given below. However, traditional machine learning methods rely largely on signal processing techniques and diagnostic experience, which makes it difficult to deal with classification or regression problems in complex situations.
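As a concrete illustration of these three steps, the following minimal sketch (not from the paper) combines wavelet-packet energy features, PCA, and an SVM; the wavelet choice, window length, and placeholder data are illustrative assumptions.

```python
# Minimal sketch of the traditional three-step pipeline described above.
import numpy as np
import pywt                                    # PyWavelets
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def wavelet_packet_energy(signal, wavelet="db4", level=3):
    """Step i): energy of each terminal wavelet-packet node as a feature vector."""
    wp = pywt.WaveletPacket(signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")
    return np.array([np.sum(node.data ** 2) for node in nodes])

# X_raw: (n_samples, n_points) vibration windows; y: fault labels (placeholders)
X_raw = np.random.randn(100, 2048)
y = np.random.randint(0, 10, size=100)

X = np.vstack([wavelet_packet_energy(s) for s in X_raw])

# Steps ii) and iii): PCA dimensionality reduction followed by an SVM classifier.
clf = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="rbf"))
clf.fit(X, y)
```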
With the rapid development of advanced measurement technology and the advancement of Industry 4.0 and IIoT technology, massive volumes of data are collected [13], [14]. Owing to high computational complexity, however, traditional machine learning methods fail to establish decision models for these data. Therefore, deep learning has emerged and has been adopted for fault diagnosis. C. L. et al. used a stacked de-noising auto-encoder to identify signal status under environmental noise and fluctuating working conditions [15]; the proposed method achieved high diagnostic accuracy and strong robustness. Y. L. et al. proposed a planetary gear fault diagnosis method based on power-spectral-entropy-based variational mode decomposition and a deep neural network, achieving a nearly perfect fault diagnosis effect [16].
As one of the key branches of deep learning, convolutional neural networks (CNN) have shown excellent performance in bearing fault diagnosis in massive-data contexts. S. G. et al. realized accurate, robust, and general fault diagnosis of rotating machines through the continuous wavelet transform and a CNN [17]. S. S. et al. proposed a multi-signal fault diagnosis method via deep convolutional neural networks (DCNN) that learns from multiple sensor signals to achieve robust and accurate induction motor fault identification [18]. W. Y. et al. adopted a broad convolutional neural network to improve the model's diagnostic performance and incremental learning capability by adding newly generated additional features for self-update, so as to include new abnormal samples and fault classes [19].
The aforementioned CNN methods, however, fail to take account of the computational and storage costs of the target models. Moreover, the CNN structures largely depend on expert experience to obtain the optimal diagnosis model, and the models may be difficult to train because of vanishing gradients as model depth increases under limited samples. Recently, fault diagnosis methods for high-speed devices that demand model deployment and real-time diagnosis have been widely studied, laying the foundation for real-time, rapid diagnosis in the IIoT context [20], [21]. In addition, high-precision diagnostic methods for limited datasets have also been continuously studied [22], [23]. With the aim of improving model accuracy and effectiveness for bearing fault diagnosis while reducing computation and storage costs, a lightweight convolutional neural network (LCNN) for intelligent diagnosis of bearing faults is proposed in this paper. The key questions of this study are how to effectively reduce the model parameters and storage space, and how to construct an optimal diagnostic model that achieves high accuracy. In the proposed LCNN method, a novel decomposed Hierarchical Search Space is introduced to explore the optimal network for bearing fault diagnosis. By introducing depthwise separable convolution in place of traditional convolution, together with the inverted residual structure and the linear bottleneck layer, the computational and storage costs of the model are reduced while its accuracy is improved. The convolution operation is performed on image representations of the fault samples to capture their non-linear structure and fault trends, and a physical interpretation of the extracted features and the model's recognition results is provided through visualization. The main contributions of this paper are as follows:
An LCNN model is constructed via lightweight convolution blocks rather than traditional convolution operations, which substantially improves fault diagnosis accuracy and largely reduces the calculation amount and model storage. Moreover, with significantly high accuracy, it effectively alleviates the serious overfitting and vanishing gradients caused by deepening the model under limited samples.
A novel decomposed Hierarchical Search Space is adopted for model optimization to balance accuracy and parameters. By automatically searching the dataset for the optimal bearing fault diagnosis model through this search space, the constructed LCNN greatly reduces the dependence on expert experience in the model construction process.
Since CNNs are "black boxes", TensorBoard is used to visualize the feature extraction results of each convolutional layer, and the t-distributed stochastic neighbor embedding (t-SNE) method is applied to visualize the learned features in the hidden fully connected layer. In this sense, visualization of the entire model is achieved.
Lightweight Convolutional Neural Network
A. Convolutional Neural Network
As one of the important branches of deep learning, CNNs excel in the field of pattern recognition as a result of their excellent feature capture capabilities [24]. A basic CNN includes an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer. In essence, it constructs multiple filters that convolve and pool the input data layer by layer, extracting features layer by layer [25]. Its unique network structure effectively reduces the number of training parameters, thus reducing the complexity of the network, while guaranteeing invariance to translation, rotation, and scaling to a certain degree.
The convolution layer consists of multiple convolution kernel filters. Each kernel filter convolves with the child nodes of the input layer and outputs the results. Each kernel filter repeatedly acts on its entire receptive field, performs a convolution operation on the pre-processed input feature map, and then uses the activation function to output the convolution result, forming a feature map that extracts local features of the input. Each convolution kernel filter consists of weights and a bias, and its operation is expressed as follows:\begin{equation*} x_{j}^{out}=f_{cov}\left({\sum \nolimits _{i\in M_{j}} {x_{i}^{input}\cdot k_{ij}+b_{j}} }\right)\tag{1}\end{equation*} where $x_{i}^{input}$ is the $i$-th input feature map in the receptive field $M_{j}$, $k_{ij}$ denotes the kernel weights, $b_{j}$ is the bias, $f_{cov}$ is the activation function, and $x_{j}^{out}$ is the $j$-th output feature map.
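To make (1) concrete, the following toy numpy sketch slides one kernel over a single feature map; stride 1, no padding, and the ReLU choice of $f_{cov}$ are illustrative assumptions.

```python
# Toy illustration of the convolution in (1) for a single input map and kernel.
import numpy as np

def conv2d_single(x, k, b):
    """Slide kernel k over input x, add bias b, apply activation f_cov (ReLU)."""
    H, W = x.shape
    Kh, Kw = k.shape
    out = np.empty((H - Kh + 1, W - Kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + Kh, j:j + Kw] * k) + b
    return np.maximum(out, 0.0)        # f_cov chosen as ReLU for the sketch

x = np.random.randn(6, 6)
print(conv2d_single(x, np.ones((3, 3)) / 9, b=0.1).shape)   # (4, 4)
```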
The pooling layer, which is also composed of a convolution kernel filter, is usually set after the convolution layer. The calculation in the pooling kernel filter is not a weighted sum of neuron nodes but a maximum or average operation. The purpose of the pooling layer is to perform secondary feature extraction: by reducing the length, width, and depth of the feature map matrix, the number of parameters is reduced and the calculation rate is increased. Unlike the convolution operation of the convolution layer, a single kernel filter of the pooling layer acts exclusively on nodes at one depth, so the pooling operation samples the input data not only in the length and width directions but also in the depth direction. During this dimensionality reduction, the pooling operation is equivalent to a secondary feature extraction of the input data. The sampling expression of the pooling layer filter is as follows:\begin{equation*} x_{il}^{out}=f\left({x_{ip}^{input},x_{i\left ({p+1 }\right)}^{input},\cdots }\right)\tag{2}\end{equation*}
The fully connected layer is the classification module of the model. It maps the distributed features extracted by the convolution and pooling layers to the target space, transforming them from a high-dimensional space to a low-dimensional one. Because it is fully connected with the previous layer, it has more parameters than the other layers. The feature map of the layer preceding the fully connected layer is flattened ("rolled out") in turn through the convolution operation and then activated by the Softmax function for classification output. The forward propagation of the fully connected layer is expressed as follows:\begin{equation*} a_{j}^{out}=f_{fc}\left({\sum \nolimits _{i\in M_{j}} {a_{i}^{in}\cdot w_{ij}+b_{j}} }\right)\tag{3}\end{equation*}
The fully connected layer adopts the Softmax activation function for mapping to achieve multi-class classification of the data. The Softmax expression is as follows:\begin{equation*} f\left ({z_{i} }\right)=\frac {\exp (z_{i})}{\sum \nolimits _{j=1}^{C} {\exp (z_{j})} }\tag{4}\end{equation*} where $C$ is the number of classes.
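As a small illustration (not from the paper), (4) can be implemented in a numerically stable way by shifting the logits by their maximum before exponentiation:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the class dimension, as in (4)."""
    z = z - np.max(z)          # shifting the logits leaves the result unchanged
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))         # [0.659 0.242 0.099]; probabilities sum to 1
```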
B. Lightweight Convolutional Neural Network
Traditional CNNs usually increase model depth and complexity in pursuit of high accuracy, but such large, complicated models can hardly be applied in real scenarios such as mobile or embedded devices. As an improved version of the CNN, the LCNN operates with lightweight convolution instead of traditional convolution. While maintaining the performance of the model, the model size is largely reduced and the model speed is increased. It therefore effectively reduces the computation and storage load of the model, which is more conducive to applications on mobile terminals and in the Internet of Things context [26], [27].
1) Depthwise Separable Convolution
Depthwise separable convolution is one of the typical representatives of lightweight convolution, in which different convolution kernels are applied to different input channels. Experiments by F. C. et al. on Xception proved that depthwise separable convolution can be applied to DCNNs on a large scale [28]. In MobileNet, the large-scale use of depthwise separable convolution greatly reduces the number of parameters and calculations, thus accelerating the inference speed of the model [29]; moreover, under the same conditions, the accuracy loss of the model is negligible.
The schematic diagram of traditional convolution and depthwise separable convolution is shown in Fig. 1. In depthwise separable convolution, the convolution process is divided into two steps: depthwise convolution and pointwise convolution. In depthwise convolution, a number of convolution kernels equal to the number of input channels is used for layer-by-layer convolution along the depth of the feature map, but the convolution results are not aggregated; this is thus an extreme grouping convolution process. Pointwise convolution then uses $1\times 1$ convolution kernels to fuse the depthwise outputs across channels and produce the desired number of output feature maps.
The schematic diagram of traditional convolution and depthwise separable convolution: (a) traditional convolution; (b) depthwise separable convolution.
The parameters and calculation amounts of traditional convolution and depthwise separable convolution are shown in Table 1. Suppose that $N$ is the number of output channels, $M$ the number of input channels, $D_{k}\times D_{k}$ the convolution kernel size, and $D_{F}\times D_{F}$ the output feature map size.
As can be seen from Table 1, the ratio of the computation of the two operations is:\begin{equation*} \frac {D_{k}\times D_{k}\times M\times D_{F}\times D_{F}+M\times N\times D_{F}\times D_{F}}{D_{k}\times D_{k}\times M\times N\times D_{F}\times D_{F}}=\frac {1}{N}+\frac {1}{D_{k}^{2}}\tag{5}\end{equation*} so the saving grows with the number of output channels $N$ and the kernel size $D_{k}$.
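A quick numeric check of (5) for an illustrative layer shape (the values of $D_k$, $M$, $N$, $D_F$ are assumptions, not values from the paper) confirms the simplification above:

```python
# Computation-cost comparison from (5) for an illustrative layer shape.
D_k, M, N, D_F = 3, 32, 64, 56     # kernel size, in/out channels, feature map size

traditional = D_k**2 * M * N * D_F**2
depthwise   = D_k**2 * M * D_F**2          # depthwise step
pointwise   = M * N * D_F**2               # 1x1 pointwise step

ratio = (depthwise + pointwise) / traditional
print(ratio)                        # ~0.127: about 8x fewer operations
print(1 / N + 1 / D_k**2)           # identical: the ratio simplifies to 1/N + 1/D_k^2
```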
2) Depthwise Convolution
Depthwise convolution is a special grouping convolution. Grouping convolution was first proposed by Hinton et al. [30]: limited by hardware resources, the networks to be trained could not be processed on a single GPU, so A. K. divided the computation-heavy convolution operation of the convolutional layer into two groups, performed the calculations on two pieces of hardware, and fused the results in subsequent layers. The first step of depthwise separable convolution uses this grouping convolution in its extreme form. The schematic diagram of grouping convolution is shown in Fig. 2. Using the parameters defined above and assuming the number of groups is $g$, the parameters and computation of the grouping convolution are reduced to $1/g$ of those of traditional convolution; depthwise convolution corresponds to the extreme case $g=M$.
The schematic diagram of grouping convolution: (a) traditional convolution; (b) grouping convolution.
3) Pointwise Convolution
When the size of the convolution kernel in the convolution layer is set to 1, the operation is pointwise convolution. A $1\times 1$ kernel acts on every spatial position across all input channels, so it fuses the channel-wise information produced by the depthwise step and can raise or reduce the channel dimension at very low computational cost.
Proposed Intelligent Fault Diagnosis Method
A. Design of the LCNN for Bearing Fault Diagnosis
An LCNN is usually constructed with depthwise separable convolutions, but such a model cannot fully meet the requirements posed by fault diagnosis in the IIoT context. Therefore, the residual structure with residual blocks is introduced into the optimal LCNN model construction [31], since it effectively solves the vanishing gradient problem as the number of layers increases while improving the accuracy of the network.
The process of the standard residual block is presented in Fig. 3 (a). The input first undergoes a $1\times 1$ convolution that compresses the channel dimension, followed by a $3\times 3$ convolution and a $1\times 1$ convolution that restores the dimension; the block output is the element-wise sum of this result and the identity shortcut. The inverted residual block in Fig. 3 (b) reverses this order: it first expands the channel dimension, extracts features with a depthwise convolution in the expanded space, and finally projects the result back to a low dimension.
Residual convolution module: (a) standard residual block; (b) inverted residual block.
Since depthwise convolution cannot change the number of input channels, the dimension in which it extracts features depends on the output of the previous layer. To solve this problem, a $1\times 1$ pointwise convolution is added before the depthwise convolution to expand the channel dimension, and the final $1\times 1$ projection uses a linear activation rather than a non-linear one; this linear bottleneck layer prevents the non-linearity from destroying information in the low-dimensional space.
A basic LCNN module can be constructed from the depthwise separable convolution, the inverted residual structure, and the linear bottleneck layer; by stacking these modules, the basic LCNN is completed. The basic convolution block and the basic LCNN structure are shown in Fig. 4, and a minimal implementation sketch of the block is given below.
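As a minimal Keras sketch of the basic convolution block in Fig. 4 (a), the following code combines the $1\times 1$ expansion, depthwise convolution, and linear $1\times 1$ bottleneck with an identity skip, in the style of MobileNetV2; the expansion factor, kernel size, and filter counts are illustrative assumptions, not the searched values from the paper.

```python
# Sketch of a basic lightweight convolution block: 1x1 expansion, depthwise
# convolution, and a linear 1x1 bottleneck with an identity skip when shapes allow.
from tensorflow.keras import layers

def inverted_residual_block(x, filters, stride=1, expansion=6):
    in_channels = x.shape[-1]
    h = layers.Conv2D(expansion * in_channels, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)                        # expansion layer
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)                        # depthwise convolution
    h = layers.Conv2D(filters, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)             # linear bottleneck: no activation
    if stride == 1 and in_channels == filters:
        h = layers.Add()([x, h])                   # inverted residual skip connection
    return h

# usage: stack blocks after a small input stem
inputs = layers.Input(shape=(64, 64, 1))
x = layers.Conv2D(16, 3, strides=2, padding="same")(inputs)
x = inverted_residual_block(x, filters=16)
```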
The basic convolution block and basic LCNN structure: (a) basic convolution block; (b) basic LCNN module, where conv is a regular convolution and dconv is a depthwise convolution.
B. Network Optimization
In order to construct an optimal LCNN model suitable for bearing fault diagnosis, a novel decomposed Hierarchical Search Space is introduced. That is, the model is decomposed into different blocks, and the operations and block-to-block connection relations are then searched for each block, which allows different layer structures in different blocks [33]. For the widely used depthwise separable convolution, let the input feature map size be $(H, W, M)$; a convolution kernel of dimension $(K, K, M)$ performs the depthwise convolution to output a feature map of size $(H, W, M)$, after which a kernel of dimension $(1, 1, M, N)$ performs the pointwise convolution. Here $(H, W)$ is the input resolution, $M$ and $N$ are the input and output channel numbers, and the total computation cost is:\begin{equation*} H\times W\times M\times \left ({K\times K+N }\right)\tag{6}\end{equation*}
When the overall computing resources are limited, the kernel size $K$ and the number of filters $N$ need to be balanced carefully. For example, to increase the receptive field, the kernel size $K$ must be increased while reducing the number of convolution kernels $N$ in the same layer, or the computation of other layers must be reduced.
Fig. 5 demonstrates the baseline structure of the search space. The LCNN model is divided into a set of predefined blocks in which the input resolution is gradually reduced and the number of convolution kernels is increased. Each block contains a column of identical layers whose operations and connections are determined by the per-block sub-search space. Specifically, the sub-search space for block $i$ consists of the following choices (a configuration sketch follows the list):
Convolution operation: regular convolution, depthwise separable convolution;
Convolution kernel size;
Squeeze-and-Excitation ratio (SE);
Skip operation: pooling, residual block, no skip;
Output filter size $F_{i}$; and the number of layers $N_{i}$.
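A minimal sketch of how one block's sub-search space might be encoded follows; the candidate values are assumptions for illustration, not the exact option sets used in the paper.

```python
# Illustrative encoding of the sub-search space for a single block.
import itertools, random

block_search_space = {
    "conv_op":        ["regular_conv", "depthwise_separable_conv"],
    "kernel_size":    [3, 5],
    "se_ratio":       [0.0, 0.25],               # Squeeze-and-Excitation ratio
    "skip_op":        ["pooling", "residual", "none"],
    "output_filters": [16, 24, 40, 80],          # F_i
    "num_layers":     [1, 2, 3, 4],              # N_i
}

candidates = list(itertools.product(*block_search_space.values()))
print(len(candidates), "candidate configurations for this block")
print(random.choice(candidates))                 # one sampled block configuration
```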
For instance, in Fig. 5, each layer of block 4 has an inverted residual structure with a depthwise convolution and an identity skip connection, and the layer is repeated $N_{4}$ times.
C. Proposed Diagnosis Framework
To sum up, a novel model for bearing fault diagnosis in the IIoT context is proposed in this paper. The LCNN-based fault diagnosis framework is summarized in the following three steps.
Network structure construction. A basic lightweight convolution block is constructed from basic elements such as depthwise separable convolution and the inverted residual block. During LCNN model construction, these basic modules are stacked to form the network.
Network optimization. The novel decomposed Hierarchical Search Space decomposes the above model into different blocks and then searches for the operations and block-to-block connection relations of each block, which allows different layer structures in different blocks. An optimal LCNN model for bearing fault diagnosis is constructed through this network optimization.
Model deployment. The trained LCNN is extracted to construct the fault diagnosis model, which is then deployed on bearings for real-time monitoring and fault diagnosis.
The LCNN-based bearing fault diagnosis process in the IIoT context is shown in Fig. 7. It mainly includes data preprocessing, dataset partitioning, data augmentation, model training, model testing, and model visualization. In the device-level control part of the proposed method, data is first collected, and the system then makes decisions based on the collected data. With reference to [34], this paper deals with the noise in the dataset. During data processing, the original vibration signal is converted into a two-dimensional image through the Gramian Angular Field method [35] (a preprocessing sketch is given below); data preprocessing also includes image normalization and scale transformation. In order to verify the validity of the model, TensorBoard is used as a visualization tool for the model training process and features.
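A minimal sketch of the Gramian Angular Field step using the pyts library; the image size, window shape, and scaling choice are assumptions for illustration.

```python
# Sketch: convert 1-D vibration windows into Gramian Angular Field images.
import numpy as np
from pyts.image import GramianAngularField

segments = np.random.randn(32, 200)        # placeholder: 32 windows of 200 points
gaf = GramianAngularField(image_size=64, method="summation")
images = gaf.fit_transform(segments)       # shape (32, 64, 64), values in [-1, 1]

# scale to [0, 1] and add a channel axis for the CNN input
images = (images + 1.0) / 2.0
images = images[..., np.newaxis]           # shape (32, 64, 64, 1)
```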
Experimental Verification
A. Case Study on Case Western Reserve University Bearing Fault Dataset
1) Experiment Description
As one of the most widely used and representative datasets in mechanical fault diagnosis, the Case Western Reserve University bearing fault dataset is well suited to verifying the characteristics of the proposed method [36]. The bearing test rig is shown in Fig. 8. The test stand consists of a 2-horsepower motor (left), a torque sensor (center), a dynamometer (right), and control electronics (not shown). Fault diameters are set to 7 mils, 14 mils, 21 mils, 28 mils, and 40 mils. Accelerometers are placed at the twelve o'clock position on the drive and fan ends of the motor housing for vibration data collection. Data for normal bearings and for single-point drive-end and fan-end defects are collected. Drive-end bearing data are collected at 12,000 and 48,000 samples/second, and all fan-end bearing data are collected at 12,000 samples/second.
In this paper, the 12k drive-end bearing fault data is selected to construct the bearing fault diagnosis dataset. The vibration data is collected by an accelerometer with a magnetic base installed at the 12 o'clock position of the drive-end housing. The normal data, together with the 7, 14, 21, and 28 mil inner race faults, the 6 o'clock outer race fault, and the ball fault, are selected to construct 10 classes of health status. The dataset is shown in Table 3. Each class of health status consists of 2356 fault samples, and each sample contains 200 original time-signal points; a segmentation sketch is given below.
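A sketch of how the raw drive-end signals could be segmented into the 200-point samples described above; the file path and label assignment are hypothetical.

```python
# Sketch: split a raw vibration signal into fixed-length, non-overlapping samples.
import numpy as np

def segment_signal(signal, length=200, max_samples=2356):
    """Split a 1-D vibration signal into non-overlapping windows of 200 points."""
    n = min(len(signal) // length, max_samples)
    return signal[: n * length].reshape(n, length)

# signal = loadmat("12k_DE/IR007_0.mat")[...]   # hypothetical CWRU file and key
signal = np.random.randn(500_000)               # placeholder signal
samples = segment_signal(signal)                # shape (2356, 200)
labels = np.full(len(samples), 1)               # e.g. class 1 = 7 mil inner race
```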
2) Results
During the experiment, 80% of the samples of the 10 health classes are used as the training set (70% for model training and 10% for cross-validation), and the remaining 20% as the testing set; a sketch of this setup follows. The model is trained with Keras (TensorFlow backend) on a machine with a GeForce RTX 2060 GPU, an Intel i7-8700 CPU, and 16 GB of RAM. To verify the performance of the proposed LCNN model, SVM and traditional CNNs, namely LeNet, AlexNet, a traditional convolutional neural network with the same number of layers as the LCNN (TCNN), ResNet, and ShuffleNet [37], are used for comparison. The diagnosis results are presented in Table 4.
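A sketch of the split and training setup described above, assuming `model` is the LCNN built earlier and `X`, `y` are the GAF images and integer labels (both assumptions for the sketch):

```python
# Sketch: 80/20 train-test split, with 10% of all data reserved for cross-validation.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(X_train, y_train,
                    validation_split=0.125,   # 10% of all data = 12.5% of the 80%
                    epochs=100, batch_size=64)
print(model.evaluate(X_test, y_test))         # [test loss, test accuracy]
```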
As can be seen in Table 4, the diagnostic accuracy of the LCNN model on datasets A and B is 97.386% and 100%, respectively, significantly higher than that of the other models. Specifically, the fault diagnosis accuracy of SVM on datasets A and B is 94.367% and 97.037%, LeNet's is 13.438% and 10.349%, AlexNet's is 10.382% and 12.157%, TCNN's is 12.462% and 9.585%, ResNet's is 83.285% and 89.587%, and ShuffleNet's is 93.094% and 96.175%. Moreover, LeNet, AlexNet, and TCNN cannot be trained effectively owing to vanishing gradients, which result from the limited number of samples. The training curves of the proposed LCNN, ShuffleNet, ResNet, and LeNet are shown in Fig. 9.
The training curves of different models: (a) LCNN; (b) ShuffleNet; (c) ResNet; (d) LeNet.
It can be seen from Fig. 9 that although the training of the proposed LCNN fluctuates considerably, the training and cross-validation accuracies are both close to 1 once the model stabilizes. Although the training process of ShuffleNet is more stable, its cross-validation accuracy is only about 0.85, so the model clearly under-learns. As can be seen from Fig. 9 (d), the training and cross-validation accuracies of LeNet start at only about 0.1; as the number of iterations increases, the training accuracy continues to improve, but the cross-validation accuracy stays near 0.1. This further verifies that, limited by the sample size, the model suffers from overfitting and vanishing gradients. Since the training curve of AlexNet is similar to that of LeNet, it is not shown here. In order to further verify the recognition performance of the LCNN model, the confusion matrices of the LCNN, ShuffleNet, and ResNet models are shown in Fig. 10, from which it can be seen that the diagnostic accuracy of the LCNN model is the best. In addition, the difference between datasets A and B is mainly caused by the confusion between the inner race fault and the ball fault at the 14 mil fault size in dataset A.
The confusion matrices of different models: (a) LCNN with dataset B; (b) LCNN with dataset A; (c) ShuffleNet with dataset B; (d) ShuffleNet with dataset A.
In the IIoT context, factors such as the calculation amount and storage space largely determine a model's applicability. Table 5 lists in detail other performance indicators, such as the weight storage and parameter counts of the above models. As shown in Table 5, ShuffleNet has the smallest number of parameters and the least storage space, followed by the proposed LCNN model, which is 1–2 orders of magnitude smaller than the other models. In spite of the longest training time, the performance of the LCNN model is not affected, since the model is updated offline. As for the testing time, LeNet takes the shortest, followed by AlexNet, ShuffleNet, LCNN, ResNet, and TCNN. Considering multiple indicators such as diagnostic accuracy, model parameters, and storage, the proposed LCNN model achieves state-of-the-art performance on the Case Western Reserve University bearing fault dataset.
A CNN is usually called a "black box" model. In order to give an intuitive explanation of the model predictions, the t-distributed stochastic neighbor embedding (t-SNE) method is adopted to visualize the learned features in the hidden fully connected layer; a sketch of this step is given below. The visualization results of LCNN, ShuffleNet, and ResNet are presented in Fig. 11. In the hidden fully connected layer, samples under the same fault condition are clearly clustered together and largely separated from other conditions, indicating the good representation capability of the learned feature descriptors. After the non-linear mapping in the classifier, the features of different fault conditions are well separated in the last hidden fully connected layer, despite slight overlap of individual samples, which is consistent with the diagnostic accuracy in Fig. 10. Furthermore, for the data under unseen conditions, a larger overlapping area can be observed for the ResNet model, while the features learned by the LCNN have better separability.
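A sketch of the t-SNE step, assuming the hidden fully connected layer is named `fc_hidden` (a hypothetical identifier) and `model`, `X_test`, `y_test` come from the training sketch above:

```python
# Sketch: embed hidden fully connected layer features in 2-D with t-SNE.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from tensorflow.keras import Model

feature_extractor = Model(model.input, model.get_layer("fc_hidden").output)
features = feature_extractor.predict(X_test)

embedded = TSNE(n_components=2, perplexity=30).fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=y_test, cmap="tab10", s=5)
plt.title("t-SNE of hidden fully connected layer features")
plt.show()
```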
In order to present the features extracted by each convolutional layer and give them a physical interpretation, TensorBoard (TensorFlow, USA) is adopted to visualize the features; a sketch of attaching TensorBoard is given below. Fig. 12 shows the feature maps of some layers. The feature maps indicate that the learned convolution filters are relatively smooth in space, suggesting that training is adequate. In the initial layers, the convolution mainly extracts the outline of the vibration signal, after which the features gradually become more abstract.
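A sketch of attaching the TensorBoard callback during training; the log directory is an arbitrary choice.

```python
# Sketch: log training curves, weight histograms, and images for TensorBoard.
from tensorflow.keras.callbacks import TensorBoard

tb = TensorBoard(log_dir="logs/lcnn", histogram_freq=1, write_images=True)
model.fit(X_train, y_train, validation_split=0.125,
          epochs=100, batch_size=64, callbacks=[tb])
# then inspect with:  tensorboard --logdir logs/lcnn
```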
B. Case Study on MFPT Bearing Fault Dataset
1) Experiment Description
The MFPT bearing fault dataset, another typical dataset for mechanical bearing fault diagnosis, is an important reference for verifying the performance of the proposed method [38]. The dataset was provided by the Society for Machinery Failure Prevention Technology, with the data prepared by Dr. E. B., chief engineer of NRG Systems. It includes data from a bearing test stand (baseline bearing data, outer race failures under various loads, and inner race failures under various loads) and three actual failures.
In order to ensure sample balance, the bearing fault dataset is constructed from 3 baseline conditions (Nor), 7 inner race fault conditions (IR), and 7 outer race fault conditions (OR) to verify the performance of the LCNN. The data consist of 1,757,808 Nor data points, 1,025,388 IR data points, and 1,025,388 OR data points, at a sampling rate of 97,656 samples/second. The images generated from the data total 1800 for Nor, 2100 for IR, and 2100 for OR, with a time step of 0.01 seconds per image. The three types of samples are shown in Fig. 13.
2) Results
In order to verify the performance of the LCNN on the MFPT bearing fault dataset, 70% of the data is randomly selected as the training set, 10% as the validation set for model training, and the remaining 20% for testing. The diagnostic results of each model are shown in Table 6. As can be seen, the recognition rate of the LCNN reaches 99.92%. LeNet, AlexNet, and TCNN can be trained directly, but their diagnostic performance is extremely poor, AlexNet and TCNN in particular; vanishing gradients likely make these models difficult to train with the limited data samples.
Other performance indicators of the models are shown in Table 7. The model parameters and storage, as well as the training and testing times, are consistent with those in Section IV(A). After comprehensively considering multiple indicators such as diagnostic accuracy, model parameters, and storage, the proposed LCNN achieves state-of-the-art performance on the MFPT bearing fault dataset as well, which further validates the advantages of the proposed model for bearing fault diagnosis.
The t-SNE results of the LCNN, ShuffleNet, and ResNet models are shown in Fig. 14. The three types of samples are clearly separated, and the classification boundaries and regions of the LCNN are more distinct, further verifying its superior recognition accuracy. The features extracted by the convolution layers are shown in Fig. 15. In the initial stage of convolutional feature extraction, the convolution layers mainly extract the outline of the vibration signal, after which the features gradually become more abstract.
Conclusion
This paper proposes a novel LCNN model for intelligent bearing fault diagnosis. The proposed method comprises three main steps. First, the LCNN is constructed on the basis of basic operations such as depthwise separable convolution, the inverted residual structure, and the linear bottleneck. Second, the novel decomposed Hierarchical Search Space decomposes the model into different blocks and searches for the operations and block-to-block connection relations of each block to automatically explore the optimal LCNN for bearing fault diagnosis in the IIoT context. Finally, the model is trained and the learned deep features are input into the Softmax classifier to achieve accurate and stable diagnosis of bearing faults.
The proposed LCNN is applied to the fault diagnosis of experimental bearing vibration data. The results demonstrate that the method overcomes the dependence of traditional machine learning models on handcrafted feature extraction, and at the same time addresses traditional CNNs' dependence on large sample sizes, low diagnostic accuracy, and large storage and calculation costs. It is therefore more effective and robust than current intelligent diagnostic methods. Its high accuracy and small storage and calculation costs make the proposed LCNN well suited to the IIoT context. How to optimize the training and testing time of the model, however, remains a problem to be considered and solved in future work.