I. Introduction
With the advancement of sensing technology and the quick development of machine learning methods, data-driven methods have become increasingly popular for classification and regression tasks. Particularly, several deep neural networks (DNN) have been successfully applied to image recognition and anomaly detection tasks, e.g., convolutional neural networks (CNN) and long short-term memory networks (LSTM). Deep learning works well in automatically learning deep hierarchical features of the input data [1]. Note that most of these methods assume that training and testing data have similar distributions. However, due to varying operational conditions or different process parameters, nonstationary data distribution exists in many real-world applications, which may lead to unreliable prediction results. To address this issue, transfer learning has been actively studied recently, which aims to adapt a model trained in a source domain to its application in a target domain [2]. A commonly used idea of transfer learning is to extract features where different domains are close to each other while containing enough discriminative information for prediction [3]. By integrating deep learning with transfer learning, deep transfer learning-based methods can automatically extract the domain-invariant and discriminative features in various tasks [4]–[6].