I. Introduction
In recent years, the wide diffusion of mobile devices has made human activity recognition (HAR) based on wearable sensors [1] a new research focus in artificial intelligence and pattern recognition [2], [3]. Prevailing applications that benefit from HAR include sports activity detection [4], smart homes [5], and health support [6]. Sensors embedded in mobile devices, such as accelerometers, gyroscopes, and magnetometers [7], generate time-series data for HAR. Traditional methods developed for human activity recognition fall within supervised learning. For instance, previous methods include SVM [8] and Random Forest [9], which require handcrafted features to be extracted as classifier inputs. Later, deep learning, and in particular convolutional neural networks, came to be widely used in HAR. Although deep learning models perform excellently in HAR, some challenges remain to be addressed, the main one being the need for labeled datasets with ground-truth annotation [10].