I. Introduction
With the rapid development of the Internet of Things and sensor technology, human activity recognition (HAR) using wearable inertial sensors has become a research hotspot due to its extensive use in a large variety of application domains such as health care [1], sports tracking [2], [3], fitness, game console design [4], and smart homes [5]. Traditionally, various methods from the field of signal processing [10], [11] have been widely leveraged to distill collected sensor data, which requires domain-specific expert knowledge to process the raw data; statistical and machine learning models are then trained on the processed data [12]. In other words, feature engineering is required before a model can be fit, which is expensive and does not scale. Recently, deep learning [6], [7] has gained considerable attention in sensor-based HAR. In particular, convolutional neural networks (CNNs) offer clear advantages in feature learning and have achieved state-of-the-art performance for HAR [8], [9]. Because CNNs perform automatic feature learning, they often substantially outperform models fit on hand-crafted, domain-specific features. Ideally, a CNN can learn features directly from raw sensor data, with little pre-processing and no manual feature engineering involved.
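As a minimal sketch of what "little pre-processing" means in this pipeline, raw inertial readings are typically segmented into fixed-length windows and fed to 1-D convolutional filters whose weights are learned rather than hand-designed. The window length, stride, synthetic signal, and filter values below are illustrative assumptions, not taken from any cited work:

```python
import numpy as np

def sliding_windows(signal, width, stride):
    """Segment a 1-D sensor stream into fixed-length, overlapping windows."""
    starts = range(0, len(signal) - width + 1, stride)
    return np.stack([signal[s:s + width] for s in starts])

def conv1d(window, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as in CNN libraries)."""
    k = len(kernel)
    return np.array([window[i:i + k] @ kernel
                     for i in range(len(window) - k + 1)])

# Synthetic stand-in for a raw accelerometer channel (illustrative only).
signal = np.sin(np.linspace(0, 8 * np.pi, 128))

# The only "pre-processing": fixed-length windowing of the raw stream.
windows = sliding_windows(signal, width=32, stride=16)   # shape (7, 32)

# In a real CNN this filter is learned from data; here it is hand-set
# purely to show the shape of the computation.
kernel = np.array([0.25, 0.5, 0.25])
features = np.stack([conv1d(w, kernel) for w in windows])  # shape (7, 30)
print(windows.shape, features.shape)
```

The point of the sketch is the contrast with hand-crafted pipelines: the windowing step is the only manual operation, while the feature map produced by the (learned) filter replaces engineered statistics such as per-window means or spectral coefficients.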