1. Introduction
In pursuit of higher performance, Deep Neural Networks (DNNs) with deeper and wider architectures have been proposed at the expense of larger model size and longer inference time. Examples range from AlexNet [13], [11], [2] to ResNet [7], [27], [21] and DenseNet [10]. However, in many practical applications, these networks cannot satisfy the requirements of real-time response and low memory cost. Therefore, increasing effort has been put into model compression.