I. Introduction
In recent years, researchers have extensively explored the application of DNNs to numerous computer vision tasks, such as image classification [3], [4], object detection [5], [6], and video analysis [7], [8], [9]. However, deep models are often difficult to deploy on resource-constrained devices (e.g., smart bracelets, mobile phones, sensors) due to their enormous computational cost and memory footprint. To address this issue, various model compression methods have been proposed to compress and accelerate deep models, including network pruning [10], quantization [11], [12], knowledge distillation [13], and tensor factorization [14].