1. Introduction
Deep neural networks (DNNs) have become the most effective method in many computer vision tasks such as image classification, object detection and semantic segmentation. With only a few years of development, the depth of DNNs grows from 7 layers [1] into hundreds or even thousands of layers [2] [3]. For a typical deep convolution neural network, it may need hundreds of megabytes for weight storage and has billions of floating point calculations for inference. Many potential application scenarios, such as mobile devices or embedded systems, could not handle such large amount of computation for real-time application [4]–[6], The large model size also makes DNNs difficult to deploy and update. These difficulties motivate the researchers to design more efficient DNNs without degrading the model's representative power.