I. Introduction
In recent years, Convolutional Neural Networks (CNNs) have achieved great success in artificial intelligence, solving many problems that were previously intractable for computers. Deploying AI on edge devices such as smartphones, wearables, and IoT devices has become a growing trend to meet everyday needs. However, a significant computational challenge remains: CNNs typically contain vast numbers of floating-point parameters and require enormous numbers of floating-point operations in both the training and inference phases, making them difficult to implement on edge devices. As a result, minimizing the computational complexity of neural network inference on edge devices has become crucial.