I. Introduction
In order to achieve state-of-the-art accuracy, modern deep neural networks (DNNs) are becoming deeper and wider [1]. At the same time, the number of edge devices connected to the network, such as smartphones, Internet-of-Things (IoT) sensors, and drones, is growing explosively. These edge devices usually have limited computation resources, so model compression has been proposed to reduce the complexity of DNN models for efficient implementation [1]. In model compression, sparse regularizations [2] or sparse priors [3] are typically placed on the neurons (channels) so that unimportant neurons (channels) can be zeroed out during training. However, most existing model compression methods can only generate one individual model (single resolution
In this paper, we use "resolution" to mean "resource target". For example, a single-resolution model can meet only one resource target.
), while in practice one may need to deploy a DNN model on devices with drastically different computation resources. This requires running the compression algorithm separately multiple times to generate models with different resolutions, which is computationally expensive [4].
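To make the channel-sparsity idea concrete, the following is a minimal numpy sketch of one common mechanism behind such regularizations: a group-lasso-style proximal (soft-thresholding) step that shrinks each output channel of a convolution kernel and zeroes channels whose overall L2 norm is small. The function `prune_channels` and the threshold `lam` are hypothetical illustrations, not the specific algorithms of [2] or [3].

```python
import numpy as np

def prune_channels(weight, lam):
    """Group-lasso-style proximal step (illustrative sketch).

    weight: conv kernel of shape (out_channels, in_channels, k, k).
    Each output channel is treated as one group: its weights are
    shrunk toward zero, and channels with L2 norm below `lam` are
    zeroed out entirely.
    """
    # Per-output-channel L2 norm, kept broadcastable against `weight`.
    norms = np.sqrt((weight ** 2).sum(axis=(1, 2, 3), keepdims=True))
    # Soft-thresholding scale: 0 when the channel norm is below lam.
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return weight * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
w[0] *= 0.01  # make channel 0 "unimportant" (tiny weights)
pruned = prune_channels(w, lam=1.0)
```

In training, a step like this would be interleaved with gradient updates, so that unimportant channels are driven exactly to zero and can be removed from the network afterwards.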