
Evolutionary Multi-Objective Model Compression for Deep Neural Networks


Abstract:

While deep neural networks (DNNs) deliver state-of-the-art accuracy on various applications, from face recognition to language translation, this comes at the cost of high computational and space complexity, hindering their deployment on edge devices. To enable efficient processing of DNNs in inference, a novel approach, called Evolutionary Multi-Objective Model Compression (EMOMC), is proposed to optimize energy efficiency (or model size) and accuracy simultaneously. Specifically, the network pruning and quantization spaces are explored and exploited by evolving a population of architectures. Furthermore, by taking advantage of the orthogonality between pruning and quantization, a two-stage pruning and quantization co-optimization strategy is developed, which considerably reduces the time cost of the architecture search. Lastly, different dataflow designs and parameter coding schemes are considered in the optimization process, since they have a significant impact on energy consumption and model size. Owing to the cooperative evolution of different architectures in the population, a set of compact DNNs that offer trade-offs on different objectives (e.g., accuracy, energy efficiency, and model size) can be obtained in a single run. Unlike most existing approaches, which are designed to reduce the size of the weight parameters without significant loss of accuracy, the proposed method aims to achieve trade-offs between the desirable objectives, to meet the different requirements of various edge devices. Experimental results demonstrate that the proposed approach can obtain a diverse population of compact DNNs suitable for a broad range of memory usage and energy consumption requirements. Under negligible accuracy loss, EMOMC improves the energy efficiency and model compression rate of VGG-16 on CIFAR-10 by factors of more than 8.9X and 2.4X, respectively.
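To make the search concrete, below is a minimal, self-contained Python sketch of the kind of multi-objective evolutionary loop the abstract describes: each individual encodes per-layer pruning ratios and bit-widths, two competing objectives (an accuracy estimate and the compressed model size) are evaluated, and the non-dominated set is carried forward, so a whole Pareto front emerges in one run. The layer sizes, the mutation scheme, and the accuracy_proxy placeholder (standing in for fine-tuning and validating each candidate) are illustrative assumptions, not the authors' implementation.

import random

N_LAYERS = 4
LAYER_PARAMS = [1.8e6, 2.4e6, 4.7e6, 1.2e6]   # hypothetical weights per layer
BITS = [2, 4, 8, 16, 32]                      # candidate weight bit-widths

def random_individual():
    return {"prune": [random.uniform(0.0, 0.9) for _ in range(N_LAYERS)],
            "bits":  [random.choice(BITS) for _ in range(N_LAYERS)]}

def model_size_bits(ind):
    # Size after pruning a fraction prune[i] of layer i and storing
    # the surviving weights at bits[i] bits each.
    return sum(p * (1.0 - r) * b
               for p, r, b in zip(LAYER_PARAMS, ind["prune"], ind["bits"]))

def accuracy_proxy(ind):
    # Placeholder objective: assumes accuracy degrades smoothly with
    # heavier pruning and lower precision. A real run would fine-tune
    # and evaluate the compressed network on a validation set instead.
    penalty = sum(r ** 2 for r in ind["prune"]) / N_LAYERS
    penalty += sum(1.0 / b for b in ind["bits"]) / N_LAYERS
    return max(0.0, 0.93 - 0.3 * penalty)

def evaluate(ind):
    ind["acc"], ind["size"] = accuracy_proxy(ind), model_size_bits(ind)
    return ind

def dominates(a, b):
    # Pareto dominance: maximize accuracy, minimize model size.
    return (a["acc"] >= b["acc"] and a["size"] <= b["size"] and
            (a["acc"] > b["acc"] or a["size"] < b["size"]))

def mutate(ind):
    child = {"prune": list(ind["prune"]), "bits": list(ind["bits"])}
    i = random.randrange(N_LAYERS)
    child["prune"][i] = min(0.9, max(0.0, child["prune"][i] + random.gauss(0, 0.1)))
    child["bits"][i] = random.choice(BITS)
    return child

pop = [evaluate(random_individual()) for _ in range(30)]
for gen in range(50):
    pop += [evaluate(mutate(random.choice(pop))) for _ in range(30)]
    # Environmental selection: keep only the non-dominated (Pareto) set.
    pop = [a for a in pop
           if not any(dominates(b, a) for b in pop if b is not a)][:30]

for ind in sorted(pop, key=lambda d: d["size"]):
    print(f"size = {ind['size'] / 8 / 1e6:6.2f} MB   acc ~ {ind['acc']:.3f}")

The paper's two-stage strategy would split this joint search, evolving pruning ratios and bit-widths in separate stages and exploiting their near-orthogonality to shrink the search space; the sketch above searches them jointly for brevity.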
Published in: IEEE Computational Intelligence Magazine (Volume: 16, Issue: 3, August 2021)
Page(s): 10 - 21
Date of Publication: 21 July 2021


I. Introduction

Deep neural networks (DNNs) are artificial neural networks with more than three layers (i.e., more than one hidden layer) that progressively extract higher-level features from the raw input during learning. They have delivered state-of-the-art accuracy on various real-world problems, such as image classification, face recognition, and language translation [1]. The superior accuracy of DNNs, however, comes at the cost of high computational and space complexity. For example, the VGG-16 model [2] has about 138 million parameters, which require over 500 MB of memory for storage and 15.5G multiply-and-accumulate operations (MACs) to process a 224 × 224 input image. In myriad application scenarios, it is desirable to perform inference on edge devices rather than in the cloud, to reduce latency and dependence on connectivity and to improve privacy and security. However, many of the edge devices that run DNN inference have stringent limits on energy consumption, memory capacity, and so on. Large-scale DNNs [3], [4] are therefore difficult to deploy on such devices, which hinders their wide application.
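The storage and compute figures quoted above follow directly from the architecture. As a quick sanity check, the short script below (a sketch reconstructed from the standard VGG-16 configuration in [2], not code from this paper) counts the parameters, the fp32 storage they imply, and the MACs for a 224 × 224 input:

# Layer widths of VGG-16's thirteen 3x3 conv layers; "M" marks a 2x2 max-pool.
cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

params, macs, c_in, hw = 0, 0, 3, 224       # RGB input, 224 x 224
for v in cfg:
    if v == "M":
        hw //= 2                            # pooling halves the resolution
    else:
        params += 3 * 3 * c_in * v + v      # 3x3 kernels plus biases
        macs += hw * hw * 3 * 3 * c_in * v  # one MAC per weight per output pixel
        c_in = v

for n_in, n_out in [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]:
    params += n_in * n_out + n_out          # three fully connected layers
    macs += n_in * n_out

print(f"parameters   ~ {params / 1e6:.0f}M")           # ~138M
print(f"fp32 storage ~ {params * 4 / 2**20:.0f} MiB")  # > 500 MB
print(f"MACs         ~ {macs / 1e9:.1f}G")             # ~15.5G

Counting one multiply-accumulate per weight per output position yields roughly 138M parameters (about 528 MiB in 32-bit floats, dominated by the fully connected layers) and 15.5G MACs (dominated by the convolutions), matching the figures above.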

References
[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA: MIT Press, 2016.
[2] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, [online] Available: https://arxiv.org/abs/1409.1556.
[3] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized autoregressive pretraining for language understanding," Adv. Neural Inf. Process. Syst., pp. 5753-5763, 2019.
[4] L. Zhen, P. Hu, X. Peng, R. S. M. Goh, and J. T. Zhou, "Deep multimodal transfer learning for cross-modal retrieval," IEEE Trans. Neural Netw. Learn. Syst.
[5] H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable architecture search," Proc. Int. Conf. Learn. Representations, 2018.
[6] Y. Sun, H. Wang, B. Xue, Y. Jin, G. G. Yen, and M. Zhang, "Surrogate-assisted evolutionary deep learning using an end-to-end random forest-based performance predictor," IEEE Trans. Evol. Comput., vol. 24, no. 2, pp. 350-364, 2020.
[7] Y. Sun, G. G. Yen, and Z. Yi, "Evolving unsupervised deep neural networks for learning meaningful representations," IEEE Trans. Evol. Comput., vol. 23, no. 1, pp. 89-103, 2018.
[8] Y. Sun, B. Xue, M. Zhang, and G. G. Yen, "Completely automated CNN architecture design based on blocks," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 4, pp. 1242-1254, 2020.
[9] Y. Sun, B. Xue, M. Zhang, and G. G. Yen, "Evolving deep convolutional neural networks for image classification," IEEE Trans. Evol. Comput., vol. 24, no. 2, pp. 394-407, 2020.
[10] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," 2015, [online] Available: https://arxiv.org/abs/1510.00149.
[11] Y. Wang, C. Xu, J. Qiu, C. Xu, and D. Tao, "Towards evolutionary compression," Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, pp. 2476-2485, 2018.
[12] T.-J. Yang, Y.-H. Chen, and V. Sze, "Designing energy-efficient convolutional neural networks using energy-aware pruning," Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 5687-5695, 2017.
[13] K. Wang, Z. Liu, Y. Lin, J. Lin, and S. Han, "HAQ: Hardware-aware automated quantization with mixed precision," Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2019.
[14] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, "Efficient processing of deep neural networks: A tutorial and survey," Proc. IEEE, vol. 105, no. 12, pp. 2295-2329, 2017.
[15] Z. Wang, T. Luo, J. T. Zhou, and R. S. M. Goh, "EDCompress: Energy-aware model compression with dataflow," 2020, [online] Available: https://arxiv.org/abs/2006.04588.
[16] Z. Du et al., "ShiDianNao: Shifting vision processing closer to the sensor," Proc. Annu. Int. Symp. Comput. Architecture, pp. 92-104, 2015.
[17] M. Song et al., "Towards efficient microarchitectural design for accelerating unsupervised GAN-based deep learning," Proc. IEEE Int. Symp. High Performance Comput. Architecture, pp. 66-77, 2018.
[18] N. P. Jouppi et al., "In-datacenter performance analysis of a tensor processing unit," Proc. Annu. Int. Symp. Comput. Architecture, pp. 1-12, 2017.
[19] M. Alwani, H. Chen, M. Ferdman, and P. Milder, "Fused-layer CNN accelerators," Proc. Annu. IEEE/ACM Int. Symp. Microarchitecture, pp. 1-12, 2016.
[20] J. Qiu et al., "Going deeper with embedded FPGA platform for convolutional neural network," Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, pp. 26-35, 2016.
[21] Y.-H. Chen et al., "Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks," ACM SIGARCH Comput. Architecture News, vol. 44, no. 3, pp. 367-379, 2016.
[22] H. Li, X. Fan, L. Jiao, W. Cao, X. Zhou, and L. Wang, "A high performance FPGA-based accelerator for large-scale convolutional neural networks," Proc. Int. Conf. Field Programmable Logic Appl., pp. 1-9, 2016.
[23] Y. Guo, A. Yao, and Y. Chen, "Dynamic network surgery for efficient DNNs," Proc. Adv. Neural Inf. Process. Syst., pp. 1379-1387, 2016.
[24] F. Manessi, A. Rozza, S. Bianco, P. Napoletano, and R. Schettini, "Automated pruning for deep neural network compression," Proc. Int. Conf. Pattern Recognit., pp. 657-664, 2018.
[25] C. Lemaire, A. Achkar, and P.-M. Jodoin, "Structured pruning of neural networks with budget-aware regularization," Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 9108-9116, 2019.
[26] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, "Pruning filters for efficient convnets," 2016, [online] Available: https://arxiv.org/abs/1608.08710.
[27] Y. He, J. Lin, Z. Liu, H. Wang, L.-J. Li, and S. Han, "AMC: AutoML for model compression and acceleration on mobile devices," Proc. Eur. Conf. Comput. Vision, pp. 784-800, 2018.
[28] Z. Liu, J. Xu, X. Peng, and R. Xiong, "Frequency-domain dynamic pruning for convolutional neural networks," Proc. Adv. Neural Inf. Process. Syst., pp. 1043-1053, 2018.
[29] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Proc. Adv. Neural Inf. Process. Syst., pp. 1097-1105, 2012.
[30] C. Szegedy et al., "Going deeper with convolutions," Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 1-9, 2015.