Adaptive Layerwise Quantization for Deep Neural Network Compression

Abstract:

Building efficient deep neural network models has become a hot topic in deep learning research in recent years. Many works on network compression quantize a neural network to low-bitwidth weights and activations. However, most existing network quantization methods set a fixed bitwidth for the whole network, which leads to a large performance drop at high compression rates. In this paper we introduce an adaptive layerwise quantization method that assigns different bitwidths to different layers. Using the entropy of weights and activations as an importance indicator for each layer, we keep most layers at a high compression rate while the few most important layers are assigned more bits. Experiments on the CIFAR10 and ImageNet2012 datasets demonstrate that our layerwise quantization achieves smaller model size and lower computation cost than the fixed-bitwidth methods compared against at comparable accuracy, or higher accuracy at similar model size and computational complexity.
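The idea of entropy-guided bit assignment can be illustrated with a minimal sketch. This is not the paper's algorithm: the histogram-based entropy estimate, the rule that higher entropy means higher importance, the function names (`weight_entropy`, `assign_bitwidths`, `uniform_quantize`), and the 2-bit and 8-bit settings are all assumptions made for illustration. Activation entropy is left out for brevity.

```python
import math

def weight_entropy(weights, num_bins=32):
    """Shannon entropy (in bits) of a fixed-bin histogram of the weights."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return 0.0  # a constant layer carries no information
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for w in weights:
        idx = min(int((w - lo) / width), num_bins - 1)
        counts[idx] += 1
    n = len(weights)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def assign_bitwidths(layer_weights, low_bits=2, high_bits=8, num_important=1):
    """Assumed rule: the highest-entropy layers get high_bits, the rest low_bits."""
    entropies = {name: weight_entropy(w) for name, w in layer_weights.items()}
    ranked = sorted(entropies, key=entropies.get, reverse=True)
    important = set(ranked[:num_important])
    return {name: (high_bits if name in important else low_bits)
            for name in layer_weights}

def uniform_quantize(weights, bits):
    """Round each weight to the nearest of 2**bits evenly spaced levels."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]

# Toy layers (hypothetical data): one with spread-out weights, one nearly constant.
layers = {
    "spread": [i / 100 for i in range(100)],
    "peaked": [0.0] * 99 + [1.0],
}
bits = assign_bitwidths(layers)
quantized = {name: uniform_quantize(w, bits[name]) for name, w in layers.items()}
```

Here the layer with widely spread weights has the higher histogram entropy and so receives the larger bitwidth, while the nearly constant layer is compressed aggressively.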

Published in: 2018 IEEE International Conference on Multimedia and Expo (ICME)

Date of Conference: 23-27 July 2018

Date Added to IEEE Xplore: 11 October 2018

DOI: 10.1109/ICME.2018.8486500

Conference Location: San Diego, CA, USA

1. Introduction

Deep neural networks (DNNs) have become the most effective method for many computer vision tasks such as image classification, object detection and semantic segmentation. Within only a few years of development, the depth of DNNs has grown from 7 layers [1] to hundreds or even thousands of layers [2] [3]. A typical deep convolutional neural network may need hundreds of megabytes for weight storage and billions of floating-point operations for inference. Many potential application scenarios, such as mobile devices or embedded systems, cannot handle such a large amount of computation in real time [4]–[6]. The large model size also makes DNNs difficult to deploy and update. These difficulties motivate researchers to design more efficient DNNs without degrading the model's representational power.
