1. Introduction
Deep Neural Networks (DNNs) have rapidly become a research hotspot in recent years and are applied to a wide variety of practical scenarios. However, as DNNs evolve, better model performance usually comes at the cost of huge resource consumption from deeper and wider networks [8], [14], [28]. Meanwhile, the field of neural network compression and acceleration, which aims to deploy models in resource-constrained scenarios, is attracting growing attention; it includes, but is not limited to, Neural Architecture Search [18], [19], [33], [35]–[40], [42], [43], network pruning [4], [7], [16], [27], [32], [41], and quantization [3], [5], [6], [15], [17], [21], [22], [29], [31]. Among these methods, quantization transforms floating-point network activations and weights into low-bit fixed-point values, which accelerates inference [13] or training [44] with little performance degradation.
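As a brief illustration of the idea, the sketch below implements a minimal asymmetric min-max uniform quantizer in PyTorch. The function name, the 4-bit setting, and the min-max calibration are illustrative assumptions; the cited works use more sophisticated quantizers and calibration schemes.

```python
import torch

def uniform_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Quantize a float tensor to num_bits fixed-point levels and de-quantize.

    Illustrative asymmetric min-max quantizer, not the exact scheme of any
    specific cited method.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Derive scale and zero-point from the tensor's dynamic range.
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale).clamp(qmin, qmax)
    # Round to the nearest grid point and clamp to the representable range.
    x_q = (torch.round(x / scale) + zero_point).clamp(qmin, qmax)
    # De-quantize to recover a low-bit approximation of the original values.
    return (x_q - zero_point) * scale

# Example: quantize random weights to 4 bits and measure the mean error.
w = torch.randn(256)
w_q = uniform_quantize(w, num_bits=4)
print((w - w_q).abs().mean())
```

The rounding and clamping steps are the source of the quantization error that reconstruction-based methods such as BRECQ [17] aim to minimize.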
Figure: Left: reconstruction loss distribution of BRECQ [17] on 0.5-scaled MobileNetV2 quantized to 4/4 bit; the loss oscillation in BRECQ during reconstruction is marked by the red dashed box. Right: mixed reconstruction granularity (MRECG) smooths the loss oscillation and achieves higher accuracy.