1. Introduction
For computer vision applications on resource-constrained devices, learning portable neural networks that still deliver satisfactory prediction accuracy is a key problem. Knowledge distillation (KD) [2], [12], [17], [22], [25], [33], which leverages a pre-trained large teacher network to guide the training of a smaller target student network on the same training data, has become a mainstream solution. Conventional KD methods assume that the original training data is always available. In practice, however, access to the source dataset on which the teacher network was trained is often infeasible, owing to privacy, security, proprietary, or sheer-size concerns. To relax this constraint on training data, knowledge distillation under a data-free regime has recently attracted increasing attention.
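To make the conventional, data-available KD setting concrete, the following is a minimal sketch of a typical distillation objective: a temperature-softened KL-divergence term between teacher and student logits blended with the usual hard-label cross-entropy. It assumes PyTorch and hypothetical pre-built `teacher` and `student` models, and is illustrative only rather than the method considered in this paper.

```python
# Minimal sketch of conventional (data-available) knowledge distillation,
# assuming PyTorch and hypothetical `teacher` / `student` models.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target KL term (scaled by T^2) blended with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def distill_step(student, teacher, images, labels, optimizer):
    """One optimization step on a batch drawn from the original training data."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(images)   # teacher predictions, no gradients
    s_logits = student(images)       # student predictions
    loss = kd_loss(s_logits, t_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that every step above consumes `images` and `labels` drawn from the original training set; the data-free regime discussed next removes exactly this dependency.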