I. Introduction
Despite deep learning's tremendous success across a variety of tasks [2], [11], [23], [38], it remains difficult to deploy deep neural networks in real-world applications due to computational and memory constraints. To address this problem, many attempts [12], [26], [48], [49] have been made to reduce the computational cost of deep learning models, with Knowledge Distillation (KD) [16] being one of them. KD is a network training strategy that transfers knowledge from a high-capacity teacher model to a low-capacity student model during training, yielding a student with a better accuracy-efficiency trade-off at inference time.
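As a concrete illustration, the vanilla KD objective of [16] combines a temperature-softened KL term between teacher and student logits with the usual cross-entropy on ground-truth labels. The following is a minimal PyTorch-style sketch, not the exact implementation used in this work; the function name and the default values of the temperature T and weighting alpha are illustrative.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classic KD loss [16]: soft-target KL term plus hard-label cross-entropy.

    T (temperature) and alpha (soft/hard weighting) are illustrative defaults.
    """
    # Soften both distributions with temperature T; the KL term is scaled by
    # T^2 so its gradient magnitude stays comparable to the cross-entropy term.
    soft_term = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard_term = F.cross_entropy(student_logits, labels)
    return alpha * soft_term + (1.0 - alpha) * hard_term
```

The teacher is run in inference mode to produce `teacher_logits`, so only the student's parameters are updated by this loss.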