I. Introduction
Deep neural networks (DNNs) have been widely deployed in a variety of everyday tasks, e.g., image classification [1], [2], object detection [3], [4], and autonomous driving [5]. However, DNNs are known to be vulnerable to adversarial examples, which are generated by overlaying carefully designed perturbations on original/natural examples [6], [7], [8]. To defend against such adversarial examples, adversarial training has been explored to improve the robustness of DNNs, typically by feeding adversarial examples to the model during the training stage [9], [10], [11]. Adversarial training can be formulated as a min-max optimization problem: in the inner maximization, a perturbation is generated to maximize the loss, and in the outer minimization, the model is optimized against that perturbation/attack by minimizing the loss [12]. The goal is to make the model robust, i.e., to prompt it to classify inputs correctly even under adversarial perturbation.
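As a toy illustration of this min-max loop, the sketch below alternates the two optimizations on a tiny logistic-regression model. All names are hypothetical; a linear model stands in for a DNN, and a one-step FGSM-style attack stands in for the inner maximization (practical adversarial training typically uses multi-step PGD on a deep network).

```python
import numpy as np

def loss_and_grads(w, x, y):
    """Logistic loss, with gradients w.r.t. the weights w and the input x."""
    z = x @ w
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    g_w = (p - y) * x   # dL/dw: used by the outer minimization
    g_x = (p - y) * w   # dL/dx: used by the inner maximization
    return loss, g_w, g_x

def fgsm_perturb(w, x, y, eps):
    """Inner maximization (one step): delta = eps * sign(dL/dx)."""
    _, _, g_x = loss_and_grads(w, x, y)
    return eps * np.sign(g_x)

def adv_train(w, data, eps=0.1, lr=0.5, epochs=50):
    """Outer minimization: descend on the loss at the perturbed inputs."""
    for _ in range(epochs):
        for x, y in data:
            x_adv = x + fgsm_perturb(w, x, y, eps)  # approximate worst case
            _, g_w, _ = loss_and_grads(w, x_adv, y)
            w = w - lr * g_w
    return w
```

The key design point is that the model's parameters are never updated on the clean inputs: each gradient step is taken at the (approximately) worst-case perturbed input, which is what pushes the learned decision boundary away from the data and yields robustness within the eps-ball.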