
Mining Data Impressions From Deep Models as Substitute for the Unavailable Training Data


Abstract:

Pretrained deep models hold their learnt knowledge in the form of model parameters. These parameters act as “memory” for the trained models and help them generalize well on unseen data. However, in the absence of training data, the utility of a trained model is limited to either inference or better initialization towards a target task. In this paper, we go further and extract synthetic data by leveraging the learnt model parameters. We dub these Data Impressions, which act as a proxy for the training data and can be used to realize a variety of tasks. They are useful in scenarios where only the pretrained models are available and the training data is not shared (e.g., due to privacy or sensitivity concerns). We show the applicability of data impressions in solving several computer vision tasks such as unsupervised domain adaptation, continual learning, and knowledge distillation. We also study the adversarial robustness of lightweight models trained via knowledge distillation using these data impressions. Further, we demonstrate the efficacy of data impressions in generating data-free Universal Adversarial Perturbations (UAPs) with better fooling rates. Extensive experiments performed on benchmark datasets demonstrate the competitive performance achieved using data impressions in the absence of the original training data.
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 44, Issue: 11, 01 November 2022)
Page(s): 8465 - 8481
Date of Publication: 16 September 2021

PubMed ID: 34529560

1 Introduction

Supervised learning typically requires large volumes of labelled data. Training sophisticated deep neural networks (DNNs) often involves learning from thousands (e.g., MNIST [1], CIFAR [2]) or even millions (e.g., ImageNet [3]) of data samples. While such datasets enable training of complex models, they pose practical challenges: they (i) are often huge in size (e.g., ImageNet [3]), (ii) are proprietary, and (iii) involve privacy concerns (e.g., biometric or healthcare data). Hence, in practice, public access to the data samples used for training may not always be feasible. Instead, the resulting trained models can be made available relatively easily. For instance, Facebook’s Deepface [4] model is trained on 4M confidential face images.
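To make the core idea concrete, below is a minimal sketch of mining data impressions from a frozen classifier. It assumes PyTorch; the uniform Dirichlet prior over target softmax vectors, the hyperparameters, and the toy stand-in teacher are illustrative simplifications, not the paper's exact recipe (which, among other details, shapes the Dirichlet concentration using class similarities derived from the classifier's final-layer weights).

```python
# Illustrative sketch only: optimize random inputs so that a frozen
# teacher assigns them Dirichlet-sampled class-probability targets.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mine_data_impressions(teacher, input_shape, num_classes,
                          batch_size=32, steps=200, lr=0.05, alpha=1.0):
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)  # teacher stays fixed; only the inputs are learnt

    # Sample target softmax vectors. A uniform concentration `alpha` is an
    # assumption here; the paper instead derives concentrations from the
    # similarity structure of the classifier's final-layer weights.
    dirichlet = torch.distributions.Dirichlet(torch.full((num_classes,), alpha))
    targets = dirichlet.sample((batch_size,))  # shape: (batch_size, num_classes)

    # Start from random noise and perform gradient descent on the inputs.
    x = torch.randn(batch_size, *input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        log_probs = F.log_softmax(teacher(x), dim=1)
        # Cross-entropy between the sampled soft targets and teacher outputs.
        loss = -(targets * log_probs).sum(dim=1).mean()
        loss.backward()
        opt.step()
    return x.detach(), targets

# Toy usage with a stand-in linear teacher (any trained classifier works).
teacher = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
impressions, soft_labels = mine_data_impressions(teacher, (1, 28, 28), num_classes=10)
```

The synthesized inputs, paired with their sampled soft labels, can then substitute for the unavailable training data in downstream tasks such as distilling the teacher into a lightweight student.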
