NoPeek: Information leakage reduction to share activations in distributed deep learning


Abstract:

For distributed machine learning with sensitive data, we demonstrate how minimizing distance correlation between raw data and intermediary representations reduces leakage of sensitive raw data patterns across client communications while maintaining model accuracy. Leakage (measured using distance correlation between input and intermediate representations) is the risk associated with the invertibility of raw data from intermediary representations. This can prevent client entities that hold sensitive data from using distributed deep learning services. We demonstrate that our method is resilient to such reconstruction attacks and is based on reduction of distance correlation between raw data and learned representations during training and inference with image datasets. We prevent such reconstruction of raw data while maintaining information required to sustain good classification accuracies.
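The leakage measure referred to above is the sample distance correlation of Székely et al. [10]. As a rough illustration, a minimal NumPy sketch of that statistic might look as follows (function and variable names are ours, not the paper's):

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between row-observations of x and y.
    Values near 0 suggest weak dependence; 1 indicates strong dependence."""
    def dist(z):
        # pairwise Euclidean distance matrix between rows of z
        sq = (z * z).sum(axis=1)
        return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * z @ z.T, 0.0))

    def center(d):
        # double-center: subtract row and column means, add back grand mean
        return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()

    A, B = center(dist(x)), center(dist(y))
    dcov2 = max((A * B).mean(), 0.0)          # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0
```

Comparing an input batch against an intermediate activation batch with this function gives the leakage score the abstract describes: a value near 1 suggests the activations still carry the input's structure, while a value near 0 suggests little is recoverable.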
Date of Conference: 17-20 November 2020
Date Added to IEEE Xplore: 16 February 2021
Conference Location: Sorrento, Italy

I. Introduction

Data sharing and distributed computation with security, privacy, and safety have been identified among the important current trends in applying data mining and machine learning to healthcare, computer vision, cyber-security, the internet of things, distributed systems, data fusion, and finance [1]–[9]. Siloed data held by multiple client entities (devices or organizations) that do not trust each other due to sensitivity and privacy concerns poses a barrier to distributed machine learning. This paper proposes a way to mitigate the reconstruction of raw data by malicious attackers in such distributed machine learning settings. Our approach minimizes a statistical dependency measure called distance correlation [10]–[14] between the raw data and any intermediary communications across the clients or server participating in distributed deep learning. We also ensure that the learned representations maintain reasonable classification accuracy, keeping the model useful while protecting raw sensitive data from reconstruction by an attacker situated in any of the untrusted clients participating in distributed machine learning.
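The two goals stated here (low dependence on raw data, high classification accuracy) suggest a weighted training objective. The following self-contained NumPy sketch illustrates one plausible form of such a combined loss; the weights `alpha1`, `alpha2` and all names are illustrative assumptions, not the paper's exact formulation or hyperparameters:

```python
import numpy as np

def dcor(x, y):
    # compact sample distance correlation (Szekely et al.)
    def d(z):
        sq = (z * z).sum(axis=1)
        return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * z @ z.T, 0.0))
    def c(m):
        return m - m.mean(axis=0) - m.mean(axis=1, keepdims=True) + m.mean()
    A, B = c(d(x)), c(d(y))
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(max((A * B).mean(), 0.0) / denom)) if denom > 0 else 0.0

def nopeek_objective(x, z, probs, labels, alpha1=0.1, alpha2=1.0):
    """Weighted sum of a decorrelation (privacy) term between raw inputs x
    and intermediate activations z, and a cross-entropy (utility) term on
    the predicted class probabilities. Hypothetical sketch, not the
    authors' implementation."""
    n = labels.shape[0]
    task = -np.log(probs[np.arange(n), labels] + 1e-12).mean()  # cross-entropy
    return alpha1 * dcor(x, z) + alpha2 * task
```

In an actual split-learning setup this loss would be computed with a differentiable framework so gradients flow through the client-side layers that produce `z`; the balance between `alpha1` and `alpha2` trades off reconstruction resistance against classification accuracy.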
