I. Introduction
Data sharing and distributed computation with security, privacy, and safety have been identified as important current trends in applying data mining and machine learning to healthcare, computer vision, cyber-security, the internet of things, distributed systems, data fusion, and finance [1]–[9]. The hosting of siloed data by multiple client entities (devices or organizations) that do not trust each other, owing to sensitivity and privacy concerns, poses a barrier to distributed machine learning. This paper proposes a way to mitigate the reconstruction of raw data by malicious attackers in such distributed machine learning settings. Our approach minimizes a statistical dependency measure called distance correlation [10]–[14] between the raw data and any intermediate communications exchanged among the clients or the server participating in distributed deep learning. We also ensure that the learnt representations maintain reasonable classification accuracy, keeping the model useful while protecting raw sensitive data from reconstruction by an attacker situated at any of the untrusted clients participating in distributed machine learning.
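To make the dependency measure concrete, the following is a minimal NumPy sketch of the sample distance correlation between two sets of paired observations, following the standard double-centering construction of Székely et al.; the function and variable names here are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def pairwise_dist(x):
    # Euclidean distance matrix between the rows of x (shape n x p).
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def double_center(d):
    # A_ij = d_ij - row mean_i - col mean_j + grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between paired observations
    x (n x p) and y (n x q). Returns a value in [0, 1]; it is 0
    if and only if x and y are independent (in the population)."""
    A = double_center(pairwise_dist(x))
    B = double_center(pairwise_dist(y))
    dcov2 = max((A * B).mean(), 0.0)   # squared sample distance covariance
    dvar_x = (A * A).mean()            # squared sample distance variance of x
    dvar_y = (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0
```

In the setting described above, a differentiable version of this quantity between raw inputs and intermediate activations could serve as a regularization term alongside the classification loss, so that minimizing it suppresses the information an attacker needs for reconstruction.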