Multi-Party High-Dimensional Data Publishing Under Differential Privacy

Abstract:

In this paper, we study the problem of publishing high-dimensional data in a distributed multi-party environment under differential privacy. In particular, with the assistance of a semi-trusted curator, the parties (i.e., local data owners) collectively generate a synthetic integrated dataset while satisfying $\varepsilon$-differential privacy. To solve this problem, we present a differentially private sequential update of Bayesian network (DP-SUBN) approach. In DP-SUBN, the parties and the curator collaboratively identify the Bayesian network $\mathbb{N}$ that best fits the integrated dataset in a sequential manner, from which a synthetic dataset can then be generated. The fundamental advantage of adopting the sequential update manner is that the parties can treat the intermediate results provided by previous parties as their prior knowledge to direct how to learn $\mathbb{N}$. The core of DP-SUBN is the construction of the search frontier, which can be seen as a priori knowledge to guide the parties to update $\mathbb{N}$. By exploiting the correlations of attribute pairs, we propose exact and heuristic methods to construct the search frontier. In particular, to privately quantify the correlations of attribute pairs without introducing too much noise, we first put forward a non-overlapping covering design (NOCD) method, and then devise a dynamic programming method for determining the optimal parameters used in NOCD. Through privacy analysis, we show that DP-SUBN satisfies $\varepsilon$-differential privacy. Extensive experiments on real datasets demonstrate that DP-SUBN offers desirable data utility with low communication cost.

Published in: IEEE Transactions on Knowledge and Data Engineering ( Volume: 32, Issue: 8, 01 August 2020)

Page(s): 1557 - 1571

Date of Publication: 25 March 2019

DOI: 10.1109/TKDE.2019.2906610

1 Introduction

With the rapid pace of digitization, high-dimensional data, such as healthcare data or user behaviour data, have been increasingly collected and used for different purposes. More often than not, such data are held by different parties, as if a single dataset had been horizontally partitioned among them. When integrated, these distributed data can be a valuable source for supporting better decision making or providing high-quality services. However, since the dataset held by each party may contain highly sensitive personal information, simply integrating the local datasets and sharing the integrated result would pose serious threats to individual privacy.

The following scenario further motivates the problem. Assume that three hospitals $H_1$, $H_2$ and $H_3$ want to integrate their patient data and share the integrated result to facilitate more effective clinical research. Table 1 shows the patient data integrated from the three hospitals, where records 1 to 3 are from $H_1$, records 4 to 6 are from $H_2$, and records 7 to 10 are from $H_3$. Although these records are integrated with only pseudo IDs, many individuals might be easily re-identified by adversarial data recipients with some background knowledge. Suppose that the adversary knows that the target patient is a Builder and that his age is 40. Record 9, together with its sensitive value (Lung cancer), can then be uniquely identified, since it is the only record of a 40-year-old Builder in the integrated data.
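The re-identification risk in this scenario can be illustrated with a minimal Python sketch. Table 1 is not reproduced here, so the records below are hypothetical toy data: only the split of records across the three hospitals and the fact that record 9 is a 40-year-old Builder with Lung cancer come from the text. The attribute names, the other values, and the helper functions `matching_records` and `uniquely_identifiable` are assumptions made for illustration.

```python
from collections import Counter
from typing import List, NamedTuple


class Record(NamedTuple):
    pseudo_id: int
    hospital: str
    job: str
    age: int
    disease: str


# Hypothetical stand-in for the integrated table. Only record 9
# (Builder, 40, Lung cancer) and the hospital split follow the text;
# every other value is invented for this example.
RECORDS: List[Record] = [
    Record(1, "H1", "Teacher", 34, "Flu"),
    Record(2, "H1", "Builder", 52, "Diabetes"),
    Record(3, "H1", "Lawyer", 40, "Asthma"),
    Record(4, "H2", "Teacher", 34, "Hepatitis"),
    Record(5, "H2", "Builder", 52, "Flu"),
    Record(6, "H2", "Lawyer", 29, "Diabetes"),
    Record(7, "H3", "Teacher", 40, "Flu"),
    Record(8, "H3", "Builder", 29, "Asthma"),
    Record(9, "H3", "Builder", 40, "Lung cancer"),
    Record(10, "H3", "Lawyer", 29, "Hepatitis"),
]


def matching_records(records: List[Record], job: str, age: int) -> List[Record]:
    """Return the records consistent with the adversary's background knowledge."""
    return [r for r in records if r.job == job and r.age == age]


def uniquely_identifiable(records: List[Record]) -> List[Record]:
    """Return the records whose (job, age) combination occurs exactly once."""
    counts = Counter((r.job, r.age) for r in records)
    return [r for r in records if counts[(r.job, r.age)] == 1]


if __name__ == "__main__":
    # The adversary knows the target is a Builder aged 40.
    hits = matching_records(RECORDS, job="Builder", age=40)
    if len(hits) == 1:
        print(f"Record {hits[0].pseudo_id} is re-identified: {hits[0].disease}")
    # List every record exposed by the same kind of attack.
    print([r.pseudo_id for r in uniquely_identifiable(RECORDS)])
```

In this toy data neither attribute alone singles out the target, since there are other Builders and other 40-year-olds, but the combination does. Pseudo IDs therefore offer no protection against an adversary who holds such background knowledge.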
