HyObscure: Hybrid Obscuring for Privacy-Preserving Data Publishing


Abstract:

Minimizing privacy leakage while ensuring data utility is a critical problem in privacy-preserving data publishing, through which data holders can boost platform engagement or increase the value of their data. Most prior research is concerned with only either privacy-insensitive or exact private data and resorts to a single obscuring method to achieve a privacy-utility tradeoff, which is inadequate for real-life hybrid data, especially when facing machine learning-based inference attacks. This work presents a pilot study on privacy-preserving data publishing in which both widely adopted generalization and obfuscation operations are employed to protect privacy-heterogeneous data. Specifically, we first propose novel measures for quantifying privacy and utility and formulate the hybrid privacy-preserving data obscuring problem to account for the joint effect of generalization and obfuscation. We then design a novel protection mechanism called HyObscure, which decomposes the original problem into three sub-problems to cross-iteratively optimize the hybrid operations for maximum privacy protection under a given data utility guarantee. The convergence of the iterative process and the privacy leakage bound of HyObscure are also established theoretically. Extensive experiments demonstrate that HyObscure significantly outperforms a variety of state-of-the-art baseline methods when facing various inference attacks in different scenarios.
Published in: IEEE Transactions on Knowledge and Data Engineering ( Volume: 36, Issue: 8, August 2024)
Page(s): 3893 - 3905
Date of Publication: 09 November 2023


I. Introduction

In the big data era, data publishing has become a popular way to facilitate data exploitation and increase the economic value of data [1], [2], [3]. Leading data holders like Facebook and Twitter provide APIs to share data with third parties in order to increase platform engagement [4]. More and more data holders are now inclined to publish data, such as the MIMIC database (https://mimic.mit.edu/docs/iv/), MovieLens (https://netflixprize.com/), and the Yelp challenge (https://www.yelp.com/dataset/challenge), to seek worldwide help in data exploitation. Indeed, data publishing as a positive externality has enabled service innovation, scientific discovery, and other public benefits, which generate enormous economic value amounting to over $3 trillion annually (https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/open-data-unlocking-innovation-and-performance-with-liquid-information).
