I. Introduction
As information technology and electronic communications have rapidly spread to almost every sphere of human activity, including commerce, medicine and social networking, the risk of accidental or intentional disclosure of sensitive private information has increased. The concomitant creation of large, centralized, searchable data repositories and the deployment of applications that use them have made leakage of private information, such as medical data, credit card information, and power consumption data, highly probable and thus an important and urgent societal problem. In contrast to the secrecy problem, in the privacy problem disclosing data provides informational utility while simultaneously enabling a possible loss of privacy. As shown in Fig. 1, in the course of a legitimate transaction, a user learns some public information (e.g., gender and weight), which is allowed and must be supported for the transaction to be meaningful; at the same time, the user can also learn or infer private information (e.g., cancer diagnosis and income), which needs to be prevented (or minimized). Thus, every user of the data is (potentially) also an adversary.
Fig. 1. Example database with public and private attributes and its sanitized version.