Introduction
A moving objects database is a multiset of trajectories, each representing the movement history of a moving object during a period of time. Moving objects databases have become very important in recent years because of their applications in many domains such as location-based services, municipal transportation, and traffic management.
Moving objects databases often contain sensitive information and, therefore, their improper use may lead to privacy breaches. Moreover, for many applications, moving objects may have non-spatiotemporal sensitive attributes such as disease, job, and income. Therefore, there is a growing concern about breaching the privacy of moving objects whose locations are monitored and tracked.
Differential privacy (DP) [1], [2] has recently emerged as a de facto standard for private statistics publishing due to its strong provable privacy guarantees. It ensures that the probability that a statistical query will produce a given result is (nearly) the same as when one data record is added or removed from the database.
In recent years, several mechanisms have been proposed to answer statistical queries while satisfying differential privacy for various types of databases [3]–[8], including moving objects databases [9]–[11]. However, as far as we know, none of the mechanisms proposed for moving objects databases consider the case where a moving objects database contains non-spatiotemporal sensitive attributes, associated with trajectories, in addition to spatiotemporal attributes. Yet many real moving objects databases contain non-spatiotemporal sensitive attributes such as medical or salary information. Such moving objects databases have many applications in industry and medicine. For example, a hospital may use an RFID patient tagging system in which patients’ trajectories, personal data, and medical data are stored in a central moving objects database [12]. From now on, for convenience, we refer to non-spatiotemporal sensitive attributes simply as sensitive attributes.
Personalized privacy is an inherent notion in privacy protection, whose main objective is to guarantee the desired level of privacy protection for each individual, based on that individual’s privacy protection requirement. Under a uniform (non-personalized) privacy model, some data records would be offered insufficient privacy protection, while others would be subjected to excessive privacy control. Although excessive control poses no harm to privacy, it reduces the utility of published results [13].
Personalized differential privacy (PDP) [14] is a recently-introduced notion of differential privacy, which takes into account that different individuals may have different privacy protection requirements. However, most of the existing differentially private mechanisms for moving objects databases do not explicitly consider personalization based on the privacy protection requirements of moving objects.
As noted above, many moving objects databases contain both spatiotemporal and non-spatiotemporal attributes, and preserving privacy for such databases has been an issue of great interest and concern in recent years [12], [15], [16]. Yet, to the best of our knowledge, no differentially private mechanism has been presented so far to deal with such moving objects databases. On the other hand, many real moving objects databases need to guarantee personalized privacy for moving objects based on their privacy protection requirements. In this paper, to address these problems, we present PDP-SAG, a differentially private mechanism that combines sensitive attribute generalization with personalized privacy in a unified manner to achieve the desired level of privacy protection for the moving objects of a moving objects database. In a nutshell, we generalize the sensitive attribute values of trajectory data records based on their privacy descriptors and then construct a noisy trajectory tree to represent all possible subtrajectories with their noisy frequencies in the underlying moving objects database. We further define a so-called personalized sensitive attribute generalization tree (PSAGT) for each node of the noisy trajectory tree. By doing this, we aim to keep different frequencies for each trajectory according to the generalized sensitive attribute values of the trajectory data records passing through that trajectory. Next, we allocate personal privacy budgets to the nodes of the PSAGTs and then obtain the noisy version of each PSAGT in such a way that PDP-SAG satisfies PDP. Finally, we enforce all required consistency constraints via intra- and inter-consistency constraints enforcement steps: the intra-consistency step makes each noisy PSAGT internally consistent, while the inter-consistency step imposes consistency among different noisy PSAGTs.
We use these final consistent noisy PSAGTs to answer sensitive count queries in a personalized differentially private manner.
In the following, we list the main contributions of this paper:
To the best of our knowledge, we are the first to apply differential privacy to moving objects databases that contain sensitive attributes other than spatiotemporal attributes. Also, we are the first to combine the sensitive attribute generalization (SAG) with personalized differential privacy (PDP) in a unified manner. This allows us to provide personalized privacy with strong guarantees of differential privacy.
We propose PDP-SAG, a novel differentially private mechanism that uses a new tree structure, known as noisy personalized sensitive attribute generalization tree (PSAGT), to keep different noisy frequencies for each trajectory according to the generalized sensitive attribute values of trajectory data records. This tree structure allows us to answer sensitive count queries in a personalized differentially private manner.
We propose intra- and inter-consistency constraints enforcement steps to make each noisy PSAGT internally consistent and to impose consistency among different noisy PSAGTs, respectively.
We formally prove that PDP-SAG satisfies personalized differential privacy for moving objects. Also, through extensive experiments on both synthetic and real datasets, we show how PDP-SAG improves the utility of sensitive query answers and provides the appropriate privacy protection for each moving object, in comparison to the case when no personalization and generalization are permitted. Besides, we define a new evaluation metric to measure the privacy protection provided for each trajectory data record (moving object) by considering both personalization and generalization.
The rest of the paper is organized as follows. Section II reviews related work. Section III provides basic definitions and preliminaries. Section IV describes how the sensitive attribute generalization is used in our work. In Section V, we introduce PDP-SAG and analyze its privacy guarantee. The experimental results are reported in Section VI and, finally, a summary and discussion are given in Section VII.
Related Work
In this section, we review state-of-the-art mechanisms related to our work.
The concept of personalized privacy was first introduced by Xiao and Tao [13] for relational databases. Subsequently, this concept was extended to other areas of privacy protection [14], [17]–[20], including moving objects databases [15], [16].
In the past few years, some differentially private mechanisms have been proposed to satisfy personalized privacy for different applications. However, none of them is applicable to moving objects databases that contain sensitive attributes other than spatiotemporal attributes. For example, Niknami et al. [4] introduced the concept of PDP for spatial databases. They presented a differentially private mechanism to answer range counting queries over spatial databases by considering the privacy protection requirements of different subregions. Jorgensen et al. [14] also considered that different users may have different privacy protection requirements while the analyst would like to publish useful statistics that satisfy the individual privacy protection requirements; they introduced two differentially private mechanisms to address this problem. Liu et al. [21] showed that a large family of recommender systems, namely, those using matrix factorization, are well suited to differential privacy. Based on this observation, they described a fast matrix-factorization-based recommender system that allows the level of privacy protection to be calibrated for each user independently. Alaggan et al. [22] defined the notion of heterogeneous differential privacy (HDP), which captures the variation of privacy expectations among users as well as across different pieces of information related to the same user, thereby accommodating non-uniform privacy expectations. Li et al. [23] proposed two partitioning mechanisms, namely, privacy-aware partitioning and utility-based partitioning, to achieve PDP. The goal of privacy-aware partitioning is to minimize the waste of privacy budgets, while the goal of utility-based partitioning is to maximize the utility of target computations.
Deldar and Abadi [11] proposed PLDP-TD, a personalized-location differentially private algorithm for simple trajectory databases where sensitive attribute values are not associated with trajectories. PLDP-TD considers that different locations may have different privacy protection requirements. However, it does not explicitly consider personalization based on the privacy protection requirements of moving objects.
Furthermore, some non-differentially private mechanisms have been proposed to preserve privacy for moving objects databases that contain both spatiotemporal and non-spatiotemporal attributes. For example, Mohammed et al. [24] developed an anonymization framework that employs global suppression to achieve
Preliminaries
In this section, we give some basic definitions and preliminaries that are used throughout the paper.
A. Differential Privacy
Differential privacy (DP) has emerged as one of the strongest privacy definitions for statistical databases. The intuition is that any sequence of query answers is (almost) equally likely to occur, whether or not an individual data record is present in the database. Thus, even if an adversary observes sequences of query answers from two neighboring databases, the privacy of individual data records is still not disclosed. In the following, we define the concepts behind differential privacy.
Definition 1 (Neighboring Databases):
Two databases
Definition 2 (\varepsilon -DP):
A randomized algorithm \mathcal {A} satisfies \varepsilon -DP if, for any two neighboring databases \mathcal {D}_{1} and \mathcal {D}_{2} and any set of possible outputs O, \begin{equation*} {\Pr [\mathcal {A}(\mathcal {D}_{1})\in O]}\leq \exp (\varepsilon)\times {\Pr [\mathcal {A}(\mathcal {D}_{2})\in O]},\tag{1}\end{equation*}
From Definition 2, we conclude that the privacy guarantee provided by an
A typical mechanism for answering statistical queries under differential privacy is the Laplace mechanism, which adds random noise drawn from the Laplace distribution to the results of statistical queries. The magnitude of the noise is scaled according to the (global) sensitivity of the query function, which is a measure of the maximum possible change to query answers over any two neighboring databases.
Definition 3 (Sensitivity):
Let f be a query function that maps a database to a vector of real values. The sensitivity of f is defined as \begin{equation*} \sigma _{f}=\max _{\mathcal {D}_{1}\sim \mathcal {D}_{2}}{\|f(\mathcal {D}_{1})-f(\mathcal {D}_{2})\|_{1}},\tag{2}\end{equation*} where \mathcal {D}_{1}\sim \mathcal {D}_{2} denotes that \mathcal {D}_{1} and \mathcal {D}_{2} are neighboring databases.
Given an input database \mathcal {D} and a query function f with sensitivity \sigma _{f}, the Laplace mechanism returns \begin{equation*} \mathcal {A}(\mathcal {D})=f(\mathcal {D})+\mathrm {Lap}(\sigma _{f}/\varepsilon),\tag{3}\end{equation*} where \mathrm {Lap}(\sigma _{f}/\varepsilon) denotes a random variable drawn from the Laplace distribution with mean zero and scale \sigma _{f}/\varepsilon .
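As an illustration, the Laplace mechanism in (3) can be sketched in a few lines of Python; the inverse-CDF sampler below is a standard way to draw Laplace noise, and for a single count query the sensitivity is 1:

```python
import math
import random

def sample_laplace(scale, rng=random):
    """Draw one sample from Lap(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5                      # u in [-0.5, 0.5)
    u = max(min(u, 0.499999), -0.499999)        # guard against log(0)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_count(true_count, epsilon, sensitivity=1.0):
    """Eq. (3): release f(D) + Lap(sigma_f / epsilon).
    For a single count query, sigma_f = 1."""
    return true_count + sample_laplace(sensitivity / epsilon)
```

Averaged over many invocations, the released counts are unbiased: the noise has mean zero and standard deviation sqrt(2) * sensitivity / epsilon.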
One of the basic properties of differential privacy is compositionality, which means that any sequence of differential privacy computations is also differentially private. In general, there are two different types of compositionality: sequential composition and parallel composition [26].
Theorem 1 (Sequential and Parallel Compositions[26]):
Let \mathcal {A}_{1},\mathcal {A}_{2},\dots,\mathcal {A}_{k} be randomized algorithms satisfying \varepsilon _{1}-DP, \varepsilon _{2}-DP, \dots, \varepsilon _{k}-DP, respectively. The sequence of these algorithms applied to the same database satisfies (\sum _{i}\varepsilon _{i})-DP (sequential composition), while their application to disjoint subsets of the database satisfies (\max _{i}\varepsilon _{i})-DP (parallel composition).
From Theorem 1, we conclude that the privacy guarantee degrades when multiple differentially private algorithms operate on the same input. If the inputs of the differentially private algorithms are disjoint, then the total privacy guarantee will be as strong as the weakest differentially private algorithm.
B. Personalized Differential Privacy
Personalized differential privacy (PDP) [4], [11], [14] is a new notion of differential privacy that provides user-level non-uniform privacy guarantees. The main goal of PDP is to enable data owners to adjust the level of privacy protection for different users based on how much privacy guarantee they desire. To achieve this goal, PDP allocates a personal privacy budget to each user (and thus to all data records of that user) independently of the others. This is in contrast to traditional differential privacy, in which a single global privacy budget (namely, \varepsilon) is used for all data records.
Definition 4 (Personal Privacy Budget Allocation):
Let
In the real world, it may be difficult for users to directly quantify their privacy protection requirements; therefore, we assume that there is a small set of user-friendly, totally ordered privacy descriptors (e.g., Low, Medium, High, and Critical), from which each user chooses one to represent his/her privacy protection requirement. The data owner then allocates a personal privacy budget to each user and, thus, to all data records of that user, inversely proportional to his/her privacy descriptor. Formally, let \omega (p(R)) denote the weight associated with the privacy descriptor p(R) of a data record R. The personal privacy budget allocated to R is \begin{equation*} \epsilon (R)=\dfrac {\varepsilon }{\omega (p(R))},\tag{4}\end{equation*}
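For illustration, the budget allocation in (4) can be sketched as follows; the descriptor weights below are hypothetical, chosen only so that stricter descriptors carry larger weights and therefore receive smaller personal budgets:

```python
# Hypothetical descriptor weights (not from the paper): a larger weight
# yields a smaller personal privacy budget.
WEIGHTS = {"Low": 1.0, "Medium": 2.0, "High": 4.0, "Critical": 8.0}

def personal_budget(global_epsilon, descriptor, weights=WEIGHTS):
    """Eq. (4): epsilon(R) = epsilon / omega(p(R))."""
    return global_epsilon / weights[descriptor]
```

For example, with a global budget of 1.0, a "Critical" record would be answered under a personal budget of 0.125, i.e., with substantially more noise than a "Low" record.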
Definition 5 (\epsilon -PDP):
Let \mathcal {A} be a randomized algorithm. \mathcal {A} satisfies \epsilon -PDP if, for each data record R, any two databases \mathcal {D}_{1} and \mathcal {D}_{2} that are neighboring with respect to R, and any set of possible outputs O, \begin{equation*} {\Pr [\mathcal {A}(\mathcal {D}_{1})\in O]}\leq \exp (\epsilon (R))\times {\Pr [\mathcal {A}(\mathcal {D}_{2})\in O]},\tag{5}\end{equation*}
C. Personalized Moving Objects Database
A spatiotemporal database is a database that deals with real-world applications in which spatial changes occur over time. Moving objects databases are specific cases of spatiotemporal databases that represent and manage changes related to the movement of moving objects such as people, animals, and vehicles.
Given a set
Definition 6 (Subtrajectory):
A trajectory
A moving objects database may contain attributes other than spatiotemporal attributes, which are associated with trajectories. Some of these attributes, such as disease or salary, may be sensitive and need to be protected from unauthorized disclosure. Also, the data owner may assign a privacy descriptor to each trajectory of a moving objects database, proportional to the privacy protection requirement of the moving object that generated that trajectory. We refer to such a moving objects database as a personalized moving objects database.
Definition 7 (Personalized Moving Objects Database):
A personalized moving objects database \begin{align*} R=\langle X_{1},X_{2}, {\dots },X_{m}\rangle: S_{1},S_{2}, {\dots },S_{i}:A_{1},A_{2}, {\dots },A_{j}:P, \\\tag{6}\end{align*}
For convenience, we assume that each personalized moving objects database
The values of a sensitive attribute are usually divided into different categories. We can use a taxonomy tree [13] to categorize these values and, thus, to generalize them based on their categories. For example, Fig. 1 shows a simple taxonomy tree for an arbitrary sensitive attribute with five distinct values, which are represented by the leaf nodes of the taxonomy tree. Each non-leaf node of the taxonomy tree is uniquely labeled with a name showing the category of sensitive attribute values in the subtree rooted at that node.
Definition 8 (Taxonomy Tree):
Let
The sensitive attribute value of each trajectory data record is generalized with respect to its corresponding privacy descriptor. Let us assign a privacy descriptor in descending order to each level of the taxonomy tree, starting from the root. The generalized sensitive attribute value of each trajectory data record is then considered to be the label of a node with the same privacy descriptor in the taxonomy tree, covering the sensitive attribute value of that trajectory data record. For example, Table 1 shows a personalized moving objects database, where the sensitive attribute values of trajectory data records have been generalized with respect to the taxonomy tree of Fig. 1.
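The generalization step just described can be sketched as follows. The taxonomy tree, its node names, and the descriptor-to-level assignment are illustrative only (they are not the paper's Fig. 1), and only three descriptors appear because the toy tree has three levels:

```python
# A toy taxonomy tree as a child -> parent map (names are illustrative).
PARENT = {
    "Flu": "Respiratory", "Asthma": "Respiratory",
    "Ulcer": "Digestive", "Gastritis": "Digestive", "Colitis": "Digestive",
    "Respiratory": "Any", "Digestive": "Any",
}
# Level of each node; the root "Any" is at level 0, leaves at level 2.
LEVEL = {"Any": 0, "Respiratory": 1, "Digestive": 1,
         "Flu": 2, "Asthma": 2, "Ulcer": 2, "Gastritis": 2, "Colitis": 2}
# Privacy descriptors assigned to levels in descending order from the root.
DESCRIPTOR_LEVEL = {"Critical": 0, "High": 1, "Low": 2}

def generalize(value, descriptor):
    """Replace a sensitive value by its ancestor at the taxonomy level
    whose privacy descriptor matches the record's descriptor."""
    target = DESCRIPTOR_LEVEL[descriptor]
    node = value
    while LEVEL[node] > target:
        node = PARENT[node]
    return node
```

A "Low" record keeps its exact value, while a "Critical" record is generalized all the way to the root label.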
Count queries are one of the most popular queries in moving objects databases. They are the building block of many advanced data analysis tasks such as frequent pattern mining. In this paper, we define a special type of count queries, namely, sensitive count queries, which are customized for moving objects databases with sensitive attributes. With accurate answers to sensitive count queries, data recipients can obtain answers to many of their questions such as “how many trajectories with a particular sensitive attribute value share the same subtrajectory?”. In the following, we give a formal definition of a sensitive count query.
Definition 9 (Sensitive Count Query):
Let
Example 1:
Consider the personalized moving objects database of Table 1. With respect to the taxonomy tree of Fig. 1, a possible sensitive count query over this moving objects database may be
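Assuming that a subtrajectory is a contiguous subsequence of locations (our reading of Definition 6), a non-private sensitive count query can be sketched as:

```python
def is_subtrajectory(sub, traj):
    """True if `sub` occurs as a contiguous subsequence of `traj`."""
    n = len(sub)
    return any(traj[i:i + n] == sub for i in range(len(traj) - n + 1))

def sensitive_count(db, sub, sensitive_values):
    """Answer a sensitive count query: how many trajectory data records
    contain `sub` and carry a sensitive value in `sensitive_values`.
    `db` is a list of (trajectory, sensitive_value) pairs."""
    return sum(1 for traj, s in db
               if s in sensitive_values and is_subtrajectory(sub, traj))
```

PDP-SAG answers exactly this kind of query, but from the noisy, consistent PSAGTs rather than from the raw database.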
Personalized Sensitive Attribute Generalization Tree
In order to preserve the privacy of moving objects and meet the goals of personalized privacy, we define a special taxonomy tree, called a personalized sensitive attribute generalization tree (PSAGT), for each possible trajectory. Each PSAGT keeps the number of moving objects in the underlying personalized moving objects database (with generalized sensitive attribute values) that have moved along a particular trajectory. In more detail, the counting function for each node of a PSAGT is computed by considering only those trajectory data records whose generalized sensitive attribute values are covered by the label of that node.
Definition 10 (Personalized Sensitive Attribute Generalization Tree):
Let
Similar to the taxonomy tree in Subsection III-C, a privacy descriptor (in descending order) is assigned to each level of a PSAGT (and thus to all nodes at that level), starting from the root (which is assumed to be at level 0).
In a particular case that no generalization is applied to the sensitive attribute values of trajectory data records, the counting function for each node of a PSAGT will be computed by considering only those trajectory data records whose sensitive attribute values are covered by the label of that node. In this case, we refer to such a PSAGT simply as an SAGT.
Example 2:
Consider the personalized moving objects database of Table 1. Fig. 2 shows the PSAGT for the trajectory
PDP-SAG
In this section, we introduce PDP-SAG, a differentially private mechanism for personalized moving objects databases that combines sensitive attribute generalization with personalized privacy in a unified manner to provide different levels of privacy protection for different moving objects. Generally, PDP-SAG consists of three main steps: noisy trajectory tree construction, noisy PSAGTs assignment, and post-processing. In the first step, we represent a personalized moving objects database with a differentially private noisy trajectory tree. The aim of the second step is to answer sensitive count queries with personalized differential privacy guarantees; we achieve this by assigning a noisy PSAGT to each node of the noisy trajectory tree. Finally, in the third step, we enforce intra- and inter-consistency constraints on the noisy trajectory tree and its associated noisy PSAGTs to make sensitive query answers consistent with each other. Table 2 summarizes the notations used throughout the paper.
A. Noisy Trajectory Tree Construction
In this step, we construct a differentially private noisy trajectory tree, or simply a noisy trajectory tree, which is used in subsequent steps to answer all sensitive count queries under personalized differential privacy guarantees.
Definition 11 (Noisy Trajectory Tree):
Let
We construct a noisy trajectory tree similar to prior work [3], [11]. However, here we consider the global privacy budget
Example 3:
Consider the personalized moving objects database of Table 1. Fig. 3 shows a simple noisy trajectory tree (with height 2) constructed over this database. For each node, its associated trajectory and real count are placed inside, and its noisy count is placed outside the rectangle representing that node. Note that
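A minimal sketch of the construction follows. Since the paper's exact budget split is not reproduced here, we assume a uniform per-level budget of \varepsilon /h over a tree of height h, as in prior prefix-tree mechanisms [3]; the exhaustive expansion below is only practical for toy inputs:

```python
import math
import random

class TTNode:
    """A node of the (noisy) trajectory tree."""
    def __init__(self, traj):
        self.traj = traj              # the subtrajectory this node represents
        self.count = 0                # real count
        self.noisy_count = 0.0
        self.children = {}            # location -> TTNode

def _contains(traj, sub):
    """True if `sub` occurs as a contiguous subsequence of `traj`."""
    n = len(sub)
    return any(tuple(traj[i:i + n]) == sub for i in range(len(traj) - n + 1))

def _laplace(scale, rng):
    u = rng.random() - 0.5
    u = max(min(u, 0.499999), -0.499999)        # guard against log(0)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def build_noisy_trajectory_tree(db, locations, height, epsilon, rng=random):
    """Exhaustively expand all subtrajectories up to `height`, count the
    records supporting each one, and add Lap(height/epsilon) noise
    (assuming a uniform per-level budget of epsilon/height)."""
    root = TTNode(())
    scale = height / epsilon
    def expand(node, level):
        if level == height:
            return
        for loc in locations:
            child = TTNode(node.traj + (loc,))
            child.count = sum(1 for t, _ in db if _contains(t, child.traj))
            child.noisy_count = child.count + _laplace(scale, rng)
            node.children[loc] = child
            expand(child, level + 1)
    expand(root, 0)
    return root
```

Practical constructions prune nodes whose noisy counts fall below a threshold instead of materializing the full location tree.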
B. Noisy PSAGTs Assignment
In this step, we construct a PSAGT over the trajectory of each node of the noisy trajectory tree \begin{align*} \epsilon (u) \!=\! \begin{cases} \epsilon (v) & \text {if}\; u {\;\text {is the root}}, \\ \left ({\dfrac {1}{\omega (p(u))}-\dfrac {1}{\omega (p(\vartheta (u)))}}\right)\times \epsilon (v) & \text {otherwise}, \\ \end{cases}\!\!\! \\\tag{7}\end{align*}
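The per-node budget allocation in (7) can be implemented directly; the weight values used in the usage note below are illustrative:

```python
def node_budget(eps_v, weight, parent_weight, is_root):
    """Eq. (7): the personal budget of a PSAGT node, given the budget
    eps_v of its hyper-root, its own descriptor weight omega(p(u)), and
    the descriptor weight omega(p(theta(u))) of its parent."""
    if is_root:
        return eps_v
    return (1.0 / weight - 1.0 / parent_weight) * eps_v
```

With eps_v = \varepsilon /5 and weights 1, 1/4, 1/8 along a root-to-leaf path, this yields \varepsilon /5, 3\varepsilon /5, and 4\varepsilon /5, respectively.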
Theorem 2 (\epsilon -PDP):
PDP-SAG satisfies
Proof:
Let
From Definition 10, we know that when the real count of a node \begin{align*} \sum _{u\in \pi ^{*}}\epsilon (u)\!= \! \epsilon (v)\!+\!\!\sum _{u\in \pi ^{*}\setminus \{r(\tilde {\Psi }_{v})\}}\!\!\left ({\dfrac {1}{\omega (p(u))}\!-\!\dfrac {1}{\omega (p(\vartheta (u)))}}\right) \!\times \!\epsilon (v), \!\!\!\!\!\!\! \\\tag{8}\end{align*}
\begin{equation*} \sum _{u\in \pi ^{*}}\epsilon (u)=\dfrac {\epsilon (v)}{\omega (p(R))}.\tag{9}\end{equation*}
On the other hand, from Subsection V-A, we know that \begin{align*} \sum _{v\in \Pi ^{*}}\sum _{u\in \pi ^{*}}\epsilon (u)=&\sum _{v\in \Pi ^{*}}\dfrac {\epsilon (v)}{\omega (p(R))} \\=&\dfrac {1}{\omega (p(R))}\sum _{v\in \Pi ^{*}}\epsilon (v)\leq \dfrac {\varepsilon }{\omega (p(R))}.\tag{10}\end{align*}
This proof shows that PDP-SAG satisfies
Example 4:
Suppose we assign the weights \begin{align*} \epsilon (u_{1})=&\dfrac {\varepsilon }{5}, \\ \epsilon (u_{2})=&\left ({\dfrac {1}{1/4}-\dfrac {1}{1}}\right)\times \dfrac {\varepsilon }{5}=\dfrac {3\varepsilon }{5},\\ \epsilon (u_{3})=&\left ({\dfrac {1}{1/8}-\dfrac {1}{1/4}}\right)\times \dfrac {\varepsilon }{5}=\dfrac {4\varepsilon }{5}.\end{align*}
We define a number of relationships, namely, stepchild and stepparent, to relate nodes in two different noisy PSAGTs whose associated hyper-roots have a child-parent relationship in
Definition 12 (Stepchild and Stepparent):
Let
Example 5:
Consider the noisy trajectory tree of Fig. 3. Fig. 4 shows the associated noisy PSAGTs of the nodes
C. Post-Processing
To make sensitive query answers consistent with each other, the associated noisy PSAGTs of the noisy trajectory tree
Note that enforcing the inter-consistency constraints makes
We emphasize that the intra- and inter-consistency constraints enforcement steps are necessary post-processing steps whose main goal is to make sensitive query answers consistent with each other after personalization is applied and noise is added. In the following, we describe each of these steps in detail.
1) Intra-Consistency Constraints Enforcement
We perform this step in a way that the intra-consistency constraints are satisfied within each noisy PSAGT, while the initial consistent noisy node counts of the noisy PSAGT have minimum total distance from their original noisy ones. To do so, we solve a constrained optimization problem for each noisy PSAGT to obtain the initial consistent noisy counts of its nodes. Formally, let \begin{equation*} \mathop {\mathrm {minimize}} {\sum _{u\in V(\tilde {\Psi }_{v})}{\dfrac {{\left ({\bar {c}(u)-\tilde {c}(u)}\right)}^{2}}{\Omega (u)}}},\tag{11}\end{equation*}
\begin{equation*} \bar {c}(u)=\sum _{w\in \Theta (u)}{\bar {c}(w)},\tag{12}\end{equation*}
By solving the optimization problem in (11), we obtain the following recurrence relation:\begin{equation*} \bar {c}(u)=z(u)-s(u)\sum _{w\in A(u)}\dfrac {\bar {c}(w)}{\Omega (w)},\tag{13}\end{equation*}
\begin{align*} z(u)=&\begin{cases} \xi (u)\times \Omega (u) & \text {if}\; u {\;\text {is a leaf node}}, \\ \dfrac {\displaystyle \Omega (u)\times \sum \nolimits _{w\in \Theta (u)} z(w)}{\displaystyle \Omega (u)+\sum \nolimits _{w\in \Theta (u)} s(w)} & \text {otherwise}, \end{cases} \tag{14}\\ s(u)=&\begin{cases} \Omega (u) & \text {if}\; u {\;\text {is a leaf node}}, \\ \dfrac {\displaystyle \Omega (u)\times \sum \nolimits _{w\in \Theta (u)} s(w)}{\displaystyle \Omega (u)+\sum \nolimits _{w\in \Theta (u)} s(w)} & \text {otherwise}, \end{cases}\tag{15}\end{align*}
\begin{equation*} \xi (u) = \begin{cases} \dfrac {\tilde {c}(u)}{\Omega (u)} & \text {if}\; u {\;\text {is the root}},\\ \xi (\vartheta (u))+\dfrac {\tilde {c}(u)}{\Omega (u)} & \text {otherwise}. \end{cases}\tag{16}\end{equation*}
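Under our reading that A(u) in (13) denotes the set of proper ancestors of u, the recurrences (13)–(16) can be implemented with one top-down pass for \xi , one bottom-up pass for z and s, and a final top-down pass for the consistent counts. This is a sketch, not the paper's code:

```python
class PNode:
    """A PSAGT node for the intra-consistency step."""
    def __init__(self, noisy, omega, children=None):
        self.noisy = noisy            # original noisy count c~(u)
        self.omega = omega            # weight Omega(u) in objective (11)
        self.children = children or []
        self.xi = self.z = self.s = self.bar = 0.0

def intra_consistency(root):
    """Solve (11) subject to (12) via the recurrences (13)-(16)."""
    def topdown_xi(u, parent_xi):                 # Eq. (16)
        u.xi = parent_xi + u.noisy / u.omega
        for w in u.children:
            topdown_xi(w, u.xi)
    def bottomup_zs(u):                           # Eqs. (14)-(15)
        if not u.children:
            u.z = u.xi * u.omega
            u.s = u.omega
            return
        for w in u.children:
            bottomup_zs(w)
        zsum = sum(w.z for w in u.children)
        ssum = sum(w.s for w in u.children)
        u.z = u.omega * zsum / (u.omega + ssum)
        u.s = u.omega * ssum / (u.omega + ssum)
    def topdown_bar(u, anc_sum):                  # Eq. (13)
        u.bar = u.z - u.s * anc_sum
        for w in u.children:
            topdown_bar(w, anc_sum + u.bar / u.omega)
    topdown_xi(root, 0.0)
    bottomup_zs(root)
    topdown_bar(root, 0.0)
    return root
```

On a two-level tree with unit weights this reduces to the familiar least-squares tree consistency: the parent's consistent count equals the sum of its children's consistent counts.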
2) Inter-Consistency Constraints Enforcement
We perform this step in a way that the inter-consistency constraints are satisfied among the noisy PSAGT of each internal node of \begin{equation*} \bar {\bar {c}}(u)=\sum _{w\in \Phi (u)}{\bar {\bar {c}}(w)},\tag{17}\end{equation*}
Let us assume that the root of \begin{align*} \bar {\bar {c}}(u) = \begin{cases} \dfrac {\bar {c}(u)}{\displaystyle \sum \nolimits _{w\in \Phi (\varphi (u))}\bar {c}(w)}\times \bar {\bar {c}}(\varphi (u)) & \text {if}\; u {\;\text {is a leaf node}}, \\ \displaystyle \sum \nolimits _{w\in \Theta (u)}{\bar {\bar {c}}(w)} & \text {otherwise}, \end{cases} \\\tag{18}\end{align*}
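A simplified sketch of (18): leaves receive proportional shares of their stepparent's final count, and internal nodes are then recomputed as sums of their children. The stepparent/stepchild bookkeeping across PSAGTs is abstracted away here:

```python
class GNode:
    """A PSAGT node carrying its final consistent noisy count."""
    def __init__(self, children=None, final=0.0):
        self.children = children or []
        self.final = final            # set directly on leaves

def scale_leaves(leaf_bars, stepparent_final):
    """Eq. (18), leaf case: split the stepparent's final count among its
    stepchildren in proportion to their intra-consistent noisy counts."""
    total = sum(leaf_bars)
    return [b / total * stepparent_final for b in leaf_bars]

def finalize_psagt(node):
    """Eq. (18), internal case: once the leaves have their final counts,
    every internal node becomes the sum of its children's final counts."""
    if node.children:
        node.final = sum(finalize_psagt(w) for w in node.children)
    return node.final
```

The proportional split guarantees that the stepchildren's final counts sum exactly to the stepparent's, which is the inter-consistency constraint of Lemma 1.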
Algorithm 1 shows the pseudo-code of the post-processing step that takes a noisy trajectory tree
Algorithm 1 Post-Processing
for each non-root node
for each node
Compute the temporary value
end for
for each node
Compute the temporary values
end for
for each node
Compute the initial consistent noisy count
end for
end for
for each internal node
for each node
if the parent of
Set the final consistent noisy count
else
Compute the final consistent noisy count
end if
end for
end for
Example 6:
Suppose that the noisy PSAGTs in Fig. 5a are obtained by enforcing intra-consistency constraints to the noisy PSAGTs of Fig. 4. Fig. 5b shows the final consistent noisy PSAGTs, where the final consistent noisy count of each node is obtained from (18).
The consistent versions of the noisy PSAGTs of Fig. 4: (a) the intra-consistent noisy PSAGTs, (b) the final consistent noisy PSAGTs.
Lemma 1:
The final consistent noisy counts of nodes of all mature noisy PSAGTs satisfy the inter-consistency constraints.
Proof:
Suppose \begin{align*} \sum _{w\in \Phi (u)}{\bar {\bar {c}}(w)}=&\sum _{w\in \Phi (u)}\dfrac {\bar {c}(w)}{\sum _{w'\in \Phi (\varphi (w))}{\bar {c}(w')}}\times {\bar {\bar {c}}(\varphi (w))} \\=&\dfrac {\sum _{w\in \Phi (u)}{\bar {c}(w)}}{\sum _{w'\in \Phi (u)}{\bar {c}(w')}}\times \bar {\bar {c}}(u) \\=&\bar {\bar {c}}(u).\tag{19}\end{align*}
Now, let us assume that \begin{equation*} \bar {\bar {c}}(u)=\sum _{w\in \Theta (u)}{\bar {\bar {c}}(w)}.\tag{20}\end{equation*}
By the induction hypothesis, for each child \begin{equation*} \bar {\bar {c}}(w)=\sum _{w'\in \Phi (w)}{\bar {\bar {c}}(w')}.\tag{21}\end{equation*}
By substituting (21) into (20), we obtain \begin{align*} \bar {\bar {c}}(u)=&\sum _{w\in \Theta (u)}\sum _{w'\in \Phi (w)}{\bar {\bar {c}}(w')} \\=&\sum _{w\in \Phi (u)}\sum _{w'\in \Theta (w)}{\bar {\bar {c}}(w')} \\=&\sum _{w\in \Phi (u)}{\bar {\bar {c}}(w)}.\tag{22}\end{align*}
Experiments
In this section, we describe the experiments performed to evaluate the performance of PDP-SAG in terms of relative error and privacy breach. Our experiments were conducted in Python and run on a PC with an Intel Core i7 3.6 GHz CPU and 16 GB RAM.
A. Experimental Setup
We performed our experiments using the following synthetic and real moving objects datasets:
City80K dataset [12]. This dataset simulates the routes of 80,000 citizens in a metropolitan area with 26 city blocks in 24 hours.
Geolife dataset [27]. This dataset collects the GPS trajectories of 182 users in Beijing, China, during a period of over five years. We choose the trajectories whose points (latitude-longitude coordinates) are between [39.4, 40.8] latitude and [115.8, 117.4] longitude, and break a trajectory if the interval time between its subsequent and current points is larger than one minute. In this way, we obtain approximately one million trajectories. Since each recorded trajectory consists of a sequence of points in a continuous spatial domain, we discretize the spatial domain by partitioning it into a 64\times 64 grid and then consider each grid cell as a location.
Taxi dataset (http://sensor.ee.tsinghua.edu.cn). This dataset contains approximately 5.2 million trajectories of 8602 taxi cabs in Beijing, China. The trajectory data cover a region of Beijing restricted between [39.8, 40.1] latitude and [116.1, 116.6] longitude. Similar to the Geolife dataset, we discretize its spatial domain into a finite number of grid cells and then consider each grid cell as a location. However, due to a smaller region size, we consider a 16\times 16 grid.
Similar to prior work [3], [11], [28], we set the maximum trajectory length
B. Evaluation Metrics
We used two different metrics, namely, relative error and privacy breach, to evaluate how PDP-SAG makes a trade-off between utility and privacy.
1) Relative Error
This metric measures the quality of the noisy answer to each sensitive count query by considering its relative error with respect to the true answer. Formally, let c_{\mathcal {D}}(Q) and \tilde {c}_{\mathcal {D}}(Q) denote the true and noisy answers to a sensitive count query Q over a moving objects database \mathcal {D}, respectively. The relative error of Q is defined as \begin{equation*} \mathcal {E}(Q)=\dfrac {|\tilde {c}_{\mathcal {D}}(Q)-c_{\mathcal {D}}(Q)|}{\max {\left \lbrace{ c_{\mathcal {D}}(Q),\delta }\right \rbrace }}\times 100,\tag{23}\end{equation*} where \delta is a sanity bound that mitigates the effect of queries with excessively small true answers.
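The metric in (23) is straightforward to implement:

```python
def relative_error(noisy, true, delta):
    """Eq. (23): relative error in percent; the sanity bound `delta`
    keeps the denominator away from very small true counts."""
    return abs(noisy - true) / max(true, delta) * 100.0
```

For example, a noisy answer of 90 against a true answer of 100 yields a relative error of 10%, while a query with a true answer of 0 is scored against the sanity bound instead.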
2) Privacy Breach
This metric measures the privacy protection provided for each trajectory data record by considering both the personal privacy budget allocated to its corresponding moving object and the amount of generalization applied to its sensitive attribute value. The lower the personal privacy budget or the greater the generalization of the sensitive attribute value, the lower the privacy breach of a trajectory data record will be. Formally, given a personalized moving objects database with the generalization taxonomy \tilde {\Upsilon }, the privacy breach of a trajectory data record R is defined as \begin{equation*} \mathcal {B}(R\mid \tilde {\Upsilon })=\dfrac {\tanh (\epsilon (R))}{|g(R)|}\times 100,\tag{24}\end{equation*} where \epsilon (R) is the personal privacy budget allocated to R and g(R) is the set of sensitive attribute values covered by the generalized sensitive attribute value of R.
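Under the reading that |g(R)| counts the sensitive values covered by the generalized value, (24) can be sketched as:

```python
import math

def privacy_breach(personal_eps, covered_values):
    """Eq. (24): tanh(eps(R)) / |g(R)| * 100, where `covered_values` is
    taken here as the number of sensitive values covered by the
    generalized sensitive attribute value of the record."""
    return math.tanh(personal_eps) / covered_values * 100.0
```

The tanh term bounds the budget contribution in (0, 1), so the breach approaches 100% only for a record with a large personal budget and no generalization.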
C. Experimental Results
We cannot directly compare PDP-SAG with traditional differentially private mechanisms, because none of them consider moving objects databases that have sensitive attributes other than spatiotemporal attributes and, as we said before, PDP-SAG is the first mechanism that combines personalized differential privacy with sensitive attribute generalization in moving objects databases. For this reason, we performed two different sets of experiments.
In the first set of experiments, we mainly studied the impact of personalization and generalization on the average relative error and average privacy breach of PDP-SAG. To do this, we implemented a simplified version of PDP-SAG, called DP-SA, in which all moving objects are assumed to have the same privacy protection requirement and sensitive attribute generalization is not permitted. DP-SA is similar to PDP-SAG, except that it allocates the same privacy budget
To compare the average relative error of PDP-SAG with that of DP-SA, we constructed a number of sensitive count query sets on each moving objects dataset, each with a different maximum query size (namely, 1, 2, 3, 4, and 5 for the City80K dataset, and 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20 for the Geolife and Taxi datasets) and each containing 10,000 randomly-generated sensitive count queries. Each location in a sensitive count query was uniformly drawn from the set of locations of the underlying dataset. Furthermore, we randomly selected a subset of sensitive attribute values for each sensitive count query with respect to the constructed taxonomy tree. Also, similar to prior work [3], we fixed the maximum height of the noisy trajectory tree to 5. To answer queries of size larger than 5, we iteratively joined all joinable subtrajectories [3]. Figs. 6a to 6l show the obtained results (on a logarithmic scale) for different sensitive count query sets on City80K, Geolife, and Taxi under different global privacy budgets. From the figures, we observe that the average relative error of PDP-SAG is substantially smaller than that of DP-SA. This is because, by allocating more personal privacy budget to trajectory data records whose corresponding moving objects have lower privacy protection requirements, PDP-SAG can achieve a smaller relative error than DP-SA, in which all trajectory data records have the same privacy budget. Therefore, we conclude that DP-SA degrades the quality of sensitive query answers by applying excessive privacy protection to moving objects with low privacy protection requirements. Moreover, as expected, the average relative error of both PDP-SAG and DP-SA decreases as the global privacy budget increases, because less noise is added to sensitive query answers.
Also, the results show that the average relative error decreases as the maximum query size increases, because the number of queries that do not occur in the original moving objects database increases.
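As a concrete illustration, the average relative error reported above can be computed as follows. This is a minimal sketch, not the authors' code; in particular, the sanity bound of 0.1% of the database size used to bound the denominator is an assumption, following common practice for count queries over sequential databases [3].

```python
# Hypothetical sketch of the average-relative-error metric used to compare
# PDP-SAG and DP-SA. The sanity bound (0.1% of the database size) that
# bounds the denominator from below is an assumption for illustration.
def relative_error(true_count: int, noisy_count: float, sanity_bound: float) -> float:
    """Relative error of a single count query; the denominator is bounded
    below by `sanity_bound` to avoid division by very small true counts."""
    return abs(noisy_count - true_count) / max(true_count, sanity_bound)

def average_relative_error(true_counts, noisy_counts, db_size: int) -> float:
    """Average relative error over a query set, with sanity bound 0.1% of |D|."""
    s = 0.001 * db_size
    errors = [relative_error(t, n, s) for t, n in zip(true_counts, noisy_counts)]
    return sum(errors) / len(errors)
```

Under this metric, queries that do not occur in the database (true count 0) contribute an error proportional only to the noise magnitude over the sanity bound, which is consistent with the drop in average relative error for larger maximum query sizes observed above.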
Fig. 6. Comparison of the average relative error (in percent) of PDP-SAG with that of DP-SA for different sensitive count query sets on City80K, Geolife, and Taxi.
We also compared the average privacy breaches of trajectory data records (moving objects) with different privacy descriptors in PDP-SAG with the privacy breach of each trajectory data record (moving object) in DP-SA. The reason for this comparison is that all trajectory data records in DP-SA have the same privacy breach, whereas different trajectory data records in PDP-SAG may have different privacy breaches, depending on their privacy descriptors. Table 3 reports the obtained results for City80K, Geolife, and Taxi under different global privacy budgets. From the table, we observe that PDP-SAG yields a lower privacy breach than DP-SA for trajectory data records with higher privacy descriptors (namely, “High” and “Critical”) and, therefore, provides stronger privacy protection for their corresponding moving objects. This is because PDP-SAG applies a high degree of generalization to the sensitive attribute of these trajectory data records, whereas DP-SA applies no generalization. Furthermore, the average privacy breach of trajectory data records with a particular privacy descriptor in PDP-SAG is almost the same across datasets. The reason is that, as mentioned before, the distribution of privacy descriptors in all these datasets follows a Zipf distribution with the same shape parameter. Therefore, the percentage of trajectory data records having a particular privacy descriptor is almost the same in different datasets, which makes those trajectory data records have roughly the same average privacy breach.
To study how PDP-SAG reacts to increasing or decreasing the number of highly privacy-conscious moving objects, we also considered two other values of the shape parameter of the Zipf distribution used to assign privacy descriptors.
The average privacy breach of trajectory data records in City80K for different distributions of privacy descriptors under different global privacy budgets. Almost the same results hold for trajectory data records in Geolife and Taxi.
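The Zipf-based assignment of privacy descriptors described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the descriptor ranking, the “Low” and “Medium” labels, and the default shape parameter are assumptions (the text only names “High” and “Critical”), and varying `shape` mimics the experiment that changes the number of highly privacy-conscious moving objects.

```python
import random

# Four-level privacy descriptors; "Low" and "Medium" are assumed names,
# and the rank order (most common first) is an assumption for illustration.
DESCRIPTORS = ["Low", "Medium", "High", "Critical"]

def zipf_weights(n: int, shape: float) -> list:
    """Normalized Zipf weights: rank k gets weight proportional to 1 / k**shape."""
    raw = [1.0 / (k ** shape) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def assign_descriptors(num_objects: int, shape: float = 1.0, seed: int = 0) -> list:
    """Draw a privacy descriptor for each moving object from a Zipf distribution.
    A larger `shape` concentrates mass on low descriptors, i.e., fewer highly
    privacy-conscious moving objects; a smaller `shape` does the opposite."""
    rng = random.Random(seed)
    weights = zipf_weights(len(DESCRIPTORS), shape)
    return rng.choices(DESCRIPTORS, weights=weights, k=num_objects)
```

Because the same shape parameter is used for all datasets, the fraction of records with each descriptor is nearly identical across City80K, Geolife, and Taxi, which explains the nearly equal average privacy breaches per descriptor reported above.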
In the second set of experiments, we considered simple count queries instead of sensitive ones. A simple count query is a sensitive count query that does not contain any sensitive attribute value. Clearly, PDP-SAG can answer such count queries through the roots of the noisy PSAGTs. Considering simple count queries allowed us to fairly compare the average relative error of PDP-SAG with that of the algorithm proposed in [3], referred to as N-gram, which is the most popular differentially private algorithm for answering simple count queries over sequential databases, including moving objects databases. Table 5 reports the obtained results for different simple count query sets on City80K, Geolife, and Taxi under different global privacy budgets. From the table, we observe that the average relative error of PDP-SAG is better than or as good as that of N-gram. Although our main goal in presenting PDP-SAG is to answer sensitive count queries issued on a personalized moving objects database in a personalized differentially private manner, this experiment shows that PDP-SAG also performs well on simple count queries, compared to traditional differentially private algorithms.
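For intuition, answering a simple count query under ε-differential privacy ultimately reduces to perturbing a true count with Laplace noise. The following is a minimal sketch of that baseline idea only, not the PSAGT machinery of PDP-SAG or the N-gram algorithm, assuming the standard sensitivity of 1 for count queries.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) via the inverse-transform method."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float, seed: int = 0) -> float:
    """Answer a count query under epsilon-DP with the Laplace mechanism.
    A count query has sensitivity 1, so the noise scale is 1 / epsilon."""
    rng = random.Random(seed)
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A smaller ε (stronger privacy) yields a larger noise scale and hence a larger relative error, matching the trend in Table 5 where errors shrink as the global privacy budget grows.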
Conclusion and Discussion
Preserving privacy for moving objects databases that contain sensitive attributes other than spatiotemporal attributes has been an issue of great interest and concern in recent years. In this paper, we have presented a novel mechanism, called PDP-SAG, that addresses this issue in the context of personalization under the strong privacy model of differential privacy. PDP-SAG generalizes the sensitive attribute value of each trajectory data record based on the privacy protection requirement of its moving object and then answers sensitive count queries in a personalized (non-uniform) differentially private manner.
PDP-SAG addresses two main drawbacks of existing differentially private mechanisms for moving objects databases: (1) lack of attention to the privacy protection requirements of moving objects when allocating privacy budgets to them, and (2) lack of applicability to moving objects databases that contain sensitive attributes other than spatiotemporal attributes. To address these drawbacks, PDP-SAG generalizes the sensitive attribute values of trajectory data records (moving objects) based on their privacy descriptors and allocates more personal privacy budget to trajectory data records (moving objects) with lower privacy descriptors, to avoid excessive privacy protection. It then constructs a set of noisy PSAGTs that maintain different noisy frequencies for each trajectory, according to the generalized sensitive attribute values of the trajectory data records in which that trajectory occurs. Also, PDP-SAG enforces intra- and inter-consistency constraints on the noisy PSAGTs to make each noisy PSAGT internally consistent and different noisy PSAGTs externally consistent with each other.
Extensive experiments reveal that PDP-SAG improves the utility of sensitive query answers and provides stronger privacy protection for moving objects with higher privacy protection requirements, compared with a simplified version, called DP-SA, in which neither personalization nor generalization is permitted.
As future work, we plan to develop sensitive attribute generalization for other types of databases and other types of data mining tasks while achieving personalized differential privacy.