Journals & Magazines >IEEE Access >Volume: 7

Efficient Group Proof of Storage With Malicious-Member Distinction and Revocation

The challenge and verification.

Abstract:

Proof of Storage (POS) is a system utilized by a client to verify whether the original data is intact while being possessed by an untrusted server. In a grouping application, multiple members share and verify the same file, and the group manager is responsible for determining if the data has been manipulated based on the responses from group members. However, a malicious member may repudiate a correct proof; therefore, it is important to distinguish the honest members from malicious ones. To the best of our knowledge, none of the existing group-oriented schemes have solved this challenge efficiently and up to the desired satisfaction. In this paper, based on matrix calculation, pseudo-random functions, and commitment functions, we propose a new Group Proof of Storage with Malicious-Member Distinction and Revocation scheme (DR-GPOS). Specifically, in terms of functionality, DR-GPOS can distinguish and revoke the malicious members, as well as, guarantee the integrity and deduplication of the outsourced data. From a security perspective, DR-GPOS can also resist against selective attacks and the collusion attacks from the revoked members (e.g. forging proofs by colluding with the server). The security properties of the proposed schemes have also been formally proven in a standard model. We have further implemented it in a real-world (Baidu) cloud server, to evaluate the performance with large scale data (> 10 G).

Published in: IEEE Access ( Volume: 7)

Page(s): 75476 - 75489

Date of Publication: 20 May 2019

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2019.2917919

SECTION I.

Introduction

In recent years, cloud computing [1]–[3] has provided low-cost and scalable services for clients and has been widely adopted in industry and academia. The most fundamental and popular service is cloud storage, which is offered by many vendors (e.g. Google Drive, Amazon S3, Dropbox).

However, a number of security issues remain that can lead to data loss in cloud storage. For instance, T-Mobile¹ saw its phone sales hit by the Sidekick data loss, and Google Gmail² suffered large-scale data loss in 2011. Such accidental or malicious breaches incur huge losses for clients as well as enterprises. Verifying the integrity of data held in cloud storage has therefore become an important research issue.

To address this challenge, a series of proof of storage (POS) schemes [4]–[7] have been proposed. In some practical applications, multiple users jointly manage and verify the same data. For instance, the data of some internet companies (e.g. purchase records and browsing history) grows rapidly every day, and the company suffers serious losses once the data is damaged. To guard against this, professional verifiers are required to check and preserve the data regularly. These verifiers can be regarded as members of a group who manage and share the same data. The group manager decides whether the stored data is intact based on the group members’ responses.

If malicious members exist in the group, they should be identified and revoked. In the real world, malicious members may also collude with the cloud server to forge valid proofs. Therefore, if group members return different verification results, the dishonest ones should be revoked.

Public verification schemes can avoid this trouble, since anyone can verify the data integrity. However, public schemes cannot be efficiently executed since they are generally constructed based on the bilinear pairing or third-party verification.

Therefore, an efficient POS system needs to be established to meet the following requirements:

1) Achieve data integrity in group applications: The scheme should prove data possession efficiently and accurately in a group environment. Each group member can verify the proof independently.
2) Distinguish malicious members from honest ones: If different verification results are returned, the scheme can exactly distinguish the dishonest members from the honest ones.
3) Revoke malicious members: When a malicious member is identified, the scheme can revoke its access, which means that the secret key of a revoked member cannot be used to forge a valid tag or proof, even if the malicious member colludes with the server. Additionally, the revoked member cannot derive the secret key of any other member.
4) Resist selective opening attacks: In the group, multiple members with different keys correspond to the same verification tag. For security reasons, if the secret keys of some members are leaked, the adversary (possibly a cloud server or a malicious third party) cannot use this information to forge a legitimate tag or proof.
5) Support data deduplication: In existing work, each member holds its own metadata (e.g. verification tags). If all members with different secret keys share the same metadata (i.e. deduplication is achieved), the storage costs are reduced. That is, when deduplication is achieved, the size of the system metadata depends only on the total data size and not on the number of members.

Wang et al. proposed a GPDP scheme [8] that guarantees data integrity, resistance to selective attacks, and data deduplication, i.e. requirements 1, 4 and 5 above. To achieve all of the above properties, we present a new DR-GPOS scheme based on matrix calculation, pseudo-random functions, and commitment functions. The four contributions of this paper are summarized as follows:

In terms of functionality, our proposed scheme can accurately distinguish and securely revoke malicious members, as well as guarantee the integrity and deduplication of the outsourced data.
From a security perspective, our proposed scheme can resist selective attacks and collusion attacks from revoked members (e.g. forging a proof by colluding with the server).
To verify its security, we define three security games that capture the properties of member distinction, member revocation, and proof of storage respectively, and then give a formal security analysis in the standard model.
To evaluate the performance of the scheme, we implement a prototype of the DR-GPOS scheme on a Baidu cloud server and use 10 G of data to measure its efficiency.

The rest of this paper is organized as follows: We give the overview of our idea in section 2, and the preliminaries in section 3. Then in section 4, the definition and concrete construction of DR-GPOS are introduced in detail. We further give the security analysis and performance evaluation in section 5 and section 6, respectively. Finally, the related work is introduced in section 7 and the conclusion is given in section 8.

SECTION II.

Overview of Our Idea

The DR-GPOS scheme consists of four phases, as shown in Fig. 1: a) pre-process, b) challenge and verification, c) malicious-member distinction, and d) malicious-member revocation.

FIGURE 1.

The DR-GPOS scheme. (a) The pre-process (b) The challenge and verification (c) The malicious-member distinction (d) The malicious-member revocation.

In the pre-process phase, the trusted third party (TTP) decomposes the file and generates the verification tags (Tags), commitment functions (COMs), and the processed data (File). Afterwards, TTP forwards the data to the server, distributes the secret keys to each group member, and deletes all local file data.

In the challenge and verification phase, each member sends an independent challenge $chal$ to the server, which generates the corresponding proof and returns it to that member. Each member then verifies the returned proof using its local metadata.

The third phase has two sub-cases:

Case 1:

If all members accept their corresponding proofs returned by the server, then the server is deemed to possess the group’s data at the given time.

Case 2:

If some members do not accept the returned proofs, then either some members are dishonest or the server has been compromised. In this situation, the server is required to open the commitment functions to prove its innocence and to distinguish the dishonest members. The commitment functions can be verified by anyone, so the dishonest entities can be identified publicly.

In the phase of malicious-member revocation, the dishonest members should be revoked. Note that the revoked members cannot use their keys to forge legal proofs and pass the verification as legitimate members, even if they collude with the server. In addition, the revoked keys cannot be used to derive any other valid secret keys.

SECTION III.

Preliminaries

In this section, we introduce homomorphic Macs, cross-authentication codes, commitment function, and the security assumptions, which serve as a basis of the proposed scheme.

A. Homomorphic Macs

The definition of Homomorphic Message Authentication Code (HomMac) was initially proposed by Agrawal and Boneh [9]. We utilize boldfaced $\boldsymbol {v}$ to represent a vector.

Definition 1:

A HomMac consists of four polynomial-time algorithms (Gen, Sign, Combine, Verify) as follows:

${Gen}(1^{\lambda })\rightarrow (pk,sk)$ is a probabilistic algorithm, which creates the public-private key pair $(pk,sk)$ , based on the security parameter input $\lambda$ .
${Sign}(sk, id, \boldsymbol {v}_{i}, i) \rightarrow t_{i}$ is an algorithm to generate the tag of a vector. It takes the secret key $sk$ , the vector space identifier $id$ , a vector $\boldsymbol {v}_{i} \in F_{q}^{s}$ ( $s$ is the length of the vector), and its index $i$ as input, and returns the tag $t_{i}$ of $\boldsymbol {v}_{i}$ .
${Combine}((\boldsymbol {v}_{1}, t_{1}, \alpha _{1}),\ldots,(\boldsymbol {v}_{c}, t_{c}, \alpha _{c}),pk)\rightarrow (T, \boldsymbol {y})$ is an algorithm to generate a homomorphic Mac. It takes the public key $pk$ , $c$ vectors $\boldsymbol {v}_{1},\ldots ,\boldsymbol {v}_{c} \in F_{q}^{s}$ , the related tags $t_{1},\ldots ,t_{c}$ , and $c$ random constants $\alpha _{1},\ldots ,\alpha _{c} \in F_{q}$ as input. It then computes $\boldsymbol {y} = \sum _{i=1}^{c} {\alpha _{i} \boldsymbol {v}_{i}} \in F_{q}^{s}$ and the corresponding tag $T$ , and returns $T$ and $\boldsymbol {y}$ .
${Verify}(pk, sk, id, \boldsymbol {y}, T) \rightarrow \{1, 0\}$ is a deterministic algorithm to verify a homomorphic Mac. It takes the key pair $(pk, sk)$ , the vector space identifier $id$ , a vector $\boldsymbol {y}\in F_{q}^{s}$ , and the corresponding tag $T$ as input. It returns 1 if the verification is accepted, and 0 otherwise.

B. Cross-Authentication Codes

1) Definition of Cross-Authentication Codes

The definition of Cross-Authentication Codes was given by Fehr et al. [10].

Definition 2:

( ${L}$ -Cross-Authentication Codes, short: L-XAC). Let XK be the key space and XT be the tag space. For $L \in N$ : A ${L}$ -XAC consists of three polynomial-time algorithms (XGen, XAuth and XVer) as follows:

${XGen}(1^{k}) \rightarrow K$ is a probabilistic algorithm to generate a uniform random key. Based on the security parameter $k$ , the secret key $K$ is returned.
${XAuth}(K_{1},K_{2},\ldots ,K_{L})\rightarrow T$ is an algorithm to generate the tag for verification. A set of keys $K_{1},K_{2},\ldots ,K_{L}$ is given as input, and the corresponding tag $T \in \textrm {XT}$ is returned.
${XVer}(K_{i},i,T) \rightarrow \{1,0\}$ is an algorithm to verify a tag. The returned value in {0, 1} indicates whether the input $T$ is a correct tag (generated by ${XAuth}$ ) with respect to the secret key $K_{i}$ .

The following properties are required:

Correctness:
For all $i \in [L]$ , the probability $fail_{XAC}(k)$ is negligible, where $\begin{equation*} fail_{XAC}(k) := \textrm {Pr}[{XVer}(K_{i}, i, {XAuth}(K_{1},K_{2},\ldots ,K_{L})) \neq 1].\end{equation*}$
Security Against Impersonation Attacks:
Intuitively, when the key $K$ was not used to generate the tag, it is difficult to find a tag $T'$ that verifies under $K$ , even if the whole tag space is accessible. The advantage $Adv_{XAC}^{imp}(k)$ is defined as: $\begin{equation*} Adv_{XAC}^{imp}(k) := \max \limits _{i,T'} \textrm {Pr}[{XVer}(K,i,T') = 1 \mid K \leftarrow {XGen}(1^{k})] \leq negl(k)\end{equation*}$ where the max is taken over all $i \in [L]$ and $T' \in \textrm {XT}$ .
Security Against Substitution Attacks:
Intuitively, it is easy to verify the correctness of the tag $T$ with the key $K_{i}$ . But in the absence of $K_{i}$ , even if the adversary has a correct tag $T$ and all other keys $K_{\neq i}$ , it is difficult to forge a legitimate tag $T' \neq T$ that verifies under $K_{i}$ . The advantage $Adv_{XAC}^{sub}(k)$ is defined as $\begin{equation*} Adv_{XAC}^{sub}(k) := \max \limits _{i,K_{\neq i}, F} \textrm {Pr} \left [{ \begin{array}{c|c} \begin{smallmatrix} T' \neq T \\ \wedge \\ {XVer}(K_{i},i,T')=1 \end{smallmatrix}& \begin{smallmatrix} K_{i} \leftarrow {XGen}(1^{k}) \\ T := {XAuth}(K_{1},K_{2},\ldots ,K_{L})\\ T'\leftarrow F(T) \end{smallmatrix} \end{array} }\right] \leq negl(k).\end{equation*}$ Notice that the max is taken over all $i \in [L]$ , all $K_{\neq i} = (K_{j})_{j \neq i} \in \textrm {XK}^{L-1}$ , and all possible functions $F: \textrm {XT} \rightarrow \textrm {XT}$ .

2) An Example of L-XAC

Let $\textrm {XK} = F^{2}$ and $\textrm {XT} = F^{L} \cup \{ \perp \}$ , where $F$ is a finite field of size $q$ that depends on security parameter $k$ (e.g. $q = 2^{k}$ ).

XGen: generates a set of random keys $K_{1} = (a_{1},b_{1}),\ldots ,K_{L} = (a_{L},b_{L}) \in \textrm {XK}$ .
XAuth: generates the authentication tag $T = (T_{0},T_{1},\ldots ,T_{L-1}) \in F^{L}$ such that $p_{T}(a_{i}) = b_{i}$ for $i = 1,2,\ldots ,L$ , where $p_{T}(x) = T_{0} + T_{1}x +\ldots + T_{L-1}x^{L-1} \in F[x]$ . The value of $T$ can be computed efficiently by solving the linear system $AT=B$ , where $A \in F^{L \times L}$ is a Vandermonde matrix whose $i$ -th row is $1,a_{i},a_{i}^{2},\ldots ,a_{i}^{L-1}$ , and $B \in F^{L}$ is the column vector $[b_{1},b_{2},\ldots ,b_{L}]$ . If the system has no unique solution, $T = \perp$ .
XVer: verifies the tag $T$ and outputs 1 if and only if $T \neq \perp$ and $p_{T}(a_{i}) = b_{i}$ , $i \in [L]$ .

The work in [10] proves that this ${L}$ -XAC satisfies the conditions $fail_{XAC}(k) \leq \frac {L(L-1)}{2q}$ , $Adv_{XAC}^{imp}(k) \leq \frac {1}{q}$ , and $Adv_{XAC}^{sub}(k) \leq 2 \cdot \frac {L-1}{q}$ .
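The example L-XAC above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction of [10] verbatim: it works over a prime field (the modulus `Q` is an assumption; the text suggests $q = 2^{k}$), it omits the index argument of XVer, and the helper `solve_mod` is a hypothetical name for a plain Gauss-Jordan solver.

```python
import secrets

Q = 2**61 - 1  # prime modulus standing in for the field F (assumption)

def xgen():
    """XGen: sample one key K = (a, b) uniformly from F^2."""
    return (secrets.randbelow(Q), secrets.randbelow(Q))

def solve_mod(A, B, q=Q):
    """Solve A*T = B over F_q by Gauss-Jordan elimination; None if A is singular."""
    n = len(A)
    M = [row[:] + [b] for row, b in zip(A, B)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] % q), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, q)          # modular inverse (Python 3.8+)
        M[col] = [x * inv % q for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % q for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def xauth(keys):
    """XAuth: find T with p_T(a_i) = b_i for every key; None plays the role of ⊥."""
    L = len(keys)
    A = [[pow(a, j, Q) for j in range(L)] for a, _ in keys]  # Vandermonde rows
    B = [b for _, b in keys]
    return solve_mod(A, B)

def xver(key, T):
    """XVer: accept iff T is not ⊥ and p_T(a) = b (Horner evaluation)."""
    if T is None:
        return False
    a, b = key
    acc = 0
    for coef in reversed(T):
        acc = (acc * a + coef) % Q
    return acc == b
```

Every key that went into `xauth` verifies the resulting tag, while a key outside the set verifies only if its pair $(a, b)$ happens to lie on the interpolated polynomial. Two keys sharing the same $a$ make the Vandermonde matrix singular, which is the failure event bounded by $fail_{XAC}(k)$.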

C. A Reusable Commitment Function

The non-repudiation property in PDP/POR scheme has been achieved in Wang et al.’s scheme [7] by introducing a commitment scheme, which is reusable and binding. Based on Pedersen commitment function, the reusable commitment function is as follows.

Let $p = 2p' + 1$ and $q = 2q' + 1$ be safe primes and let $N = pq$ be the modulus of an RSA scheme. Let $g$ be the generator of the unique cyclic subgroup whose order is $p'q'$ . Let $G_{q}$ be a group of prime order $q$ in which computing the discrete logarithm function is intractable, and let $h$ be an element of $G_{q}$ for which it is hard to compute $\log _{g} h \bmod q$ . Choose $e$ and $d$ , where $e$ is a small prime and $ed = 1 \bmod \phi (N)$ . A reusable commitment function can be constructed as $\begin{equation*} COM(M) = (g^{M} h^{H(ID)})^{e} \bmod N\end{equation*}$ where $H$ is a full-domain hash function. Notice that $d$ can be viewed as a public trapdoor of the commitment.

Intuitively, it is a variant of the Pedersen commitment function [11], and is therefore a perfect commitment scheme that is secure against an all-powerful receiver. The above reusable commitment function has two properties: information-theoretic hiding and computational binding.

1) Information-Theoretically Hiding

For any probabilistic polynomial time receiver, the commitment $COM(M)$ is indistinguishable from a random element of $G_{q}$ .
Even an all-powerful receiver that can compute $\log _{g} h \bmod q$ obtains only $\log _{g} {COM} \bmod q = M + H(ID) \log _{g} h$ , and cannot determine the value of $M$ because of the random auxiliary value $H(ID)$ .

2) Computational Binding

For a probabilistic polynomial time sender, it is difficult to compute $\log _{g} h \bmod q$ , hence it is almost impossible to find a pair $(M',ID') \neq (M,ID)$ such that $COM(M',ID') = COM(M,ID)$ . Consequently, the commitment $COM(M)$ can be opened in only one way, revealing the original value $M$ .

Remarks:

$ID$ is the unique identifier of $M$ , and the hash function $H$ is used to resist the forgery of multiple coefficient.
$d$ does not reveal the information of $e$ , thus it ensures the commitment cannot be forged, which makes it reusable.
When initiating a challenge, the committer discloses the value of $M$ and $ID$ , thus anyone can verify whether $s = t$ holds, where $s = g^{M} h^{H(ID)} \bmod N$ and $t = (COM(M))^{d} = (g^{M} h^{H(ID)})^{ed} \bmod N = g^{M} h^{H(ID)}\,\, \bmod \,\,N$ .
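The commit-and-open check in the remarks above can be illustrated with toy numbers. The following is a sketch under stated assumptions rather than a secure instantiation: the safe primes 23 and 47, the base `G = 4`, the exponent 97 used to derive `H_BASE`, and the SHA-256 based `hash_id` are all illustrative choices that do not come from the paper.

```python
import hashlib

# Toy safe primes p = 2p'+1, q = 2q'+1 (assumption: real parameters are far larger).
P_PRIME, Q_PRIME = 11, 23
P, Q = 2 * P_PRIME + 1, 2 * Q_PRIME + 1      # 23 and 47
N = P * Q                                    # RSA-style modulus
PHI = (P - 1) * (Q - 1)
ORDER = P_PRIME * Q_PRIME                    # order p'q' of the subgroup generated by g

G = 4                      # 2^2 mod N generates the order-p'q' subgroup for these primes
H_BASE = pow(G, 97, N)     # h = g^s for an arbitrary exponent s that is then discarded
E = 3                      # small prime e
D = pow(E, -1, PHI)        # public trapdoor d with e*d = 1 mod phi(N)

def hash_id(ident: int) -> int:
    """Stand-in for the full-domain hash H, reduced to an exponent."""
    digest = hashlib.sha256(str(ident).encode()).digest()
    return int.from_bytes(digest, "big") % ORDER

def commit(m: int, ident: int) -> int:
    """COM(M) = (g^M * h^H(ID))^e mod N."""
    base = pow(G, m, N) * pow(H_BASE, hash_id(ident), N) % N
    return pow(base, E, N)

def verify_opening(com: int, m: int, ident: int) -> bool:
    """Anyone holding d checks s == t, with s = g^M h^H(ID) and t = COM^d mod N."""
    s = pow(G, m, N) * pow(H_BASE, hash_id(ident), N) % N
    t = pow(com, D, N)
    return s == t
```

A committer would publish `commit(m, ident)` once and later disclose `m` and `ident`; the same commitment can be opened repeatedly because checking it requires only the public value `D`.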

D. Security Assumptions

1) DLP (Discrete Logarithm Problem)

Let $G$ be a group, $g \in G$ , and $\langle g \rangle$ be the cyclic subgroup generated by $g$ . The discrete logarithm problem on group $G$ is then: given $g \in G$ and $a \in \langle g \rangle$ , find an integer $x$ that satisfies $g^{x} = a$ . The problem is assumed to be infeasible to solve in polynomial time.

2) PRF (Pseudo-Random Function)

Let $I_{k}$ denote the set of all $k$ -bit strings, and $H_{k}$ be the set of all functions from $I_{k}$ into $I_{k}$ . Then the Pseudo-Random Function $F = \{F_{k}\} \subseteq H_{k}$ has the following properties:

Indexing: Each function has a unique index $k$ associated with it, thus randomly picking a function $f \in F_{k}$ is easy.
Polynomial time Computation: Given an input $x$ and a function $f \in F_{k}$ , there exists a polynomial time algorithm to compute $f(x)$ .
Pseudo-Randomness: No probabilistic polynomial time algorithm can distinguish the functions in $F_{k}$ from the functions in $H_{k}$ , or the value of $f(x)$ from a random element of $I_{k}$ .

SECTION IV.

The Proposed DR-GPOS Scheme

The definition and detailed construction of DR-GPOS are given in this section.

A. Definition of DR-GPOS

Definition 3:

GPOS with Malicious-Member Distinction and Revocation (DR-GPOS). A DR-GPOS scheme is a collection of five polynomial-time algorithms ( ${KeyGen}$ , ${PreGen}$ , ${ProofGen}$ , ${VerifProof}$ , ${VerifCOM}$ ). Each is defined as follows:

${KeyGen}(1^{\lambda }) \rightarrow (mK, K_{l})$ is a probabilistic polynomial-time algorithm to generate keys. The input security parameter $\lambda$ is used to generate the master key $mK$ and the secret key $K_{l}$ of each member.
${PreGen}(\{K_{l}\}_{l=1}^{L}, mK, m_{i}) \rightarrow (T_{i}, COM_{i})$ is a polynomial-time algorithm that generates the metadata required to verify the proof. It takes the private keys $\{K_{l}\}_{l=1}^{L}$ , the master key $mK$ and a file block $m_{i}$ as input, and returns the commitment function $COM_{i}$ and the verification tag $T_{i}$ .
${ProofGen}(File,chal,\Sigma) \rightarrow \rho$ is a polynomial-time algorithm to generate a proof of storage. The input consists of an ordered data collection $File$ , a challenge $chal$ and an ordered collection $\Sigma$ of tags. It returns a proof $\rho$ for the file data determined by $chal$ .
${VerifProof}(K_{l}, chal, \rho) \rightarrow \{1,0\}$ is a polynomial-time algorithm to verify a proof of storage. It takes secret key $K_{l}$ of arbitrary member, a challenge $chal$ and a proof $\rho$ as input, and returns whether $\rho$ is a correct proof for the data determined by $chal$ .
${VerifCOM}(\{M\},\{I\},c,\{COM\}) \rightarrow \{1,0\}$ is a polynomial-time algorithm to distinguish dishonest members from honest ones. It takes the chosen chunks $\{M\}$ , the corresponding indices $\{I\}$ , the number $c$ of challenged chunks and the commitment functions $\{COM\}$ as input, and returns whether every commitment is valid.

B. Detailed Construction

The algorithm first divides the file into $n$ chunks, such that $File=\{m_{\langle 1 \rangle},m_{\langle 2 \rangle},\ldots ,m_{\langle n \rangle}\}$ , where each chunk has $L$ blocks: $m_{\langle i \rangle}=\{m_{i,1}, m_{i,2},\ldots ,m_{i,L}\}$ . The commitment and tag of each chunk $m_{\langle i \rangle}$ are denoted $COM_{i}$ and $T_{\langle i \rangle}$ , where $1 \leq i \leq n$ .

Let $p = 2p' + 1$ and $q = 2q' + 1$ be safe primes and let $N = pq$ be the modulus of an RSA scheme. Let $g$ be the generator of the unique cyclic subgroup whose order is $p' q'$ . Let $F$ be a finite field of size $2^\lambda$ , then the key space $\kappa = F^{3}$ and tag space $\chi = F^{L} \cup \{\bot \}$ .

Let $H$ be a full-domain hash function and $f$ be a pseudo-random function, and let $\pi$ be a pseudo-random permutation. $\lambda$ is the security parameter. $\begin{align*}&H:\{0,1\}^{\log _{2}{(n)}}\rightarrow \{0,1\}^{\log _{2}{(n)}}.\\&f:\{0,1\}^{\lambda } \times \{0,1\}^{\log _{2}{(n)}} \rightarrow \{0,1\}^{\lambda }.\\&\pi :\{0,1\}^{\lambda } \times \{0,1\}^{\log _{2}{(n)}} \rightarrow \{0,1\}^{\log _{2}{(n)}}.\end{align*}$

${KeyGen}(1^\lambda)\rightarrow (msk,mpk,sk_{l})$ : Let $e$ and $d$ be secret primes such that $ed = 1 \bmod p'q'$ . Let $L$ be the maximum number of group members. Given the security parameter $\lambda$ , the outputs are $msk = (p,q,e)$ , $mpk = (d,N,g,h)$ and a set of secret keys: $sk_{l}=(a_{l},b_{l},c_{l})\in \kappa , 1\leq l \leq L$ .
${PreGen}(\{sk_{l}\}_{l=1}^{L}, msk, m_{\langle i \rangle}) \rightarrow (T_{\langle i \rangle}, COM_{i})$ : For each $i$ , $1 \leq i \leq n$ , compute the verification tag of chunk $m_{\langle i \rangle}$ from $\begin{align*}&\begin{bmatrix} 1 & c_{1} & c_{1}^{2} & \cdots & c_{1}^{L-1}\\ 1 & c_{2} & c_{2}^{2} & \cdots & c_{2}^{L-1}\\ \vdots & & & & \vdots \\ 1 & c_{L} & c_{L}^{2} & \cdots & c_{L}^{L-1} \end{bmatrix}\cdot \begin{bmatrix} m_{i,1}\\ m_{i,2}\\ \vdots \\ m_{i,L} \end{bmatrix}+ \begin{bmatrix} f_{b_{1}}(i)\\ f_{b_{2}}(i)\\ \vdots \\ f_{b_{L}}(i) \end{bmatrix}\\&\qquad \qquad = \begin{bmatrix} 1 & a_{1} & a_{1}^{2} & \cdots & a_{1}^{L-1}\\ 1 & a_{2} & a_{2}^{2} & \cdots & a_{2}^{L-1}\\ \vdots & & & & \vdots \\ 1 & a_{L} & a_{L}^{2} & \cdots & a_{L}^{L-1} \end{bmatrix}\cdot \begin{bmatrix} T_{i,1}\\ T_{i,2}\\ \vdots \\ T_{i,L} \end{bmatrix}.\end{align*}$ The tag $T_{\langle i \rangle}=(T_{i,1}, T_{i,2}, \ldots , T_{i,L})$ is obtained by solving this system of $L$ linear equations in $L$ unknowns. For each $i$ , $1 \leq i \leq n$ , compute the commitment function for chunk $m_{\langle i \rangle}$ : $COM_{i} = COM(m_{\langle i \rangle}) = (g^{m_{\langle i \rangle}}h^{H(i)})^{e} \bmod N$ . Output $\{m_{\langle i \rangle}, T_{\langle i \rangle}, COM_{i}\}_{1\leq i \leq n}$ .
${ProofGen}(\{m_{\langle i \rangle}\}_{1\leq i \leq n},\{T_{\langle i \rangle}\}_{1\leq i \leq n},chal) \rightarrow \rho$ : The challenge is $chal = (c, k_{1},k_{2})$ , where $c$ is the number of challenged chunks and $k_{1}, k_{2}$ are fresh random keys. For $1 \leq z \leq c$ : 1. Compute the index of each sampled block: $i_{z} = {\pi }_{k_{1}}(z)$ . 2. Compute the relevant coefficient: $v_{z} = f_{k_{2}}(z)$ . Compute: $\begin{equation*} \begin{cases} \tau _{1} = v_{1} T_{i_{1},1}+v_{2} T_{i_{2},1}+\ldots+v_{c} T_{i_{c},1}\\ \tau _{2} = v_{1} T_{i_{1},2}+v_{2} T_{i_{2},2}+\ldots+v_{c} T_{i_{c},2}\\ \qquad \qquad \qquad \vdots \\ \tau _{L} = v_{1} T_{i_{1},L}+v_{2} T_{i_{2},L}+\ldots+v_{c} T_{i_{c},L} \end{cases}\tag{1}\end{equation*}$ and $\begin{equation*} \begin{cases} \omega _{1} = v_{1} m_{i_{1},1}+v_{2} m_{i_{2},1}+\ldots+v_{c} m_{i_{c},1}\\ \omega _{2} = v_{1} m_{i_{1},2}+v_{2} m_{i_{2},2}+\ldots+v_{c} m_{i_{c},2}\\ \qquad \qquad \qquad \vdots \\ \omega _{L} = v_{1} m_{i_{1},L}+v_{2} m_{i_{2},L}+\ldots+v_{c} m_{i_{c},L} \end{cases}\tag{2}\end{equation*}$ Output the vectors $(\tau _{1},\ldots ,\tau _{L})$ and $(\omega _{1},\ldots ,\omega _{L})$ .
${VerifProof}(\rho ,chal,sk_{l})\rightarrow \{1,0\}$ : Let $sk_{l} = (a_{l},b_{l},c_{l})$ and $chal = (c, k_{1}, k_{2})$ . For $1 \leq z \leq c$ : 1. Compute the index of each sampled block: $i_{z} = {\pi }_{k_{1}}(z)$ . 2. Compute the relevant coefficient: $v_{z} = f_{k_{2}}(z)$ . Parse the proof to obtain the vectors $(\tau _{1},\ldots ,\tau _{L})$ and $(\omega _{1},\ldots ,\omega _{L})$ . Compute: $\tau = \tau _{1} + a_{l} \tau _{2} +\ldots+ a_{l}^{L-1}\tau _{L}$ , $\omega = \omega _{1} + c_{l} \omega _{2} +\ldots+ c_{l}^{L-1}\omega _{L}$ and $\sigma = v_{1} f_{b_{l}}(i_{1})+\ldots+v_{c} f_{b_{l}}(i_{c})$ . If $\tau =\sigma +\omega$ , then output 1. Otherwise output 0.
${VerifCOM}(\{m_{\langle i \rangle}\},\{i\},c,\{COM_{i}\}) \rightarrow \{1,0\}$ : Let $mpk = (N, d, g, h)$ . For each $i \in \{i_{1},\ldots ,i_{c}\}$ given in advance, the related commitment functions and file chunks are revealed to the public as follows:
- $COM_{i}$ , $d$ , $m_{\langle i \rangle}$ where $COM_{i} = COM(m_{\langle i \rangle}) = {(g^{m_{\langle i \rangle}}h^{H(i)})}^{e} \bmod N$ .

The validity of any chunk and its commitment can then be verified.

For $i \in \{i_{1},\ldots ,i_{c}\}$ , the verifier computes: 1. $s_{i} = g^{m_{\langle i \rangle}}h^{H(i)} \bmod N$ . 2. $t_{i} = (COM_{i})^{d} = (g^{m_{\langle i \rangle}}h^{H(i)})^{e d} \bmod N = g^{m_{\langle i \rangle}}h^{H(i)} \bmod N$ . If $s_{i} = t_{i}$ , the chunk $m_{\langle i \rangle}$ and its commitment match.

Output 1 if and only if every chunk matches its commitment. Otherwise, output 0.
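The tag generation, proof generation and verification steps above can be exercised end to end with a small Python sketch. Several parts are assumptions made only for illustration: a prime field with modulus `Q` replaces the field of size $2^{\lambda}$, HMAC-SHA256 stands in for the PRF $f$, a seeded `random.Random(...).sample` stands in for the permutation $\pi$, chunk indices start at 0, and the commitment part of PreGen is left out. The function names are hypothetical.

```python
import hashlib
import hmac
import random

Q = 2**61 - 1  # prime field modulus (assumption)

def prf(key, x):
    """Stand-in for f_key(x): HMAC-SHA256 reduced into the field."""
    mac = hmac.new(key.to_bytes(16, "big"), x.to_bytes(8, "big"), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big") % Q

def solve_mod(A, B):
    """Solve A*T = B over F_Q by Gauss-Jordan elimination; None if singular."""
    n = len(A)
    M = [row[:] + [b] for row, b in zip(A, B)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col]), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, Q)
        M[col] = [x * inv % Q for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % Q for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def vander(xs):
    return [[pow(x, j, Q) for j in range(len(xs))] for x in xs]

def keygen(L, rng):
    """One secret key (a_l, b_l, c_l) per member."""
    return [tuple(rng.randrange(1, Q) for _ in range(3)) for _ in range(L)]

def pregen(keys, chunk, i):
    """Solve C*m_i + f_b(i) = A*T_i for the tag T_i of chunk i."""
    a, b, c = zip(*keys)
    rhs = [(sum(x * m for x, m in zip(row, chunk)) + prf(bl, i)) % Q
           for row, bl in zip(vander(c), b)]
    return solve_mod(vander(a), rhs)

def expand(chal, n):
    """Derive the challenged indices i_z and coefficients v_z from chal."""
    c, k1, k2 = chal
    idx = random.Random(k1).sample(range(n), c)   # stand-in for pi_{k1}
    return idx, [prf(k2, z) for z in range(1, c + 1)]

def proofgen(chunks, tags, chal):
    idx, coef = expand(chal, len(chunks))
    L = len(tags[0])
    tau = [sum(v * tags[i][j] for v, i in zip(coef, idx)) % Q for j in range(L)]
    omega = [sum(v * chunks[i][j] for v, i in zip(coef, idx)) % Q for j in range(L)]
    return tau, omega

def horner(vec, x):
    acc = 0
    for t in reversed(vec):
        acc = (acc * x + t) % Q
    return acc

def verifproof(sk, chal, proof, n):
    """Accept iff tau = sigma + omega for this member's key."""
    a, b, c = sk
    idx, coef = expand(chal, n)
    tau, omega = horner(proof[0], a), horner(proof[1], c)
    sigma = sum(v * prf(b, i) for v, i in zip(coef, idx)) % Q
    return tau == (sigma + omega) % Q
```

In this sketch all $L$ members check the same pair of vectors with different keys, which reflects the deduplication goal: the server stores one tag vector per chunk regardless of how many members verify it.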

C. The DR-GPOS Protocol

A DR-GPOS protocol is designed in four phases:

1) Pre-Process

The TTP runs ${KeyGen}$ and ${PreGen}$ , and sends $F$ , $COM$ and $\Sigma$ to the server $S$ for storage. Then TTP distributes the secret key $K_{i}$ to each group member. Following this, $F$ , $COM$ and $\Sigma$ are removed from local storage.

2) Challenge and Verification

Every member $U_{i}$ can independently generate a challenge $chal$ and send it to $S$ . Then $S$ executes ${ProofGen}$ and sends the proof $\rho$ to $U_{i}$ . Finally, $U_{i}$ checks the validity of $\rho$ by executing ${VerifProof}$ .

3) Malicious-Member Distinction

When members reach different decisions, $S$ is required to open the commitment functions to the public, so that everyone can distinguish the dishonest members.

4) Malicious-Member Revocation

When a dishonest member is distinguished, TTP can revoke it, which means TTP removes the dishonest member from the group and informs all members that the verification results of the malicious member are invalid. Notice that the key of a revoked member cannot be used to derive the keys of other members or to forge a valid tag or proof. The final three phases can each be executed multiple times to ensure that both the server and the group members are honest.
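One possible reading of how phases 2 and 3 fit together is sketched below. The decision rule is an interpretation of the protocol text, not a procedure stated by the authors: the function name `distinguish`, the member labels, and the outcome strings are all hypothetical, and `commitments_open_ok` is assumed to be the public result of ${VerifCOM}$.

```python
from typing import Dict, List, Tuple

def distinguish(verdicts: Dict[str, bool], commitments_open_ok: bool) -> Tuple[str, List[str]]:
    """Combine per-member VerifProof verdicts with the public VerifCOM result.

    verdicts maps a member label to True (proof accepted) or False (rejected).
    Returns an outcome label and the members to hand to the revocation phase.
    """
    if all(verdicts.values()):
        # Case 1: every member accepted, so no commitment needs to be opened.
        return ("data intact", [])
    if commitments_open_ok:
        # Case 2, server cleared: the opened commitments match the chunks,
        # so members who rejected the proof are treated as dishonest.
        return ("members dishonest", sorted(m for m, ok in verdicts.items() if not ok))
    # Case 2, server not cleared: the opened chunks do not match their commitments.
    return ("server at fault", [])
```

For example, with verdicts `{"U1": True, "U2": False, "U3": True}` and a successful commitment opening, this sketch flags `U2` for revocation.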

SECTION V.

Security Analysis

In this section, we will introduce the security model and security proof of DR-GPOS scheme.

A. Security Model

The security model is given by three games, which capture the properties of member distinction, member revocation, and proof of storage. For simplicity, we write $T_{i}$ and $m_{i}$ in place of $T_{\langle i \rangle}$ and $m_{\langle i \rangle}$ respectively.

Game 1 Member Distinction Game:

Setup: The adversary is provided with the security parameter $1^\lambda$ as input, and it outputs a pair of file chunks $m_{0}, m_{1}$ of the same length.
Challenge: The challenger runs ${PreGen}(msk, m)$ and outputs $b_{0}, b_{1}$ where $COM(m_{0}) = b_{0}$ and $COM(m_{1}) = b_{1}$ .
Choose a random bit $r \leftarrow \{ 1, 0 \}$ , then give $b_{r}$ and $mpk$ to the adversary.
Decision: The adversary runs ${VerifCOM}(mpk, b_{r})$ and outputs 1 if $b_{r}$ is the correct commitment of $m_{1}$ , otherwise outputs 0.

The commitment function is valid (i.e. satisfies the binding property) if and only if $\Pr [PrivK_{A}^{COM} (n,{b_{0}}) = 1] \leq negl(n)$ and $|\Pr [PrivK_{A}^{COM}(n,{b_{1}}) = 1] - 1| \leq negl(n)$ .

The probability that the adversary guesses the correct choice is $\begin{align*} \Pr [PrivK_{A}^{COM}(n,{b_{r}}) = r]=&\frac {1}{2} \cdot \Pr [PrivK_{A}^{COM}(n,{b_{1}}) = 1] \\&+ \frac {1}{2} \cdot \Pr [PrivK_{A}^{COM} (n,{b_{0}}) = 0].\end{align*}$

Remarks:

In Game 1 (which is proposed by Wang et al. [7]), we define the adversary’s advantage as the absolute value of the probability of guessing the right choice minus 1/2. If the advantage of the adversary is negligible, then the scheme is secure, which means: $\begin{align*}&\left| \frac {1}{2} \cdot \Pr [PrivK_{A}^{COM}(n,{b_{1}}) = 1] \right. \\&\qquad \left. + \frac {1}{2} \cdot \Pr [PrivK_{A}^{COM} (n,{b_{0}}) = 0]- \frac {1}{2} \right| \leq negl(n).\end{align*}$

Game 2 Member Revocation Game:

Setup: The challenger performs ${KeyGen}(1^\lambda)$ to generate a set of keys $\{K_{l}\}_{l=1}^{L}$ , and keeps them secret.
Selective Revocation: The adversary queries keys adaptively, and the challenger sends the queried keys to the adversary and keeps the other keys secret. Note that the number of queried keys must be less than $L$ . After that, the adversary owns the keys $\{K_{l_{1}},\ldots ,K_{l_{r}}\}$ , which are the keys of revoked members, while the challenger keeps the remaining keys $\{K_{l}\}_{l=1}^{L} \setminus \{K_{l_{1}},\ldots ,K_{l_{r}}\}$ private.
Forge: The adversary can obtain the file $File = (m_{1},m_{2},\ldots ,m_{n})$ , the tags $\Sigma = (T_{1},T_{2},\ldots ,T_{n})$ and the keys $\{K_{l_{1}},\ldots ,K_{l_{r}}\}$ . Then the adversary runs PreGen and ProofGen to generate a tag $T_{r}'$ and a proof $\rho$ . Note that $\rho$ is computed from $T_{r}'$ , where $T_{r}' \neq T_{r}$ , and both of them are tags of chunk $m_{r}$ .

If ${VerifProof}(K',chal,\rho) \rightarrow 1$ , where $K' \in \{K_{l}\}_{l=1}^{L} \setminus \{K_{l_{1}},\ldots ,K_{l_{r}}\}$ , then the adversary wins the game. The probability that the adversary wins the game is negligible if the scheme resists selective and collusion attacks. The probability must satisfy:

$\begin{equation*} \textrm {Pr} \left [{ \begin{array}{c|c}\begin{smallmatrix} {VerifProof}(K',chal,\rho) \rightarrow 1 \\ K' \in \{K_{l}\}_{l=1}^{L} \setminus \{K_{l_{1}},\ldots,K_{l_{r}}\} \\ {ProofGen}(File,T'_{r})\rightarrow \rho \end{smallmatrix}& \begin{smallmatrix} {PreGen}(\{K_{l_{1}},\ldots,K_{l_{r}}\},m_{r})\rightarrow T'_{r} \\ {PreGen}(\{K_{l}\}_{l=1}^{L},m_{r})\rightarrow T_{r} \\ T'_{r} \neq T_{r} \end{smallmatrix} \end{array} }\right] \leq negl(\lambda).\end{equation*}$

Game 3 Storage Proof Game:

Setup: The challenger runs ${KeyGen}(1^\lambda)$ to generate a set of keys $\{K_{l}\}_{l=1}^{L}$ , and keeps them secret.
Query: The adversary can query adaptively. It selects a chunk $m_{i}$ of a file and provides it to the challenger. The challenger then executes ${PreGen}(\{K_{l}\}_{l=1}^{L}, m_{i})$ and computes $T_{i}$ , which is returned to the adversary. By repeating these queries, the adversary can store all chunks $F = (m_{1},\ldots , m_{n})$ together with the tags $\Sigma = (T_{1},\ldots , T_{n})$ .
Challenge: The challenger generates a challenge $chal$ and requests a proof of chunks $m_{i_{1}},\ldots , m_{i_{c}}$ , where $1 \leq i_{c} \leq n \wedge 1 \leq c \leq n$ .
Forge: The adversary computes a proof $\rho$ determined by $chal$ and returns $\rho$ .

If ${VerifProof}(K', chal, \rho)\rightarrow 1$ , where $K' \in \{K_{l}\}_{l=1}^{L}$ , then the adversary wins the game.

The probability that the adversary wins the game is negligibly close to the probability that the adversary extracts those challenged file chunks. Intuitively, a verification tag $T$ and proof $\rho$ cannot be computed unless the adversary already possesses all the challenged chunks.

B. Security Proof

The security of our scheme is proven based on provable security theory. Modern security proofs take a reductionist approach and rely on an assumption about the hardness of some mathematical problem. Therefore, in this paper, we reduce breaking the proposed scheme to solving an underlying hard problem. The DR-GPOS scheme is based on the security assumptions of DLP [12] and secure PRF [13].

Theorem 1:

If $f$ is a secure PRF and TTP is trusted, then under the DLP assumption, the DR-GPOS scheme can accurately distinguish a malicious member, guarantee secure revocation of the malicious member, and guarantee data possession in the standard model.

Proof.

1) Correctness

For $1 \leq l \leq L$ and $(a_{l},b_{l},c_{l}) \leftarrow {KeyGen}(1^\lambda)$ , the probability of failure is given as: $\begin{equation*} fail_{GPOS}(\lambda) := \textrm {Pr}[{VerifProof}(K_{l}, chal, {ProofGen}(\{m_{i}\}_{1 \leq i \leq n}, \{T_{i}\}_{1 \leq i \leq n}, chal)) \neq 1].\end{equation*}$

Since $f$ is a secure PRF, the calculation of tags can be expressed as $AT=X$ , where $A$ is the matrix of $a$ , $T$ is the matrix of tags, and $X$ is a random matrix. By the assumption of L-XAC, we obtain: $fail_{GPOS}(\lambda) \leq \frac {L(L-1)}{2q}$ .

2) Soundness

For $1 \leq l \leq L$ and an arbitrary invalid proof $\rho ' \notin \{{ProofGen}(File, \Sigma , chal)\}$ , the probability that $\rho '$ passes the verification is given as: $\begin{equation*} Adv_{GPOS}^{inval}(\lambda) := \max \limits _{l} \textrm {Pr}[{VerifProof}(K_{l},chal,\rho ') = 1 \mid K_{l} \leftarrow {KeyGen}(1^\lambda)].\end{equation*}$ Since the tags used to generate a proof are random, we can conclude: $Adv_{GPOS}^{inval}(\lambda) \leq \frac {1}{q^{c}}$ , where $c$ is the number of challenged chunks.

3) Distinction of Malicious-Member

We use a reduction to prove the property of distinction, which is realized by using Game 1 of DR-GPOS. In order to distinguish a malicious member, it must be established that the verifier cannot lie and repudiate, i.e. the verification result must be bound to the original data and cannot be altered.

If there exists an adversary $A^{*}$ that wins the Distinction Game, then an adversary $A$ can be constructed to break the Pedersen commitment function. Game 1 is executed as follows:

Setup: $A^{*}$ is given input security parameter $1^\lambda$ , and it outputs a pair of data chunks $m_{0}, m_{1}$ of the same length. $A^{*}$ then gives $m_{0}$ and $m_{1}$ to $A$ .
Challenge: $A$ runs ${PreGen}(msk, m_{i})$ and outputs $b_{0}, b_{1}$ where $COM(m_{0}) = b_{0}$ and $COM(m_{1}) = b_{1}$ . Then $A$ chooses a random bit $r \leftarrow \{ 1, 0 \}$ , and gives $b_{r}$ and $mpk$ to $A^{*}$ .
Forge: $A^{*}$ executes ${VerifCOM}(mpk, b_{r})$ and outputs 1 or 0.

If the probability satisfies: $\begin{equation*} |\Pr [PrivK_{A}^{COM}(n,{b_{0}}) = 1] - 1| \leq negl(n)\end{equation*}$ or $\begin{equation*} |\Pr [PrivK_{A}^{COM}(n,{b_{1}}) = 0] - 1| \leq negl(n),\end{equation*}$ then the adversary wins. This means that $A^{*}$ can forge data $m_{0}$ whose commitment is also a commitment to different data $m_{1}$ , since $m_{0}$ and $m_{1}$ are given by $A^{*}$ in advance. Using the Pedersen commitment function, $A$ can then compute a commitment that satisfies $COM(m_{0}, ID_{0}) = COM(m_{1}, ID_{1})$ where $m_{0}, m_{1} \in Z_{q}$ and $m_{0} \neq m_{1}$ . Obviously, $ID_{0} \neq ID_{1} \bmod N$ and $\log _{g}h = \frac {m_{0} - m_{1}} {H(ID_{0}) - H(ID_{1})} \bmod N$ . By the DLP assumption, $\log _{g} h \bmod N$ cannot be found except with negligible probability in $|N|$ . Thus, the scheme fulfills the property of non-repudiation, i.e. a malicious member can be distinguished if it lies or repudiates a valid decision.
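The binding argument above can be illustrated with a toy computation. The following is a minimal sketch, assuming a textbook Pedersen commitment $C = g^{m} h^{r} \bmod p$ over a prime-order subgroup, with $r$ playing the role of $H(ID)$. The tiny parameters (`P`, `Q`, `G`, the trapdoor `x`) and all function names are hypothetical and chosen only for readability; they are not the parameters of DR-GPOS. The sketch shows that two different openings of one commitment reveal $\log_{g} h$, which is why a successful equivocation contradicts DLP. The sign of the denominator depends on how the commitment is written; this sketch follows the textbook form.

```python
# Toy Pedersen commitment over the order-11 subgroup of Z_23^* (illustrative only).
P, Q, G = 23, 11, 4          # assumed toy parameters: G has order Q modulo P
x = 7                        # trapdoor log_g(h); a real committer must not know it
h = pow(G, x, P)

def commit(m: int, r: int) -> int:
    """Return g^m * h^r mod p."""
    return (pow(G, m, P) * pow(h, r, P)) % P

def equivocate(m0: int, r0: int, m1: int) -> int:
    """With the trapdoor x, find r1 such that commit(m1, r1) == commit(m0, r0)."""
    return (r0 + (m0 - m1) * pow(x, -1, Q)) % Q

def extract_log(m0: int, r0: int, m1: int, r1: int) -> int:
    """Recover log_g(h) from two distinct openings of the same commitment."""
    return ((m0 - m1) * pow((r1 - r0) % Q, -1, Q)) % Q

m0, r0, m1 = 3, 5, 9
r1 = equivocate(m0, r0, m1)                 # only possible with the trapdoor
collision = commit(m0, r0) == commit(m1, r1)
recovered = extract_log(m0, r0, m1, r1)     # equals the trapdoor x
```

Read in the other direction, anyone who could produce such a collision without the trapdoor would have computed a discrete logarithm, which is the contradiction the proof relies on.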

4) Revocation of Malicious-Member

Intuitively, secure revocation means that the operation of revocation does not affect the normal execution of the system, and that a revoked member cannot derive the keys of others or forge a valid tag and proof by using his secret key. Run Game 2 as follows:

Setup: The challenger runs ${KeyGen}(1^\lambda)$ to generate a set of secret keys $\{K_{l}\}_{l=1}^{L}$ , and keeps them private.
Selective Revocation: After the adversary queries keys adaptively, the adversary owns partial keys $\{K_{l_{1}},\ldots ,K_{l_{r}}\}$ , which are the keys of revoked members, while the challenger preserves residual keys $\{K_{l}\}_{l=1}^{L} / \{K_{l_{1}},\ldots ,K_{l_{r}}\}$ privately.
Forge: The adversary uses the known information: $File = (m_{1},m_{2},\ldots ,m_{n})$ , $\Sigma = (T_{1},T_{2},\ldots ,T_{n})$ and $\{K_{l_{1}},\ldots ,K_{l_{r}}\}$ to run PreGen and ProofGen and generate a tag $T_{r}'$ and a proof $\rho$ . Note that $\rho$ is built from $T_{r}'$ , where $T_{r}' \neq T_{r}$ , and both are tags of chunk $m_{r}$ .

By the security of ${L}$ -XAC against substitution attacks, the adversary's advantage against secure revocation, $Adv_{GPOS}^{rev}(\lambda)$ , is $\begin{align*}&\hspace {-2pc}{ \max \limits _{File, \Sigma } \textrm {Pr} \left [{ \begin{array}{cc} \begin{smallmatrix} \!\!\! {VerifProof}(K',chal,\rho) = 1 \!\\ \!\!\! K' \leftarrow \{K_{l}\}_{l=1}^{L} / \{K_{l_{1}},\ldots ,K_{l_{r}}\} \!\\ \!\!\! \rho \leftarrow {ProofGen}(File,T'_{r}) \!\\ \end{smallmatrix}\!&\! \begin{smallmatrix} \! T'_{r} \leftarrow {PreGen}(\{K_{l_{1}},\ldots ,K_{l_{r}}\},m_{r}) \!\!\!\\ \! T_{r} \leftarrow {PreGen}(\{K_{l}\}_{l=1}^{L},m_{r})\!\!\!\\ \! T'_{r} \neq T_{r} \!\!\!\\ \end{smallmatrix} \end{array} }\right] }\\&\qquad \qquad \qquad \qquad = Adv_{XAC}^{sub}(\lambda)\leq 2\cdot \frac {L-1}{q}.\end{align*}$ The above probability is negligible because $q = 2^\lambda$ .
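To get a feel for the magnitudes of the three bounds stated so far (correctness failure, soundness, and revocation), the sketch below evaluates them exactly for sample parameters. The values $\lambda = 80$, $L = 128$ and $c = 460$ are assumptions chosen for illustration, and the helper names `security_bounds` and `log2_fraction` are hypothetical.

```python
from fractions import Fraction
import math

def security_bounds(lam: int, L: int, c: int) -> dict:
    """Evaluate the stated bounds exactly, with q = 2^lam."""
    q = 2 ** lam
    return {
        "correctness_failure": Fraction(L * (L - 1), 2 * q),  # L(L-1)/(2q)
        "soundness": Fraction(1, q ** c),                     # 1/q^c
        "revocation": Fraction(2 * (L - 1), q),               # 2(L-1)/q
    }

def log2_fraction(f: Fraction) -> float:
    """Base-2 logarithm of a positive fraction, safe for very large integers."""
    return math.log2(f.numerator) - math.log2(f.denominator)

bounds = security_bounds(lam=80, L=128, c=460)   # assumed sample parameters
```

Under these assumed values, the correctness and revocation bounds lose only about $\log_{2} L$ to $2\log_{2} L$ bits relative to $\lambda$, which is consistent with the claim that they are negligible for realistic group sizes.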

5) Proof of Storage

The parameters are simplified for the proof: all $v_{z}$ are set to 1, and matrices are represented by capital letters. We then use the PRF $f(x)$ and a random value $r$ to run Game 3 in the real environment and the ideal environment, respectively.

a: In Real Environment

Setup: The challenger runs ${KeyGen}(1^\lambda)$ to generate a set of keys $\{K_{l}\}_{l=1}^{L}$ , and keeps them secret.
Query: The adversary can query adaptively. A file chunk $m_{i}$ is selected and provided to the challenger, which then executes ${PreGen}(\{K_{l}\}_{l=1}^{L}, m_{i})$ to compute the tag $T_{i}$ as follows: compute $T = A^{-1}CM + A^{-1}B$ , where $M$ is the matrix of file blocks, $A$ and $C$ are the matrices of secret keys, and $B = [f_{b_{1}}(i),f_{b_{2}}(i),\ldots ,f_{b_{L}}(i)]^{T}$ . Note that the vector $T_{i} = T^{T}$ is the transpose of matrix $T$ . The challenger then sends $T_{i}$ to the adversary. This query process can be executed an arbitrary number of times, after which the adversary owns and stores all the file data $File = (m_{1},\ldots ,m_{n})$ and the corresponding tags $\Sigma = (T_{1},\ldots ,T_{n})$ .
Challenge: The challenger generates a challenge $chal$ and requests a proof of chunks $m_{i_{1}},\ldots , m_{i_{c}}$ , where $1 \leq i_{c} \leq n \wedge 1 \leq c \leq n$ from the adversary.
Forge: The adversary computes a proof $\rho$ determined by $chal$ and returns $\rho$ to the challenger.
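The tag computation $T = A^{-1}CM + A^{-1}B$ from the Query step can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: $A$ and $C$ are taken to be square $L \times L$ matrices over a prime field, $M$ is a column vector of $L$ blocks, the modulus `Q_MOD` is a toy prime, and HMAC-SHA256 stands in for the PRF $f$. All function names are hypothetical.

```python
import hashlib
import hmac

Q_MOD = 2**61 - 1  # toy prime modulus (assumption, not the paper's parameter)

def prf(key: bytes, i: int) -> int:
    """Stand-in PRF f_key(i), built from HMAC-SHA256 and reduced mod Q_MOD."""
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % Q_MOD

def mat_vec(A, v):
    """Matrix-vector product modulo Q_MOD."""
    return [sum(a * x for a, x in zip(row, v)) % Q_MOD for row in A]

def solve(A, y):
    """Solve A x = y (mod Q_MOD) by Gauss-Jordan elimination; A must be invertible."""
    n = len(A)
    aug = [row[:] + [y[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % Q_MOD != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, Q_MOD)
        aug[col] = [(v * inv) % Q_MOD for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(vr - f * vc) % Q_MOD for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def pre_gen(A, C, b_keys, M, i):
    """Compute T = A^{-1}(C M + B) with B_l = f_{b_l}(i)."""
    B = [prf(k, i) for k in b_keys]
    rhs = [(cm + b) % Q_MOD for cm, b in zip(mat_vec(C, M), B)]
    return solve(A, rhs), rhs

A = [[2, 3], [1, 4]]          # invertible mod Q_MOD (determinant 5)
C = [[5, 6], [7, 8]]
b_keys = [b"k1", b"k2"]
T, rhs = pre_gen(A, C, b_keys, M=[11, 22], i=1)
```

The check `mat_vec(A, T) == rhs` confirms the linear relation $AT = CM + B$; it is only a consistency check on the tag equation, not the scheme's VerifProof algorithm. Replacing `prf` with fresh random values gives the ideal-environment variant of the same computation.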

b: In Ideal Environment

We use random numbers to replace the output of the pseudo-random function, hence $B = [r_{1},r_{2},..,r_{L}]^{T}$ . All the random numbers need to be recorded, and will be used in verification. In the ideal environment, Game 3 is executed in the same way as in the real environment.

If the adversary passes the verification process when $M$ was changed or forged, then the adversary can successfully forge a $T'$ such that $T' = A^{-1}CM' + A^{-1}B$ , where $M'$ is the changed $M$ in the ideal environment. Since $A$ and $C$ are matrices of random secret keys and $B$ is a random matrix, the inequality can be represented as follows: $A^{ideal}_{GPOS} = \textrm {Pr}[T'|T' = A^{-1}CM' + A^{-1}B] = \textrm {Pr}[T'|T' = R_{1}M' + R_{2}] \leq \frac {1}{2^{\lambda \cdot L}}$ , where $R_{1}$ and $R_{2}$ are matrices of random numbers. Under the assumption that $f$ is a secure PRF, the adversary cannot distinguish whether the protocol runs in the ideal environment or the real environment. As a consequence, the probability that the adversary forges a valid tag is $A_{GPOS}^{real} \cong A_{GPOS}^{ideal} \leq \frac {1}{2^{\lambda \cdot L}}$ .

SECTION VI.

Implementation and Evaluation

The evaluation has been done using Baidu Cloud Servers, where all experimental data is stored. The server is configured with 16GB memory and 4-core processors with multi-threading support. Algorithms are implemented using OpenSSL version 0.9.8b with a modulus $N$ of size 1024 bits on Red Hat Enterprise Linux AS release 4. Disk I/O performance is measured with Samsung 840 Pro (MZ-7PD128BW) 120GB Solid state Disk. In order to minimize the error margin, all experimental results are obtained by repeated testing and comparison.

In the DR-GPOS scheme, the sampling method is the same as in the classic S-PDP scheme [4]. From the inequality: $\begin{equation*} 1 - {\left ({{\frac {n - t}{n}} }\right)^{c}} \le {P_{X}} \le 1 - {\left ({{\frac {{n {-} c + 1 - t}}{n - c + 1}} }\right)^{c}},\end{equation*}$ it is easy to see that data corruption can be detected with high probability by requesting proof for a constant number of file chunks. If $t$ is 1% of $n$ and $P_{X}$ must be at least 99%, then the number of challenged blocks is 460.
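The sampling bound can be checked numerically. The sketch below is a minimal illustration, assuming that $n$ is the total number of chunks, $t$ the number of corrupted chunks, and $c$ the number of challenged chunks; the function names and the sample value $n = 10000$ are hypothetical. Solving the lower bound for $c$ gives roughly 459 chunks, which agrees with the rounded figure of 460 used in the experiments.

```python
import math

def detection_bounds(n: int, t: int, c: int):
    """Lower and upper bounds on the detection probability P_X."""
    lower = 1 - ((n - t) / n) ** c
    upper = 1 - ((n - c + 1 - t) / (n - c + 1)) ** c
    return lower, upper

def min_challenges(corrupted_fraction: float, target: float) -> int:
    """Smallest c for which the lower bound 1 - (1 - t/n)^c reaches the target."""
    return math.ceil(math.log(1 - target) / math.log(1 - corrupted_fraction))

needed = min_challenges(corrupted_fraction=0.01, target=0.99)
lower, upper = detection_bounds(n=10_000, t=100, c=460)   # assumed file of 10,000 chunks
```

Because the lower bound depends only on the ratio $t/n$, the required number of challenges does not grow with the file size, which is what makes constant-size challenges possible.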

In order to standardise the experiments, the size of a file data chunk/block is 256KB unless otherwise stated.

A. The Comparison of Various PDP and POR Schemes

The comparison of various schemes is shown in Table 1. We can see from the table that DR-GPOS is the only one that can accurately distinguish a malicious member from a private verification group, as well as support secure revocation of malicious-members and data deduplication.

TABLE 1 The Comparison of the Performance of PDP and POR Schemes

Note that, when S-PDP and CPOR are applied to groups, each group member runs S-PDP or CPOR scheme by using their secret keys. Therefore, each member is independent and the selective opening attacks (i.e. a malicious member colludes with the server.) are avoided. Hence, the system can support secure revocation of malicious members.

B. The Efficiency Evaluation of DR-GPOS

Fig. 2 and Fig. 3 show the performance of DR-GPOS with different numbers of group members. For clarity, each is presented both as a 3D surface and as a 2D line plot.

FIGURE 2.

The efficiency evaluation of pre-process of DR-GPOS with different member number. (a) The performance in 3D surface (b) The performance in 2D line.


FIGURE 3.

The efficiency evaluation of challenge and verification with varying member size. (a) The performance in 3D surface (b) The performance in 2D line.


In the pre-process phase, TTP utilizes all the secret keys $\{K_{l}\}_{l=1}^{L}$ and master secret key $msk$ to preprocess the file data, and generates verification tags and commitment functions. Then TTP sends the file data, tags and commitments to the server, distributes the secret keys to each member respectively through a secure channel, and finally deletes all the local storage except the master secret key. In experiments, we measure the computing cost of pre-process, which includes the time of generating keys, tags and commitments, as well as the time of necessary I/O, but without the time of data transmission.

We can see from Fig. 2 that the computing time increases linearly with the file size. For fixed-size file chunks, a larger total file size means more file chunks, which leads to larger computation costs. Moreover, the computing time rises as the number of members increases. This is because the generation of tags requires the secret keys of all members; hence, as the group grows, the computation cost grows with the number of keys. Though the computing cost is fairly large when the total file size or the number of group members is large, pre-processing is a one-time effort, and the results can be used repeatedly. Thus, the experimental results are acceptable.

In the phase of challenge and verification, each group member $U_{k}$ can issue a challenge $chal$ to the server. The server generates a proof $\rho$ based on $chal$ and sends it to $U_{k}$ . Finally, $U_{k}$ uses its secret key $K_{k}$ to verify $\rho$ . In the evaluation, we measure the computation time of proof generation, proof verification, and the necessary I/O time. As before, we ignore the time of transmission and the I/O time of data seeking and comparison.

As Fig. 3 shows, for a fixed file chunk size and number of group members, the time of challenge and verification is constant, irrespective of the file size. This is because the number of challenged chunks is fixed at 460 in the experiments; therefore, the computing time of generating and verifying one proof does not depend on the total file size. However, as the number of members increases, the number of file blocks rises (each file chunk includes $L$ blocks), and the computing cost increases accordingly. We can observe from the figure that when the number of group members is relatively small (i.e. under 20), the time of one challenge and verification remains under 0.5 seconds. When the number of members is relatively large (i.e. about 128), the time is about 3.7 seconds. This is reasonably efficient for a real-world application.

C. The Efficiency Evaluation of Distinction

The efficiency of distinction is directly relevant to the computational efficiency of commitment functions. Therefore, we measure the performance of commitment functions in this section.

In the DR-GPOS scheme, each chunk corresponds to a commitment. In the pre-processing phase, TTP uses $msk$ to run PreGen and compute the commitments, and sends them to the server. In the phase of public commitment verification, anyone can use $mpk$ to run VerifCOM and verify a commitment, and accordingly everyone can determine who is dishonest. In the experiments, we only measure the time of commitment generation and verification, and ignore the time of information transmission.

Fig. 4 shows the performance of generating and verifying commitments. As we can observe, the generation time is linearly related to the total file size. This is because, for a fixed chunk size, the number of file chunks increases with the file size, so the generation time increases linearly. On the other hand, for a file of the same size, the number of file chunks decreases as the chunk size increases, and consequently the generation time is reduced. Though the generation time grows linearly with the file size, it is a one-time calculation and the result is reusable.

FIGURE 4.

The performance of commitment generation and verification.


In the phase of commitment verification, the number of challenged chunks can be given by the group members. In experiments, we adopt the worst-case performance, which means we measure the verification time of 460 chunks and commitments each time.

As we can see from Fig. 4, the time of verification is constant for a fixed chunk size. In this phase, not all verifiers have $\phi (N)$ , and thus these verifiers have to exponentiate over whole data chunks, which can be quite time-consuming. As the chunk size increases, the calculation cost unavoidably increases, and therefore the time cost also rises. Meanwhile, for the same chunk size, the time of each verification is constant, because the number of chunks for each verification is fixed at 460. Thus, it has no relation to the total file size. Though public commitment verification is more time-consuming than private verification, it is infrequent; even if malicious members exist during each authentication phase, the computational time is acceptable.

On the whole, the scheme is efficient and practical enough for distinguishing dishonest group members when appropriate parameters are chosen.

D. The Comparison With Group NRPDP and GPDP

The pre-processing performance of DR-GPOS, group NRPDP and GPDP [8] is shown in Fig. 5. Similar to the previous analysis, when NRPDP is applied to a group, each group member runs the NRPDP scheme using its own secret key.

FIGURE 5.

The comparison between group NRPDP and GPDP.


It can be observed that, for a fixed total file size and chunk/block size (256KB), NRPDP is more time-consuming than DR-GPOS and GPDP in the pre-processing phase. The reason is that in the NRPDP scheme, each tag generation requires an exponentiation, which involves numerous multiplication steps, while in the DR-GPOS and GPDP schemes, only multiplication and addition operations are needed to generate tags. However, the computational cost of DR-GPOS is slightly higher than that of GPDP because of the added cost of the commitment functions. Even so, the increase is negligible compared to the original computation of pre-processing.

On the other hand, the time is linearly related to the total file size in all three schemes. Analogously, the number of file chunks/blocks with fixed size is increasing with the growth of file size, therefore the time of generation linearly increases. Additionally, with the increase in the members, the computing time of all three rises. Even then, the DR-GPOS and GPDP are more efficient than NRPDP.

As for the verification time, the three private verification schemes differ only slightly, and their efficiency is similar.

E. The Comparison Between Private Verification and Public Verification

The benefit of public verification is the exposure of dishonest members. However, the public schemes tend to be more time-consuming. Fig. 6 shows a generalized comparison of the public verification using bilinear pairing and private verification of DR-GPOS.

FIGURE 6.

The comparison between public and private verification.


In the experiments, we use the Pairing-Based Cryptography (PBC) library version 0.5.14, with a modulus $N$ of size 1024 bits.

It can be observed from Fig. 6 that the cost ratio of one bilinear pairing to one multiplication operation is approximately 350, regardless of file size. Moreover, the ratio to one exponentiation is approximately 9. Hence, bilinear operations tend to be inefficient. For the verification performance, the number of challenged chunks is set to 460 in both the public and private schemes, and we measure only the verification time. The ratio of one public verification to one private verification is approximately 2.5. Although this has little effect on a single verification, in real-world applications verification is performed thousands of times, which makes the difference significant. We can conclude that public schemes using bilinear pairing are inefficient in real-world systems.

F. The Storage and Communication Cost

1) Storage Cost

The storage cost of DR-GPOS depends only on the total file size and the security parameter, and is independent of the number of group members. For instance, each file chunk of 4KB corresponds to both a verification tag and a commitment of 1024 bits each; this means the total storage in the cloud increases by 6.25% (split equally between tags and commitments). This is acceptable because the extra storage can be reduced by enlarging the chunk size. Though DR-GPOS requires more extra storage than GPDP, it offers more advantages than most other schemes. The storage cost on each group member is $O(1)$ , which is only related to the security parameters.
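The 6.25% figure follows from simple arithmetic: two 1024-bit values (256 bytes in total) per 4096-byte chunk. The short sketch below reproduces it and shows how the overhead shrinks for larger chunks; the function name `storage_overhead` is hypothetical, and the sizes are those quoted in the text.

```python
def storage_overhead(chunk_bytes: int, tag_bits: int = 1024, commitment_bits: int = 1024) -> float:
    """Extra server storage per chunk, as a fraction of the chunk size."""
    extra_bytes = (tag_bits + commitment_bits) / 8
    return extra_bytes / chunk_bytes

overhead_4kb = storage_overhead(4 * 1024)        # 0.0625, i.e. 6.25%
overhead_256kb = storage_overhead(256 * 1024)    # chunk size used in the experiments
```

With the 256KB chunks used in the experiments, the same calculation gives an overhead of roughly 0.1%.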

2) Communication Cost

In the phase of challenge and verification, the required bandwidth between client and server is $O(1)$ with respect to the file size, because the challenge and the proof both have constant size (the $chal$ is 384B, and the $\rho$ is $256 \times L\text{B}$ ). In the phase of commitment verification, the server requires a larger bandwidth (each commitment is 1024 bits, and each chunk is 256KB) to reveal the original data and related commitments. However, these circumstances rarely arise in normal cloud systems. The availability of high bandwidth also makes this acceptable in practical systems.
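The sizes quoted above translate into concrete traffic figures. The sketch below is a back-of-the-envelope calculation only; the function names are hypothetical, and the choice of 128 members and 460 revealed chunks is an assumption that mirrors the worst case used in the experiments.

```python
CHALLENGE_BYTES = 384          # size of chal, from the text
PROOF_BYTES_PER_MEMBER = 256   # rho is 256 * L bytes, from the text

def challenge_round_bytes(num_members: int) -> int:
    """Traffic of one challenge/proof exchange; independent of the file size."""
    return CHALLENGE_BYTES + PROOF_BYTES_PER_MEMBER * num_members

def reveal_bytes(num_chunks: int, chunk_bytes: int = 256 * 1024, commitment_bits: int = 1024) -> int:
    """Traffic when the server reveals chunks and commitments for public checking."""
    return num_chunks * (chunk_bytes + commitment_bits // 8)

private_round = challenge_round_bytes(128)   # about 32 KiB
public_reveal = reveal_bytes(460)            # about 115 MiB
```

The gap between the two figures illustrates why commitment verification is treated as an exceptional, infrequent operation.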

SECTION VII.

Related Work

In order to guarantee data integrity in cloud storage, many approaches and schemes have been proposed, e.g. [2], [17], [18]. Each type of scheme has its own advantages and disadvantages, and solves different security problems. According to the definition in the S-PDP scheme, only verifiers who hold the private key can verify the data integrity in a private scheme, while others cannot. By comparison, the verification keys are public in a public verification scheme, and anyone can use these keys to verify the data. Generally speaking, private protocols are more efficient, but the verification results are only known by the verifiers. Meanwhile, public schemes are usually inefficient because of the use of bilinear pairing, but everyone can check the verification results. We categorize the different types of schemes and describe the properties that the proposed scheme satisfies.

A. Private Verification Schemes

Although private verification schemes are highly efficient, data deduplication, malicious-member distinction and revocation cannot be guaranteed when the private schemes are applied to group applications.

1) Schemes Based on RSA

Ateniese et al. [4], [19] proposed the first sampling model for PDP that does not require the server to retrieve and access the entire file. In their schemes, the server provides a probabilistic proof with different levels of PDP guarantees. They used an exponentiation structure to construct the homomorphic verification tags and used the RSA scheme to keep the tags private. After a proof is generated, it is verified with the secret key of RSA. The schemes are all based on RSA cryptography, which has the drawback of exponential calculations.

The RSA method has the property of homomorphism, and can be used to construct the detection mechanism of data integrity. The simplified algorithm is as follows:

Pre-process:

Choose two large primes $p$ and $q$ , and compute $N=pq$ . Generate the key pair: $pk=(N,g)$ and $sk=(e,d)$ .

Let $m_{i}$ be the file block, and compute tag: $T_{i}=(g^{m_{i}})^{d}\bmod N$ where $g$ is the generator of $QR_{N}$ .

Send $m_{i}$ and $T_{i}$ to the server.

Challenge and Verification:

The client gives a challenge to the server.

The server chooses file blocks $m_{i_{k}}(1\leq k\leq c)$ and tags $T_{i_{k}}$ based on the challenge.

The server computes $T=\prod _{k=1}^{c}T_{i_{k}}$ and $\rho =g^{\sum _{k=1}^{c} m_{i_{k}}}\bmod N$ , and sends them to the client.

The client computes $\tau =T^{e} \bmod N$ and verifies whether $\tau =\rho$ .
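The simplified RSA-based steps above can be sketched in a few lines. This is a toy illustration, not the S-PDP construction itself: the primes, the exponents, the element `g` and the function names are all assumptions chosen so that the numbers stay small, and `g = 4` is simply a square modulo $N$ (so it lies in $QR_{N}$) rather than a verified generator.

```python
# Toy RSA parameters (assumptions; far too small for real use).
p, q = 61, 53
N = p * q                    # 3233
e, d = 17, 2753              # e * d = 1 mod (p - 1)(q - 1)
g = 4                        # a square mod N, hence an element of QR_N

def tag(m_i: int) -> int:
    """Pre-process: T_i = (g^{m_i})^d mod N."""
    return pow(pow(g, m_i, N), d, N)

def prove(blocks, tags, indices):
    """Server: aggregate the challenged tags and blocks into (T, rho)."""
    T = 1
    for k in indices:
        T = (T * tags[k]) % N
    rho = pow(g, sum(blocks[k] for k in indices), N)
    return T, rho

def verify(T: int, rho: int) -> bool:
    """Client: check tau = T^e mod N against rho."""
    return pow(T, e, N) == rho

blocks = [12, 7, 30, 5]
tags = [tag(m) for m in blocks]
challenge = [0, 2, 3]
honest_ok = verify(*prove(blocks, tags, challenge))

tampered = blocks.copy()
tampered[2] = 31             # server lost or altered a challenged block
tampered_ok = verify(*prove(tampered, tags, challenge))
```

Verification works because $T^{e} = \prod_{k} g^{m_{i_{k}} d e} = g^{\sum_{k} m_{i_{k}}} \bmod N$, so an honest server passes while a server holding a modified block produces a different $\rho$.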

This method is regarded as one of the notable landmarks in this field, and many subsequent schemes such as [7], [20] use the RSA method.

2) Schemes Based on Sentinels

Juels and Kaliski [5], [21] introduced the notion of proof of retrievability (POR), which is functionally similar to PDP. The POR scheme uses sentinels hidden among regular file blocks to detect modified data, so it can only be applied to encrypted files and can only perform a limited number of queries, equal to the number of sentinels.

The simplified process is as follows:

Pre-process:

The client encodes the file with error correction code, and then inserts the sentinels into the encoded file.

Record the sentinels and send the file to the server.

Challenge and Verification:

The client gives the challenge which includes positions of sentinels to the server.

The server returns the sentinels of the corresponding positions based on the challenge.

The client compares the sentinels with local records.

This approach can only perform a limited number of detections, because of the finite number of sentinels. Therefore, the majority of later POR schemes have abandoned this approach.
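The sentinel mechanism and its bounded number of queries can be sketched as follows. This is a simplified illustration under stated assumptions: error-correction coding and file encryption are omitted, a seeded `random.Random` stands in for the secret position selection, and all function names are hypothetical.

```python
import random
import secrets

def embed_sentinels(blocks, num_sentinels, seed):
    """Insert random sentinels at secret positions; return (stored, record)."""
    rng = random.Random(seed)            # toy stand-in for a keyed position generator
    total = len(blocks) + num_sentinels
    positions = rng.sample(range(total), num_sentinels)
    record = {pos: secrets.token_bytes(len(blocks[0])) for pos in positions}
    remaining = iter(blocks)
    stored = [record[i] if i in record else next(remaining) for i in range(total)]
    return stored, record

def server_respond(stored, positions):
    """Server: return whatever is stored at the challenged positions."""
    return [stored[pos] for pos in positions]

def client_check(record, positions, response):
    """Client: compare with local records and consume the used sentinels."""
    expected = [record.pop(pos) for pos in positions]
    return expected == response

blocks = [b"aaaa", b"bbbb", b"cccc"]
stored, record = embed_sentinels(blocks, num_sentinels=2, seed=1)
first = list(record)[:1]
first_ok = client_check(record, first, server_respond(stored, first))
```

Each check removes the revealed sentinels from the client's record, so the number of audits is bounded by the number of sentinels embedded at pre-processing time, which is the limitation noted above.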

3) Schemes Based on Symmetric Cryptography

Shacham and Waters [14] proposed a new notion of Compact POR (CPOR), which utilizes symmetric cryptography and homomorphic properties to combine multiple authenticator values into a small one and minimizes the communication cost. It utilizes exponential calculation for verification tags so that it increases computation cost. Dodis et al. [22] formally proved the security of a variant of scheme proposed by Juels and Kaliski, and built the first unbounded-use POR scheme which doesn’t rely on RO (Random Oracle) and the first bounded-use scheme with information-theoretic security. It is a theoretical scheme and has not been implemented.

B. Public Auditing Schemes

Public verification schemes can easily distinguish a malicious member. However, they are generally constructed based on bilinear pairing or third-party (TP) verification, which makes the computational efficiency relatively low.

1) Schemes Based on Bilinear Pairing

This approach is generally used for public verification which can achieve zero-storage on client. The drawback is inefficient computation of bilinear pairing and no privacy because anyone can obtain and verify the data of other clients.

Hanser and Slamanig [23] proposed the first simultaneous private and public verification PDP scheme based on bilinear pairing and elliptic curves (EC), which uses the same pre-process and metadata to achieve two kinds of verifiability. The drawback is still the extra storage cost and exponential calculation on both server and client. Zhu et al. [24]–[26] presented a cooperative PDP (CPDP) scheme based on bilinear pairing, homomorphic verifiable response and hash index hierarchy to support scalability of service and data migration in hybrid clouds. The drawback is that the operation of bilinear pairing is very time-consuming, so these public verification schemes are generally not very efficient.

2) Schemes Based on TP

A TP acts on behalf of the cloud client to verify the possession of data. It supports public verification and is usually applied to encrypted data. Wang et al. presented public auditing schemes [27]–[30] based on TP, which usually require extra cost on the client side.

In 2014, a new notion of Outsourced Proofs of Retrievability (OPOR) [31] was proposed. The OPOR scheme, named Fortress, uses an untrusted external party to conduct a POR scheme and interact with the server on behalf of the client. Though the Fortress scheme can protect all three parties simultaneously, its drawbacks are obvious. On one hand, Fortress conducts two POR schemes in parallel, so all the computation and communication costs are doubled. On the other hand, once a verification fails, the client has to inspect both the auditor and the server, which is inefficient.

C. Group Auditing Schemes

In some special scenarios, different clients in a group may share the same file, and the system needs more than one client to verify this file data.

Wang et al. [16], [32] proposed privacy-preserving schemes with private and public auditing respectively. In their mechanism, the identity of the data signer can be kept private from the auditors. Soon afterwards, Wang et al. [33] introduced a new public auditing scheme which can efficiently revoke a user, such that the revoked user cannot use his secret key to forge a valid proof. However, its shortcoming is that it takes no account of collusion. Wang et al. [6] proposed the notion of Group-oriented Proofs of Storage (GPOS), which can limit the communication bandwidth to a fixed size. But none of these group schemes addresses the issue of deduplication. To solve this problem, Wang et al. [8] proposed a group PDP (GPDP) scheme which can efficiently guarantee data possession with deduplication, as well as resist selective opening attacks by a malicious party. Nevertheless, how to accurately and efficiently distinguish and revoke a malicious member in a private verification group is still an unresolved problem.

SECTION VIII.

Conclusion

In this paper, we give a DR-GPOS scheme, which is based on matrix calculation, pseudo-random functions, and commitment functions. When malicious members are involved in a group and repudiate a valid proof, DR-GPOS can efficiently distinguish the malicious one and securely revoke its access.

In terms of functionality, DR-GPOS can accurately distinguish the dishonest members in a group, as well as guarantee the integrity and deduplication of the outsourced data. From a security perspective, DR-GPOS can support the revocation of a dishonest member and resist the attacks from revoked members (e.g. forge proof by colluding with server).

We give the security analysis in the standard model, and have implemented the scheme on real-world cloud servers to evaluate the performance. The evaluation shows that DR-GPOS is not only more efficient than other schemes, but also more practical for real-world applications.

