Introduction
Cloud computing [1]–[3] in recent years has provided a low-cost and scalable services for clients and has been widely used in industrial and academic communities. The most fundamental and popular service is the cloud storage, which has been widely used, and provided by many vendors (e.g. Google Drive, Amazon S3, Dropbox, etc.).
However, there still exist a number of security issues, which lead to data loss in cloud storage. For instance, T-Mobile1 suffered phone sales hit by sidekick loss. Afterwards, Google Gmail2 has encountered large-scale data losses in 2011. Hence, these accidental or malicious breaches incur huge losses for the clients as well as the enterprises. Therefore, data integrity of cloud storage verification has become an important research issue.
In order to solve aforementioned challenge, a series of proof of storage (POS) schemes [4]–[7] have been proposed. In some actual applications, multiple users jointly manage and verify the same data. For instance, data of some internet companies (e.g. purchase records and browsing history) is exploding every day. Once the data is damaged, the company suffers serious losses. To solve this problem, some professional verifiers are required to check and preserve the data regularly. These verifiers can be considered as members of a group who manage and share the same data. The group manager concludes if the data is intact when being possessed based on the group members’ responses.
If malicious members exist in the group, the malicious members should be identified and revoked. In a real world, malicious members may also collude with the cloud server to forge valid proofs. Therefore, if different verification results are returned from group members, the dishonest one should be revoked.
Public verification schemes can avoid this trouble, since anyone can verify the data integrity. However, public schemes cannot be efficiently executed since they are generally constructed based on the bilinear pairing or third-party verification.
Therefore, an efficient POS system needs to be established to meet the following requirements:
Achieve data integrity in group applications: The scheme should prove the data possession efficiently and accurately in a group environment. Each group member can verify the proof independently.
Distinguish malicious members from honest ones: If different verification results are returned, the scheme can exactly distinguish the dishonest member from honest members.
Revoke malicious members: When a malicious member is identified, the scheme can revoke its access, which means that the secret key of a revoked member cannot be used to forge a valid tag or proof, even if the malicious member colludes with the server. Additionally, the revoked member cannot derive any secret key of other members.
Resist selective opening attacks: In the group, multiple members with different keys correspond to the same verification tag. For security reasons, if the secret keys of some members are leaked, the adversary (possibly a cloud server or a malicious third party) cannot use the information to forge a legitimate tag or proof.
Support data deduplication: For existing work, each member holds its own metadata (e.g. verification tags). If all the members with different secret keys share the same metadata (i.e. deduplication is achieved), the storage costs will be reduced. That is, if deduplication is captured, the metadata of the system should be only related to the total data size, and has no relation with the number of members.
Wang et al. proposed a GPDP scheme [8] to guarantee the properties of data deduplication, data integrity and resisting selective attacks, i.e. the above requirements 1, 4 and 5. In order to achieve all of the above properties, we will present a new DR-GPOS scheme which is based on matrix calculation, pseudo-random functions, and commitment functions. The four-fold concrete contributions of this paper are summarized as follows:
In terms of functionality, our proposed scheme can accurately distinguish and securely revoke the malicious member, as well as guarantee the integrity and deduplication of the outsourced data.
From a security perspective, our proposed scheme can resist against the selective attacks and the collusion attacks from revoked members (e.g. forging proof by colluding with server).
To verify its security, we define three security games to capture properties of member distinction, member revocation, and proof of storage respectively, and then give the formal security analysis in the standard model.
In order to evaluate the performance of the scheme, we implement a prototype of DR-GPOS scheme on Baidu cloud server, and utilize data size of 10G to measure the efficiency of DR-GPOS.
The rest of this paper is organized as follows: We give the overview of our idea in section 2, and the preliminaries in section 3. Then in section 4, the definition and concrete construction of DR-GPOS are introduced in detail. We further give the security analysis and performance evaluation in section 5 and section 6, respectively. Finally, the related work is introduced in section 7 and the conclusion is given in section 8.
Overview of Our Idea
DR-GPOS scheme consists of four phases, as shown in Fig.1: a) preprocessing, b) challenge and verification, c) malicious-member distinction, and d) malicious-member access revocation.
The DR-GPOS scheme. (a) The pre-process (b) The challenge and verification (c) The malicious-member distinction (d) The malicious-member revocation.
In the pre-process phase, the trusted third party (TTP) decomposes the file and generates the verification tags (Tags), commitment functions (COMs), and the processed data (File). Afterwards, TTP forwards the data to the server, distributes the secret keys to each group member, and deletes all local file data.
In the challenge and verification phase, each member sends an independent challenge
The third phase has two sub-cases:
Case 1:
If all members accept their corresponding proofs returned by the server, then the server can guarantee the possession of the group’s data at given time.
Case 2:
In the scenario, where some members do not accept the returned proofs, then it can conclude that either some members are being dishonest, or the server has been compromised. In this situation, the server is required to open the commitment functions to prove its innocence and distinguish the dishonest members. The commitment functions can be verified by anyone, and the dishonest entities can be sought out publicly.
In the phase of malicious-member revocation, the dishonest members should be revoked. Note that the revoked members cannot use their keys to forge legal proofs and pass the verification as legitimate members, even if they collude with the server. In addition, the revoked keys cannot be used to derive any other valid secret keys.
Preliminaries
In this section, we introduce homomorphic Macs, cross-authentication codes, commitment function, and the security assumptions, which serve as a basis of the proposed scheme.
A. Homomorphic Macs
The definition of Homomorphic Message Authentication Code (HomMac) was initially proposed by Agrawal and Boneh [9]. We utilize boldfaced
Definition 1:
A HomMac consists of four polynomial-time algorithms (Gen, Sign, Combine, Verify) as follows:
is a probabilistic algorithm, which creates the public-private key pair{Gen}(1^{\lambda })\rightarrow (pk,sk) , based on the security parameter input(pk,sk) .\lambda is an algorithm to generate a tag of each vector. It takes secret key{Sign}(sk, id, \boldsymbol {v}_{i}, i) \rightarrow t_{i} , the vector space identifiersk , vectorid (v_{i} \in F_{q}^{s} is the length of vector), and its indexs as input, and returns the tagi oft_{i} .\boldsymbol {v}_{i} is an algorithm to generate a homomorphic Mac. It takes public key{Combine}((\boldsymbol {v}_{1}, t_{1}, \alpha _{1}),\ldots,(\boldsymbol {v}_{c}, t_{c}, \alpha _{c}),pk)\rightarrow (T, y) ,pk vectorsc , the related tags\boldsymbol {v}_{1},\ldots ,\boldsymbol {v}_{c} \in F_{q}^{s} , andt_{1},\ldots t_{c} random constantsc as input. Then it computes\alpha _{1},\ldots ,\alpha _{c} \in F_{q} and the corresponding tag\boldsymbol {y} = \sum _{i=1}^{c} {\alpha _{i} \boldsymbol {v}_{i}} \in F_{q}^{s} , and returnsT andT .\boldsymbol {y} is a deterministic algorithm to verify a homomorphic Mac. It takes the key pair{Verif}(pk, sk, id, \boldsymbol {y}, T) \rightarrow \{1, 0\} , the vector space identifier(pk, sk) , vectorid , and the corresponding\boldsymbol {y}\in F_{q}^{s} as input. If the verification is accepted, 1 is returned, otherwise 0.T
B. Cross-Authentication Codes
1) Definition of Cross-Authentication Codes
The definition of Cross-Authentication Codes was given by Fehr et al. [10].
Definition 2:
(
The returned value of {0, 1} shows if the input
is a probabilistic algorithm to generate a uniform random key. Based on the security parameter{XGen}(1^{k}) \rightarrow K , the secret keyk is returned.K is an algorithm to generate the tags for verification. A set of keys{XAuth}(K_{1},K_{2},\ldots ,K_{L})\rightarrow T are given as input, and the corresponding tagK_{1},K_{2},\ldots ,K_{L} is returned.T \in \textrm {XK} is an algorithm to verify a tag. The returned value of {0, 1} shows if the input{XVer}(K_{i},T) \rightarrow \{1,0\} is the correct tag (generated byT ), for one of the secret keyschal .K_{i}
The following properties are required:
Correctness:
For all
, the probabilityi \in [L] is negligible, wherefail_{XAC}(k) \begin{align*}&\hspace {-2pc}fail_{XAC}(k) \\:=&\textrm {Pr}[{XVer}(K_{i}, i, {XAuth}(K_{1},K_{2},\ldots ,K_{L})) \neq 1].\end{align*} View Source\begin{align*}&\hspace {-2pc}fail_{XAC}(k) \\:=&\textrm {Pr}[{XVer}(K_{i}, i, {XAuth}(K_{1},K_{2},\ldots ,K_{L})) \neq 1].\end{align*}
Security Against Impersonation Attacks:
Intuitively, when the key
is not the generation key of the tagK , even if the whole tag space is accessed, it is difficult to find aT' which can be verified byT' . The probability ofK is defined as:Adv_{XAC}^{imp}(k) where the max is applied over all\begin{align*} Adv_{XAC}^{imp}(k):=&\max \limits _{i,T'} \textrm {Pr}[{XVer}(K,i,T') \\=&1 | K \leftarrow {XGen}(1^{k})]\leq negl(k)\end{align*} View Source\begin{align*} Adv_{XAC}^{imp}(k):=&\max \limits _{i,T'} \textrm {Pr}[{XVer}(K,i,T') \\=&1 | K \leftarrow {XGen}(1^{k})]\leq negl(k)\end{align*}
andi \in [L] .T' \in \textrm {XT} Security Against Substitution Attacks:
Intuitively, it is easy to verify the correctness of the tag
with the keyT . But in the absence ofK_{i} , even if the adversary has a correct tagK_{i} and all other keysT , it is difficult to forge a legitimate labelK_{\neq i} which can be verified byT' . The probability ofK_{i} is defined asAdv_{XAC}^{sub}(k) Notice that the max is applied over all\begin{equation*} \max \limits _{i,K_{\neq i}, F} \textrm {Pr} \!\left [{ \!\begin{array}{cc} \begin{smallmatrix} T' \neq T \\ \wedge \\ {XVer}(K,i,T')=1 \\ \end{smallmatrix}& \begin{smallmatrix} K_{i} \leftarrow {XGen}(1^{k}) \\ T := {XAuth}(K_{1},K_{2},\ldots ,K_{L})\\ T'\leftarrow F(T)\\ \end{smallmatrix} \end{array} \!\!}\right]\!\leq negl(k).\end{equation*} View Source\begin{equation*} \max \limits _{i,K_{\neq i}, F} \textrm {Pr} \!\left [{ \!\begin{array}{cc} \begin{smallmatrix} T' \neq T \\ \wedge \\ {XVer}(K,i,T')=1 \\ \end{smallmatrix}& \begin{smallmatrix} K_{i} \leftarrow {XGen}(1^{k}) \\ T := {XAuth}(K_{1},K_{2},\ldots ,K_{L})\\ T'\leftarrow F(T)\\ \end{smallmatrix} \end{array} \!\!}\right]\!\leq negl(k).\end{equation*}
,i \in [L] , and all possible functionsK_{\neq i} = (K_{j})_{j \neq i} \in \textrm {XK} .F: \textrm {XT} \rightarrow \textrm {XT}
2) An Example of L-XAC
Let
XGen: generates a set of random keys
.K_{1} = (a_{1},b_{1}),\ldots ,K_{L} = (a_{L},b_{L}) \in \textrm {XK} XAuth: generates the authentication tag
such thatT = (T_{0},T_{1},\ldots ,T_{L-1}) \in F^{L} forp_{T}(a_{i}) = b_{i} , where satisfyi = 1,2,\ldots ,L . The linear equationp_{T}(x) = T_{0} + T_{1}x +\ldots + T_{L-1}x^{L-1} \in F[x] can be used to efficiently compute the value ofAT=B . It is important to note thatT is a Vandermonde matrix, andA \in F^{L \times L} is the column vectorB \in F^{L} . Moreover, the[b_{1},b_{2},\ldots ,b_{L}] th row is given byi- .1,a_{i},a_{i}^{2},\ldots ,a_{i}^{L-1} XVer: verify the tag
and output 1 if and only ifT andT \neq \perp ,p_{T}(a_{i}) = b_{i} .i \in [L]
C. A Reusable Commitment Function
The non-repudiation property in PDP/POR scheme has been achieved in Wang et al.’s scheme [7] by introducing a commitment scheme, which is reusable and binding. Based on Pedersen commitment function, the reusable commitment function is as follows.
Let \begin{equation*} COM(M) = (g^{M} h^{H(ID)})^{e} \bmod N\end{equation*}
Intuitively, it is a ramification of the Pedersen commitment function [11], therefore it is a perfect commitment scheme which is against an all-powerful receiver. Above reusable commitment function has two properties: information-theoretically hiding and computational binding.
1) Information-Theoretically Hiding
The committer
is a random element ofCOM(M) , for any given a probabilistic polynomial time receiver. Hence, there is no distinction between a random element ofG_{q} andG_{q} .COM(M) it is possible to compute
by an all-powerful receiver. Hence,\log _{g} h \bmod q can be obtained. Note that, it is impossible to determine the value of\log _{g} {COM} \bmod q = M + H(ID) \log _{g} h with a random auxiliary valueM .H(ID)
2) Computational Binding
For a probabilistic polynomial time sender, it is difficult to compute
Remarks:
is the unique identifier ofID , and the hash functionM is used to resist the forgery of multiple coefficient.H does not reveal the information ofd , thus it ensures the commitment cannot be forged, which makes it reusable.e When initiating a challenge, the committer discloses the value of
andM , thus anyone can verify whetherID holds, wheres = t ands = g^{M} h^{H(ID)} \bmod N .t = (COM(M))^{d} = (g^{M} h^{H(ID)})^{ed} \bmod N = g^{M} h^{H(ID)}\,\, \bmod \,\,N
D. Security Assumptions
1) DLP (Discrete Logarithm Problem)
Let
2) PRF (Pseudo-Random Function)
Let
Indexing: Each function has a unique index
associated with it, thus randomly picking a functionk is easy.f \in F_{k} Polynomial time Computation: Given an input
and a functionx , there exists a polynomial time algorithm to computef \in F_{k} .f(x) Pseudo-Randomness: No probabilistic polynomial time algorithms can distinguish the functions in
and from the functions inF_{k} , as well as the value ofH_{k} and a randomness inf(x) .I_{k}
The Proposed DR-GPOS Scheme
The definition and detailed construction of DR-GPOS is given in the following section.
A. Definition of Dr-GPOS
Definition 3:
GPOS with Malicious-Member Distinction and Revocation (DR-GPOS). A DR-GPOS scheme is a collection of five polynomial-time algorithms (
is a probabilistic polynomial-time algorithm to generate keys. The input security parameter{KeyGen}(1^{\lambda }) \rightarrow (mK, K_{l}) is used to generate the master key\lambda and the secret keymK of each member.K_{l} is a polynomial-time algorithm which can generate the metadata required to verify the proof. It uses the private keys{PreGen}(\{K\}_{l=1}^{L}, mK, m_{i}) \rightarrow (T_{i}, COM_{i}) , the master key\{K\}_{l=1}^{L} and a file blockmK as input, and returns the commitment functionm_{i} and the verification tagsCOM_{i} .T_{i} is a polynomial-time algorithm to generate a proof of storage. The input consists of an ordered data collection{ProofGen}(File,chal,\Sigma) \rightarrow \rho , a challengeFile and an ordered collectionchal of tags. It returns a proof\Sigma for the file data determined by\rho .chal is a polynomial-time algorithm to verify a proof of storage. It takes secret key{VerifProof}(K_{l}, chal, \rho) \rightarrow \{1,0\} of arbitrary member, a challengeK_{l} and a proofchal as input, and returns whether\rho is a correct proof for the data determined by\rho .chal is a polynomial-time algorithm to distinguish the dishonest member from honest members. It takes the chosen chunks{VerifCOM}(\{M\},\{I\},c,\{COM\}) \rightarrow \{1,0\} , the corresponding indices\{M\} and commitment functions\{I\} as input, and returns whether each commitment is valid.\{COM\}
B. Detailed Construction
The algorithm fist divides the file into
Let
Let \begin{align*}&H:\{0,1\}^{\log _{2}{(n)}}\rightarrow \{0,1\}^{\log _{2}{(n)}}.\\&f:\{0,1\}^{\lambda } \times \{0,1\}^{\log _{2}{(n)}} \rightarrow \{0,1\}^{\lambda }.\\&\pi :\{0,1\}^{\lambda } \times \{0,1\}^{\log _{2}{(n)}} \rightarrow \{0,1\}^{\log _{2}{(n)}}.\end{align*}
: Let{KeyGen}(1^\lambda)\rightarrow (msk,mpk,sk_{l}) ande be secret primes such thatd . Leted = 1 \bmod p'q' be the maximum number of group members. Given the security parameterL , the outputs are\lambda ,msk = (p,q,e) and a set of secret keys:mpk = (d,N,g,h) .sk_{l}=(a_{l},b_{l},c_{l})\in \kappa , 1\leq l \leq L : For each{PreGen}(\{sk_{l}\}_{l=1}^{L}, msk, m_{< i>}) \rightarrow (T_{< i>}, COM_{i}) ,i , compute the verification tag of each chunk1 \leq i \leq n :m_{< i>} The tag\begin{align*}&\hspace {-2.pc}\begin{bmatrix} 1,c_{1},c_{1}^{2},\ldots ,c_{1}^{L-1}\\ 1,c_{2},a_{2}^{2},\ldots ,c_{2}^{L-1}\\ \vdots \\ 1,c_{L},c_{L}^{2},\ldots ,c_{L}^{L-1} \end{bmatrix}\cdot \begin{bmatrix} m_{i,1}\\ m _{i,2}\\ \vdots \\ m _{i,L} \end{bmatrix}+ \begin{bmatrix} f_{b_{1}}(i)\\ f _{b_{2}}(i)\\ \vdots \\ f _{b_{L}}(i) \end{bmatrix}\\&\qquad \qquad = \begin{bmatrix} 1,a_{1},a_{1}^{2},\ldots ,a_{1}^{L-1}\\ 1,a_{2},a_{2}^{2},\ldots ,a_{2}^{L-1}\\ \vdots \\ 1,a_{L},a_{L}^{2},\ldots ,a_{L}^{L-1} \end{bmatrix}\cdot \begin{bmatrix} T_{i,1}\\ T _{i,2}\\ \vdots \\ T _{i,L}. \end{bmatrix}\end{align*} View Source\begin{align*}&\hspace {-2.pc}\begin{bmatrix} 1,c_{1},c_{1}^{2},\ldots ,c_{1}^{L-1}\\ 1,c_{2},a_{2}^{2},\ldots ,c_{2}^{L-1}\\ \vdots \\ 1,c_{L},c_{L}^{2},\ldots ,c_{L}^{L-1} \end{bmatrix}\cdot \begin{bmatrix} m_{i,1}\\ m _{i,2}\\ \vdots \\ m _{i,L} \end{bmatrix}+ \begin{bmatrix} f_{b_{1}}(i)\\ f _{b_{2}}(i)\\ \vdots \\ f _{b_{L}}(i) \end{bmatrix}\\&\qquad \qquad = \begin{bmatrix} 1,a_{1},a_{1}^{2},\ldots ,a_{1}^{L-1}\\ 1,a_{2},a_{2}^{2},\ldots ,a_{2}^{L-1}\\ \vdots \\ 1,a_{L},a_{L}^{2},\ldots ,a_{L}^{L-1} \end{bmatrix}\cdot \begin{bmatrix} T_{i,1}\\ T _{i,2}\\ \vdots \\ T _{i,L}. \end{bmatrix}\end{align*}
can be computed by solving linear matrix equation group withT_{< i>}=(T_{i,1}, T_{i,2}, \ldots , T_{i,L}) unknowns. For eachL ,i , compute the commitment function for each chunk1 \leq i \leq n :m_{< i>} . OutputCOM_{i} = COM(m_{< i>}) = (g^{m_{< i>}}h^{H(i)})^{e} \bmod N .\{m_{< i>}, T_{< i>}, COM_{i}\}_{1\leq i \leq n} The challenge is{ProofGen}(\{m_{< i>}\}_{1\leq i \leq n},\{T_{< i>}\}_{1\leq i \leq n},chal) \rightarrow \rho , wherechal = (c, k_{1},k_{2}) is the number of challenged chunks andc are fresh random keys. Fork_{1}, k_{2} . Compute the index of each sampled block:1 \leq z \leq c :~1 . 2. Compute the relevant coefficient:i_{z} = {\pi }_{k_{1}}(z) . Compute:v_{z} = f_{k_{2}}(z) and\begin{equation*} \begin{cases} \tau _{1} = v_{1} T_{i_{1},1}+v_{2} T_{i_{2},1}+\ldots+v_{c} T_{i_{c},1}\\ \tau _{2} = v_{1} T_{i_{1},2}+v_{2} T_{i_{2},2}+\ldots+v_{c} T_{i_{c},2}\\ \qquad \qquad \qquad ~ ~ \vdots \\ \tau _{L} = v_{1} T_{i_{1},L}+v_{2} T_{i_{2},L}+\ldots+v_{c} T_{i_{c},L} \end{cases}\tag{1}\end{equation*} View Source\begin{equation*} \begin{cases} \tau _{1} = v_{1} T_{i_{1},1}+v_{2} T_{i_{2},1}+\ldots+v_{c} T_{i_{c},1}\\ \tau _{2} = v_{1} T_{i_{1},2}+v_{2} T_{i_{2},2}+\ldots+v_{c} T_{i_{c},2}\\ \qquad \qquad \qquad ~ ~ \vdots \\ \tau _{L} = v_{1} T_{i_{1},L}+v_{2} T_{i_{2},L}+\ldots+v_{c} T_{i_{c},L} \end{cases}\tag{1}\end{equation*}
Output vector\begin{equation*} \begin{cases} \omega _{1} = v_{1} m_{i_{1},1}+v_{2} m_{i_{2},1}+\ldots+v_{c} m_{i_{c},1}\\ \omega _{2} = v_{1} m_{i_{1},2}+v_{2} m_{i_{2},2}+\ldots+v_{c} m_{i_{c},2}\\ \qquad \qquad \qquad ~ ~ \vdots \\ \omega _{L} = v_{1} m_{i_{1},L}+v_{2} m_{i_{2},L}+\ldots+v_{c} m_{i_{c},L} \end{cases}\tag{2}\end{equation*} View Source\begin{equation*} \begin{cases} \omega _{1} = v_{1} m_{i_{1},1}+v_{2} m_{i_{2},1}+\ldots+v_{c} m_{i_{c},1}\\ \omega _{2} = v_{1} m_{i_{1},2}+v_{2} m_{i_{2},2}+\ldots+v_{c} m_{i_{c},2}\\ \qquad \qquad \qquad ~ ~ \vdots \\ \omega _{L} = v_{1} m_{i_{1},L}+v_{2} m_{i_{2},L}+\ldots+v_{c} m_{i_{c},L} \end{cases}\tag{2}\end{equation*}
and(\tau _{1},\ldots ,\tau _{L}) .(\omega _{1},\ldots ,\omega _{L}) Let{VerifProof}(\rho ,chal,sk_{l})\rightarrow \{1,0\} andsk_{l} = (a_{l},b_{l},c_{l}) . Forchal = (c, k_{1}, k_{2}) : 1. Compute the index of each sampled block:1 \leq z \leq c . 2. Compute the relevant coefficient:i_{z} = {\pi }_{k_{1}}(z) . Parse the proof to obtainv_{z} = f_{k_{2}}(z) and\tau . Compute:\omega ,\tau = \tau _{1} + a_{l} \tau _{2} +\ldots+ a_{l}^{L-1}\tau _{L} and\omega = \omega _{1} + c_{l} \omega _{2} +\ldots+ c_{l}^{L-1}\omega _{L} . If\sigma = v_{1}\,\,f_{b_{l}}(i_{1})+\ldots+v_{c} f_{b_{l}}(i_{c}) , then output 1. Otherwise output 0.\tau =\sigma +\omega : Let{VerifCOM}(\{m_{< i>}\},\{i\},c,\{COM_{i}\}) \rightarrow \{1,0\} . For eachmpk = (N, d, g, h) given in advance, the related commitment functions and file chunks can be revealed to public as following:i \in \{i_{1},\ldots ,i_{c}\} ,COM_{i} ,d wherem_{< i>} .COM_{i} = COM(m_{< i>}) = {(g^{m_{< i>}}h^{H(i)})}^{e} \bmod N
For
Output 1, if and only if all pairs of chunks and commitments match completely. Otherwise, return 0.
C. The DR-GPOS Protocol
A DR-GPOS protocol is designed in four phases:
1) Pre-Process
The TTP runs
2) Challenge and Verification
Every member
3) Malicious-Member Distinction
When multiple members have different decisions,
4) Malicious-Member Revocation
When a dishonest member is distinguished, TTP can revoke it, which means TTP removes the dishonest member from the group and informs all members that verification of the malicious one is invalid. Notice that the key of revoked member cannot be used to derive keys of other members or forge a valid tag and proof. The final three phases can be executed multiple times respectively to insure that both the server and group members are honest.
Security Analysis
In this section, we will introduce the security model and security proof of DR-GPOS scheme.
A. Security Model
The security model is given by three games, which capture properties of member distinction, member revocation, and proof of storage. For simplicity, we utilize
Game 1 Member Distinction Game:
Setup: The adversary is provided with the security parameter
as input, and it outputs a pair of file chunks1^\lambda of the same length.m_{0}, m_{1} Challenge: The challenger runs
and outputs{PreGen}(msk, m) whereb_{0}, b_{1} andCOM(m_{0}) = b_{0} .COM(m_{1}) = b_{1} Choose a random bit
, then giver \leftarrow \{ 1, 0 \} andb_{r} to the adversary.mpk Decision: The adversary runs
and outputs 1 if{VerifCOM}(mpk, b_{r}) is the correct commitment ofb_{r} , otherwise outputs 0.m_{1}
The commitment function is valid (i.e. satisfy the property of binding) if and only if
The probability of guessing the correct choice by is \begin{align*} \Pr [PrivK_{A}^{COM}(n,{b_{r}}) = r]=&\frac {1}{2} \cdot \Pr [PrivK_{A}^{COM}(n,{b_{1}}) = 1] \\&+ \frac {1}{2} \cdot \Pr [PrivK_{A}^{COM} (n,{b_{0}}) \!=\! 0].\end{align*}
Remarks:
In Game 1 (which is proposed by Wang et al. [7]), we define the adversary’s advantage as the absolute value of the probability of guessing the right choice minus 1/2. If the advantage of adversary is negligible, then the scheme is secure, which means:\begin{align*}&\hspace{-1.3pc}\mid \frac {1}{2} \cdot \Pr [PrivK_{A}^{COM}(n,{b_{1}}) = 1] \\&\qquad + \frac {1}{2} \cdot \Pr [PrivK_{A}^{COM} (n,{b_{0}}) = 0]- \frac {1}{2}|\leq negl(n).\end{align*}
Setup: The challenger performs
to generate a set of keys{KeyGen}(1^\lambda) , and keeps them secret.\{K_{l}\}_{l=1}^{L} Selective Revocation: The adversary queries keys adaptively, and the challenger sends these queried keys to the adversary and keeps other keys secret. Note that the number of queried keys must be less than
. After that, the adversary owns partial keysL which are the keys of revoked members, while the challenger preserves residual keys\{K_{l_{1}},\ldots ,K_{l_{r}}\} privately.\{K_{l}\}_{l=1}^{L} / \{K_{l_{1}},\ldots ,K_{l_{r}}\} Forge: The adversary can obtain files
, tagsFile = (m_{1},m_{2},\ldots ,m_{n}) and some keys\Sigma = (T_{1},T_{2},\ldots ,T_{n}) . Then the adversary runs PreGen and ProofGen to generate a tag\{K_{l_{1}},\ldots ,K_{l_{r}}\} and proofT_{r}' . Note that\rho comprises of\rho , whereT_{r}' , and both of them are tags of chunkT_{r}' \neq T_{r} .m_{r}

Game 3 Storage Proof Game:
Setup: The challenger runs
to generate a set of keys{KeyGen}(1^\lambda) , and keeps them secret.\{K_{l}\}_{l=1}^{L} Query: The adversary can query adaptively. Select chunk
of a file and provide it to the challenger. It then executesm_{i} , and computes{PreGen}(\{K_{l}\}_{l=1}^{L}, m_{i}) which is then provided to the adversary. By executing these queries multiple times, all these chunksT_{i} can be stored by the adversary, together with tagsF = (m_{1},\ldots , m_{n}) .\Sigma = (T_{1},\ldots , T_{n}) Challenge: The challenger generates a challenge
and requests a proof of chunkschal , wherem_{i_{1}},\ldots , m_{i_{c}} .1 \leq i_{c} \leq n \wedge 1 \leq c \leq n Forge: The adversary computes a proof
determined by\rho and returnschal .\rho
If
The probability that the adversary wins the game is negligibly close to the probability that the adversary extracts those challenged file chunks. Intuitively, a verification tag
B. Security Proof
The security of our scheme is proven based on provable security theory. The modern security proofs take the reductionist approach rely on an assumption about the hardness of some mathematical problem in order to prove their security. Therefore, in this paper, we reduce breaking the proposed scheme to solving an underlying hard problem. DR-GPOS scheme is based on the security assumptions of DLP [12] and secure PRF [13].
Theorem 1:
If
Proof.
1) Correctness
For \begin{align*} fail_{GPOS}(\lambda):=&\textrm {Pr}[{VerifProof}(K_{l}, chal, {ProofGen} \\&\times (\{m_{< i>}\}_{1 \leq i \leq n}, \{T_{< i>}\}_{1 \leq i \leq n}, chal)) \neq 1].\end{align*}
Since
2) Soundness
For \begin{align*} Adv_{GPOS}^{inval}(\lambda):=&\max \limits _{l} \textrm {Pr}[{VerifProof}(K_{l},chal,\rho ') \\=&1 | K_{l} \leftarrow {KeyGen}(1^\lambda)].\end{align*}
3) Distinction of Malicious-Member
We utilize the reduction method to prove the property of distinction, which is realized by using Game 1 of DR-GPOS. In order to to distinguish a malicious-member, it needs to be established that the verifier cannot lie and repudiate, i.e. the verification result must be bound to the original data and cannot be altered.
An adversary
Setup:
is given input security parameterA^{*} , and it outputs a pair of data chunks1^\lambda of the same length.m_{0}, m_{1} then givesA^{*} andm_{0} tom_{1} .A Challenge:
runsA and outputs{PreGen}(msk, m_{i}) whereb_{0}, b_{1} andCOM(m_{0}) = b_{0} . ThenCOM(m_{1}) = b_{1} chooses a random bitA , and givesr \leftarrow \{ 1, 0 \} andb_{r} tompk .A^{*} Forge:
executesA^{*} and outputs 1 or 0.{VerifCOM}(mpk, b_{r})
If the probability satisfies:\begin{equation*} |\Pr [PrivK_{A}^{COM}(n,{b_{0}}) = 1] - 1| \leq negl(n)\end{equation*}
\begin{equation*} |\Pr [PrivK_{A}^{COM}(n,{b_{1}}) = 0] - 1| \leq negl(n),\end{equation*}
4) Revocation of Malicious-Member
Intuitively, secure revocation means that the operation of revocation does not affect the normal execution of system, and the revoked member cannot derive keys of others or forge a valid tag and proof by utilizing his secret key. Run Game 2 as follows:
Setup: The challenger runs
to generate a set of secret keys{KeyGen}(1^\lambda) , and keeps them private.\{K_{l}\}_{l=1}^{L} Selective Revocation: After the adversary queries keys adaptively, the adversary owns partial keys
, which are the keys of revoked members, while the challenger preserves residual keys\{K_{l_{1}},\ldots ,K_{l_{r}}\} privately.\{K_{l}\}_{l=1}^{L} / \{K_{l_{1}},\ldots ,K_{l_{r}}\} Forge: The adversary uses the known information:
,File = (m_{1},m_{2},\ldots ,m_{n}) and\Sigma = (T_{1},T_{2},\ldots ,T_{n}) to run PreGen and ProofGen and generate a tag\{K_{l_{1}},\ldots ,K_{l_{r}}\} and a proofT_{r}' . Notice that\rho is consisted of\rho , whereT_{r}' , and both them are tags of chunkT_{r}' \neq T_{r} .m_{r}
By securing against substitution attacks of \begin{align*}&\hspace {-2pc}{ \max \limits _{File, \Sigma } \textrm {Pr} \left [{ \begin{array}{cc} \begin{smallmatrix} \!\!\! {VerifProof}(K',chal,\rho) = 1 \!\\ \!\!\! K' \leftarrow \{K_{l}\}_{l=1}^{L} / \{K_{l_{1}},\ldots ,K_{l_{r}}\} \!\\ \!\!\! \rho \leftarrow {ProofGen}(File,T'_{r}) \!\\ \end{smallmatrix}\!&\! \begin{smallmatrix} \! T'_{r} \leftarrow {PreGen}(\{K_{l_{1}},\ldots ,K_{l_{r}}\},m_{r}) \!\!\!\\ \! T_{r} \leftarrow {PreGen}(\{K_{l}\}_{l=1}^{L},m_{r})\!\!\!\\ \! T'_{r} \neq T_{r} \!\!\!\\ \end{smallmatrix} \end{array} }\right] }\\&\qquad \qquad \qquad \qquad = Adv_{XAC}^{sub}(\lambda)\leq 2\cdot \frac {L-1}{q}.\end{align*}
5) Proof of Storage
The parameters have been simplified for the proof as: set all
a: In Real Environment
Setup: The challenger runs
to generate a set of keys{KeyGen}(1^\lambda) , and keeps them secret.\{K_{l}\}_{l=1}^{L} Query: The adversary can query adaptively. A file chunk
is selected and provided to the challenger, which then executesm_{i} to compute the tag{PreGen}(\{K_{l}\}_{l=1}^{L}, m_{i}) as follows: computeT_{i} , whereT = A^{-1}CM + A^{-1}B is the matrix of file blocks,M andA are the matrices of secret keys, andC . Note thatB = [f_{b_{1}}(i),f_{b_{2}}(i),\ldots ,f_{b_{L}}(i)]^{T} where the vectorT_{i} = T^{T} is the transpose of matrixT_{i} . The challenger then sendsT to the adversary. This query process can be executed arbitrary times, after which, the adversary owns and stores all the file dataT_{i} and the corresponding tagsFile = (m_{1},\ldots ,m_{n}) .\Sigma = (T_{1},\ldots ,T_{n}) Challenge: The challenger generates a challenge
and requests a proof of chunkschal , wherem_{i_{1}},\ldots , m_{i_{c}} from the adversary.1 \leq i_{c} \leq n \wedge 1 \leq c \leq n Forge: The adversary computes a proof
determined by\rho and returnschal to the challenger.\rho
b: In Ideal Environment
We utilize the random number to replace the result of the pseudo random function, hence
If the adversary passes the verification process when
Implementation and Evaluation
The evaluation has been done using Baidu Cloud Servers, where all experimental data is stored. The server is configured with 16GB memory and 4-core processors with multi-threading support. Algorithms are implemented using OpenSSL version 0.9.8b with a modulus
In DR-GPOS scheme, the sampling method used is uniform with the classic S-PDP scheme [4]. From the inequation:\begin{equation*} 1 - {\left ({{\frac {n - t}{n}} }\right)^{c}} \le {P_{X}} \le 1 - {\left ({{\frac {{n {-} c + 1 - t}}{n - c + 1}} }\right)^{c}},\end{equation*}
In order to standardise the performance of experiments, the size of file data chunk/block is 256KB in the absence of special instructions.
A. The Comparison of Various PDP and POR Schemes
The comparison of various schemes is shown in Table 1. We can see from the table that DR-GPOS is the only one that can accurately distinguish a malicious member from a private verification group, as well as support secure revocation of malicious-members and data deduplication.
Note that, when S-PDP and CPOR are applied to groups, each group member runs S-PDP or CPOR scheme by using their secret keys. Therefore, each member is independent and the selective opening attacks (i.e. a malicious member colludes with the server.) are avoided. Hence, the system can support secure revocation of malicious members.
B. The Efficiency Evaluation of DR-GPOS
Fig. 2 and Fig. 3 show the performance of DR-GPOS with different number of group members. To improve understanding, they are presented in 3D and 2D format respectively.
The efficiency evaluation of pre-process of DR-GPOS with different member number. (a) The performance in 3D surface (b) The performance in 2D line.
The efficiency evaluation of challenge and verification with varying member size. (a) The performance in 3D surface (b) The performance in 2D line.
In the pre-process phase, TTP utilizes all the secret keys
We can see from Fig. 2 that the computing time is linearly increased with the increase of the file size. For fixed sized file chunks, the total file size is greater, hence individual file chunks are bigger, which leads to larger computation costs. Moreover, with the increase in members, the computing time rises. Moreover, with the increase in members, the computing time rises. This is due to the fact that generation of tags requires secret keys of all members. Hence, as the members grow, the computation costs grow with more keys. Though the computing cost is fairly large when the total file size or the number of group member is large, the computation of pre-process is a one-time effort, and the results can be used repeatedly. Thus, the experimental results are acceptable.
In the phase of challenge and verification, each group member
As Fig. 3 shows, for the fixed file chunks size and group member number, the time of challenge and verification is constant, irrespective of the file size. Thus, the number of challenged chunks is fixed at 460 in experiments. Therefore, the computing time of generating and verifying one proof is not dependent on the total file size. However, with the increase in members, the number of file blocks (each file chunk includes
C. The Efficiency Evaluation of Distinction
The efficiency of distinction is directly relevant to the computational efficiency of commitment functions. Therefore, we measure the performance of commitment functions in this section.
In DR-GPOS scheme, each chunk corresponds to a commitment. In the phase of pre-processing, TTP utilizes
Fig. 4 shows the performance of generating and verifying commitments. As we can observe, the generation time is linearly related to the total file size. This is due to the fact that the number of file chunks increases with the increase in file size and fixed chunk size, therefore the time of generation is linearly increased. On the other hand, for a file with the same size, with the increase in chunk size, the number of file chunks is decreased linearly, and consequently the time of generation is reduced. Though the generation time is growing linearly related to the file size, it is a one-time calculation and reusable.
In the phase of commitment verification, the number of challenged chunks can be given by the group members. In experiments, we adopt the worst-case performance, which means we measure the verification time of 460 chunks and commitments each time.
As we can see from Fig. 4, the time of verification is constant with the fixed chunk size. In this phase, not all verifiers have
On the whole, it is efficient and practical enough for distinguishing the dishonest group members, so that appropriate parameters can be chosen.
D. The Comparison With Group NRPDP and GPDP
The performances of pre-process of DR-GPOS, group NRPDP and GPDP [8] are shown in Fig. 5. Similar to the previous analysis, when NRPDP is applied in group, each group member running NRPDP scheme use its own secret key.
It can be observed, for fixed total file size and chunk/block size (256KB), NRPDP is more time-consuming than DR-GPOS and GPDP in pre-processing phase. The reason is in NRPDP scheme, each tag generation needs an exponent arithmetic numerous multiplication steps, while in DR-GPOS and GPDP scheme, multiplication and addition operations are only needed to generate tags. However, the computational cost of DR-GPOS is slightly higher than GPDP because of the added computational cost of commitment functions. Even so, the increased computational cost is infinitesimally small than the original computation of pre-process.
On the other hand, the time is linearly related to the total file size in all three schemes. Analogously, the number of file chunks/blocks with fixed size is increasing with the growth of file size, therefore the time of generation linearly increases. Additionally, with the increase in the members, the computing time of all three rises. Even then, the DR-GPOS and GPDP are more efficient than NRPDP.
As for the verification time, the three private verification schemes have minor differences, while the efficiency of the three is similar.
E. The Comparison Between Private Verification and Public Verification
The benefit of public verification is the exposure of dishonest members. However, the public schemes tend to be more time-consuming. Fig. 6 shows a generalized comparison of the public verification using bilinear pairing and private verification of DR-GPOS.
In experiments, we utilize the the Pairing-Based Cryptography (PBC) library version 0.5.14, and use the modulus
It can be observed from Fig. 6 that the ratio of one bilinear pairing against one multiplication operation is approximately 350, regardless of file size. Moreover, the ratio against one exponent operation is approximately 9. Hence, the bilinear operations tend to be inefficient. For the verification performance, the number of challenged chunks is set to 460 in both public and private schemes. We only observe the verification on time. The ratio of onetime public verification against private is approximately 2.5. Although it will not affect much in single verification, but in real world applications, it will be done thousands of times, which will make it significant. We can conclude that public schemes using bilinear pairing are inefficient in real world systems.
F. The Storage and Communication Cost
1) Storage Cost
The storage cost of DR-GPOS is only relevant to the total file size and security parameter, and is irrespective of the number of group members. For instance, each file chunk of 4KB corresponds with both a verification tag and a commitment of 1024 bits; this means the total storage in cloud increases by 6.25% (equal split of tags and commitments). It is acceptable because we can reduce the extra storage by enlarging the chunk size. Though it requires more extra storage than GPDP, it has more advantages compared to the most other schemes. The storage cost on each group member is
2) Communication Cost
In the phase of challenge and verification, the required bandwidth between client and server is
Related Work
In order to guarantee data integrity in cloud storage, many approaches and schemes have been proposed, e.g. [2], [17], [18]. Each type of schemes has its own advantages and disadvantages, and can solve different security problems. According to the definition in S-PDP scheme, only verifiers who hold the private key can verify the data integrity in a private scheme, while others cannot. Comparatively, the verification keys are public in a public verification scheme, and anyone can use these keys to verify the data. Generally speaking, private protocols are more efficient, but the verification results are only known by the verifiers. Meanwhile, the public schemes are usually inefficient because of the usage of bilinear pairing, but everyone can identify the verification results. We categorized the different types of schemes and described the properties the proposed scheme can satisfy.
A. Private Verification Schemes
Although private verification schemes are highly efficient, the data deduplication, malicious-member distinction and revocation cannot be guaranteed when the private schemes are applied in group application.
1) Schemes Based on RSA
Ateniese et al. [4], [19] proposed the first sampling model for PDP without requiring the server retrieval and accessing the entire file. In their schemes, the server provides probabilistic proof with different levels of PDP guarantees. They utilized exponent structure to construct the homomorphic verification tags and use RSA scheme to keep the tags private. After generating a proof, they verified it with the secret key of RSA. The schemes are all based on RSA cryptography, which has the drawback of exponential calculations.
The RSA method has the property of homomorphism, and can be used to construct the detection mechanism of data integrity. The simplified algorithm is as follows:
Pre-process:
Choose two large prime
Let
Send
Challenge and Verification:
The client gives a challenge to the server.
The server chooses file blocks
The server compute
The client computes
This method is regarded as one of the notable landmarks in this filed, and many subsequent schemes such as [7], [20] utilize the RSA method.
2) Schemes Based on Sentinels
Juels and Kaliski [5], [21] introduced the notion of proof of retrievability (POR), which is function-similar to PDP. The POR scheme uses sentinels hidden among regular file blocks to detect modified data, so that it can only be applied to encrypted files and only perform a limited number of queries, which equals to the number of sentinels.
The simplified process is as follows:
Pre-process:
The client encodes the file with error correction code, and then inserts the sentinels into the encoded file.
Record the sentinels and send the file to the server.
Challenge and Verification:
The client gives the challenge which includes positions of sentinels to the server.
The server returns the sentinels of the corresponding positions based on the challenge.
The client comparesthe sentinels with local records.
This approach can only perform a limited number of detections, because of the finite number of sentinels. Therefore, majority of later POR schemes have abandoned this approach.
3) Schemes Based on Symmetric Cryptography
Shacham and Waters [14] proposed a new notion of Compact POR (CPOR), which utilizes symmetric cryptography and homomorphic properties to combine multiple authenticator values into a small one and minimizes the communication cost. It utilizes exponential calculation for verification tags so that it increases computation cost. Dodis et al. [22] formally proved the security of a variant of scheme proposed by Juels and Kaliski, and built the first unbounded-use POR scheme which doesn’t rely on RO (Random Oracle) and the first bounded-use scheme with information-theoretic security. It is a theoretical scheme and has not been implemented.
B. Public Auditing Schemes
Public verification scheme can easily distinguish a malicious member. However, the public verification schemes are generally constructed based on the bilinear pairing or third-party (TP) verification, which makes the computational efficiency relatively low.
1) Schemes Based on Bilinear Pairing
This approach is generally used for public verification which can achieve zero-storage on client. The drawback is inefficient computation of bilinear pairing and no privacy because anyone can obtain and verify the data of other clients.
Hanser and Slamanig [23] proposed the first simultaneous private and public verification PDP scheme based on bilinear pairing and elliptic curve (EC), which uses the same pre-process and metadata to achieve two kinds of verifiability. The drawback is still the extra storage cost and exponential calculation on both server and client. Zhu et al. [24]–[26] presented a cooperative PDP (CPDP) scheme based on bilinear pairing, homomorphic verifiable response and hash index hierarchy to support scalability of service and data migration in hybrid cloud. The drawback is that the operation of bilinear pairing is very time-consuming, thus these schemes of public verification are always not very efficient.
2) Schemes Based on TP
TP is utilized to represent the cloud client to verify the possession of data. It supports the public verification and usually applies on encrypted data. Wang et al. presented public auditing schemes [27]–[30] based on TP which usually require extra cost of client side.
In 2014, a new notion of Outsourced Proofs of Retrievability (OPOR) [31] was proposed. The OPOR scheme, which is named Fortress, utilizes an external party which is untrusted to conduct a POR scheme and interact with the server on behalf of the client. Though the Fortress scheme can protect all these three parties synchronously, the drawbacks are obvious. On one hand, the Fortress conducts two POR scheme in parallel, so that all the computation costs and communication costs are double. On the other hand, once the verification is unacceptable, the client has to inspect both the auditor and server, which is inefficient.
C. Group Auditing Schemes
In some special scenarios, different clients in a group may share a same file, and the system needs more than one client to verify this file data.
Wang et al. [16], [32] proposed the privacy-preserving schemes with private and public auditing respectively. In their mechanism, the identity of data signer can be kept private to the auditors. Soon afterwards, Wang et al. [33] introduced a new public auditing scheme which can efficiently revoke a user, and the revoked user can not utilize his secret key to forge a valid proof. However, the shortcoming is taking no account of collusion. Wang et al. [6] proposed a notion of Group-oriented Proofs of Storage (GPOS), which can limit the communication bandwidth in a fixed size. But all these group schemes do not involve the issue of deduplication. To solve this problem, Wang et al. [8] proposed a group PDP (GPDP) scheme which can efficiently guarantee data possession with deduplication, as well as against selective opening attacks of a malicious party. Nevertheless, how to accurately and efficiently distinguish and revoke a malicious member in a private verification group is still a unresolved problem.
Conclusion
In this paper, we give a DR-GPOS scheme, which is based on on matrix calculation, pseudo-random function, and commitment function. When malicious members are involved in a group and repudiate a valid proof, DR-GPOS can efficiently distinguish the malicious one and securely revoke its access.
In terms of functionality, DR-GPOS can accurately distinguish the dishonest members in a group, as well as guarantee the integrity and deduplication of the outsourced data. From a security perspective, DR-GPOS can support the revocation of a dishonest member and resist the attacks from revoked members (e.g. forge proof by colluding with server).
We give the security analysis in the standard model, and have implemented the scheme or realtime cloud servers to evaluate the performance. Evaluation shows that DR-GPOS not only efficient than other schemes, but also more practical for application of real world.