Introduction
As illustrated in [1], approximate nearest neighbor (ANN) search on high-dimensional datasets is a basic computation in many applications. The purpose of ANN search is to find the approximate nearest neighbor vectors of a query in a given dataset. Because ANN search performs much better on high-dimensional large-scale data than exact methods, it has been applied in various high-dimensional large-scale fields, including product recommendation [2], image retrieval [3], clustering [4], etc. Existing exact nearest neighbor algorithms can be efficient when the data dimensionality is small, but they degrade when the dimensionality is large [5], [6]. As the dimension increases, these exact algorithms may even become less efficient than brute-force linear scanning due to the curse of dimensionality [7]. Nonetheless, many ANN problems involve high-dimensional large-scale data; for example, many large multimedia retrieval applications rely on ANN search [8]. Such applications require efficient processing of ANN queries [8].
Moreover, with the continuous growth of the Internet of Things (IoT) and individual devices, running ANN search on resource-constrained devices has become increasingly common. However, due to their limited computational power, these devices cannot perform the training and large-scale ANN search processes independently. Outsourcing the ANN search tasks to cloud servers can alleviate the resource constraints of edge devices, but this raises security and verification concerns. The sensitive data collected by resource-constrained devices must be securely outsourced to the cloud server, and a verification mechanism is required to identify any potential improper behavior by the cloud server. Additionally, to keep the outsourcing and verification mechanisms efficient, they should be designed to avoid complex and time-consuming operations.
In light of these challenges, the motivation behind our research is to develop a secure and efficient outsourcing mechanism, the "outsourced LSH" scheme, which enables resource-constrained devices to complete ANN search tasks with the help of a cloud server while preserving data privacy and allowing the returned results to be verified.
In the rest of this paper, we present our contributions and describe the proposed outsourced LSH scheme in detail.
The contributions can be summarized below:
First, we propose a secure outsourcing mechanism (the outsourced LSH algorithm) for ANN problems on high-dimensional large-scale data. This mechanism enables resource-constrained devices to complete the ANN task securely and efficiently. Second, to ensure the privacy of the resource-constrained client's information, we use a lightweight arithmetic method to obscure the client's data. Meanwhile, to detect improper behavior of the cloud server, we build a lightweight verification mechanism that identifies incorrect results with probability 1.
Third, we analyze the proposed scheme through detailed mathematical analysis. To evaluate its efficiency and correctness, we implement the proposed method in Python 3.7 and evaluate it on the CIFAR-10 dataset. The experimental results show that our proposed scheme is correct and efficient.
Fourth, to illustrate the application of the proposed scheme, we design the architecture of an application model and discuss potential application fields as well as how the proposed scheme can be applied in them.
Related Work
A. Machine Learning and LSH
In this information era, with the development of edge devices and networks, high-dimensional data have exploded on the Internet, and with them comes the need for fast ANN search in many fields. However, the volume and high dimensionality of these data also pose a great challenge to fast ANN search. ANN search is one of the most popular approaches to nearest neighbor search because of its great benefit in improving computational speed and reducing storage [9]. Among the various ANN search methods, LSH (locality-sensitive hashing) is one of the most widely used.
There has also been a lot of research devoted to improving the accuracy and efficiency of LSH.
B. Secure Outsourcing Computations
Extensive research has been conducted on secure outsourcing schemes for scientific computing. In terms of proposing various outsourcing methods, Atallah et al. proposed the first generic architecture for outsourcing scientific computations in 2002 [19]. However, that architecture lacks a mechanism for verifying the computing results returned by the cloud servers. Then, to solve large-scale systems of linear equations with the help of a fully malicious cloud server, Chen et al. [20] designed a secure outsourcing method in 2015. They utilized several special sparse matrices to obscure the input and output data, and they proved through experiments that the client can use their method to detect improper behavior of the cloud server with probability 1. In 2018, Salinas et al. proposed a secure outsourcing approach for solving large-scale sparse linear systems of equations, which are among the most common and fundamental problems in big data [21]. Their method focuses not only on improving efficiency but also on memory I/O complexity, and they demonstrated that the cost of memory I/O operations is important in making an outsourcing method practical.
Much research has been done to free resource-constrained devices (UAVs, individual phones, etc.) from complex cryptographic operations [22], [23], [24], [25], [26]. For example, to free IoT devices from solving quadratic congruences, Zhang et al. proposed two practical and secure outsourcing algorithms [27]. Their method outsources the heavy computation to a single cloud server and guarantees that the IoT device can identify improper behavior with probability 1. They also took the Rabin algorithm as an example to show the application of the method in IoT scenarios. To address the problem of using untrusted cryptographic helpers, such as cloud servers, Hohenberger et al. provided a formal security definition for outsourcing computations to an untrusted helper [24]. They also provided a framework to quantify the efficiency and verifiability of an outsourcing method, taking outsourced modular exponentiation as an example of how to securely outsource computations. To alleviate the local burden of secret key updates, Yu et al. proposed a secure outsourced key-update method [28] in which a third-party auditor (TPA) audits the storage and resists key exposure.
C. Cloud/Edge Computing
Cloud computing provides a new way for resource-constrained devices to perform complex computations [29], [30]. However, because of the requirement for data privacy in cloud computing, how to guarantee data privacy when conducting cloud computing has become a critical issue. Much research has been done to guarantee privacy in cloud computing, covering both secure storage [31], [32], [33] and secure computation [20], [34], [35].
There is a great deal of research on the design of secure outsourcing architectures. For example, in 2020, Lin et al. [36] focused on outsourcing bilinear pairing computations, the most time-consuming operation in pairing-based cryptographic protocols, and designed a new outsourcing scheme for blockchain devices. To solve quadratic congruence problems, a widely used operation in cryptographic protocols, on resource-constrained devices, Zhang et al. [27] presented two practical and secure outsourcing algorithms. The proposed algorithms enable IoT devices to outsource the heavy computation of solving quadratic congruences to a single cloud server without leaking the privacy of the computation, and they can identify any improper behavior of the cloud server with probability 1. In 2016, Ye et al. [37] presented a secure outsourcing algorithm for modular exponentiation based on a new mathematical division under the setting of two non-colluding cloud servers; the privacy of the outsourced data is guaranteed and the efficiency is improved.
There has also been a lot of research on how to securely outsource scientific computations to cloud servers. To solve modular exponentiation problems on IoT devices, Li et al. [38] proposed the first distributed outsourcing scheme for modular exponentiation in 2020. With this scheme, the user can not only protect their privacy during the outsourcing process but also identify invalid results from edge nodes with high probability. To solve modular inversion problems on resource-constrained clients, Tian et al. [39] presented a new unimodular matrix transformation technique in 2022. It is the first secure outsourced computation method that supports arbitrary and variable moduli, and it can verify the correctness of the cloud server's results with the optimal probability of 1 in a single round.
Preliminaries
In this section, we give a brief introduction to the LSH algorithm and to secure outsourcing computation.
A. Locality-Sensitive Hashing
The basic design idea of LSH is that points that are close in the original space should be hashed into the same bucket with high probability, while points that are far apart should collide only with low probability:
If there exist two points that are close to each other in the original data space, then after hashing they still fall into the same bucket with high probability.
If two points are far apart in the original data space, then after hashing they fall into the same bucket only with low probability.
The basic principle of designing such a hash family is captured by the notion of a locality-sensitive family.
For a domain $S$ of a point set with distance measure $\mathrm{dist}(\cdot,\cdot)$, an LSH family $\mathcal{H}$ is defined as follows.
Definition 1: For any constant $c>1$, a hash family $\mathcal{H}$ is called $(r,cr,p_{1},p_{2})$-sensitive (with $p_{1}>p_{2}$) if, for any two points $u,v\in S$:
${\mathrm {Pr}}_{\mathcal {H}} [h\left ({u }\right)=h(v)]\ge p_{1}$ when $\mathrm{dist}\left ({u,v }\right)\le r$; and
${\mathrm {Pr}}_{\mathcal {H}} [h\left ({u }\right)=h(v)]\le p_{2}$ when $\mathrm{dist}\left ({u,v }\right)>cr$.
In terms of measuring item distances in Euclidean space, the p-stable LSH hash function is defined as \begin{equation*} h_{a,b}\left ({o }\right)=\left \lfloor{ \frac {a^{T}o+b}{w} }\right \rfloor \tag{1}\end{equation*}
In eq. (1), $o$ is the data vector to be hashed, $a$ is a vector whose entries are drawn independently from a p-stable distribution, $b$ is a real number chosen uniformly at random from $[0,w]$, and $w$ is the width of the hash bucket.
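As a concrete illustration of eq. (1), the following minimal Python sketch (our own toy example; the function name pstable_hash and the parameter values are illustrative, not from the paper) hashes two nearby points with a Gaussian, i.e., 2-stable, projection vector:

import numpy as np

def pstable_hash(o, a, b, w):
    # eq. (1): h_{a,b}(o) = floor((a^T o + b) / w)
    return int(np.floor((np.dot(a, o) + b) / w))

rng = np.random.default_rng(0)
d, w = 8, 4.0                          # illustrative dimension and bucket width
a = rng.standard_normal(d)             # entries drawn from the 2-stable (Gaussian) distribution
b = rng.uniform(0.0, w)                # offset chosen uniformly from [0, w]

u = rng.standard_normal(d)
v = u + 0.01 * rng.standard_normal(d)  # a point very close to u
print(pstable_hash(u, a, b, w), pstable_hash(v, a, b, w))   # close points usually collide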
For example, when the Euclidean distance between two items $u$ and $v$ is $s=\Vert u-v\Vert_{2}$, the probability that they collide under $h_{a,b}$ is \begin{equation*}p\left ({s }\right)=\Pr \left [{ h_{a,b}\left ({u }\right)=h_{a,b}\left ({v }\right) }\right]=\int _{0}^{w} {\frac {2}{s}\varphi \left({\frac {t}{s}}\right)\left({1-\frac {t}{s}}\right)dt},\end{equation*} where $\varphi(\cdot)$ denotes the probability density function of the standard normal distribution.
To make $(R,c)$-NN search more efficient, LSH typically concatenates several hash functions of the form of eq. (1) into a compound hash function and builds multiple hash tables, which enlarges the gap between the collision probabilities of near and far points, as shown in the sketch below.
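The following short Python sketch (a standard E2LSH-style construction with our own illustrative names such as build_tables and k, not code from the paper) shows how k hash functions are concatenated into a bucket key and how several tables are queried:

import numpy as np
from collections import defaultdict

def build_tables(X, num_tables, k, w, rng):
    # Each table uses its own k projections A and offsets b; the compound key
    # of a point is the tuple of its k hash values from eq. (1).
    tables = []
    for _ in range(num_tables):
        A = rng.standard_normal((k, X.shape[1]))
        b = rng.uniform(0.0, w, size=k)
        keys = np.floor((X @ A.T + b) / w).astype(int)
        buckets = defaultdict(list)
        for idx, key in enumerate(map(tuple, keys)):
            buckets[key].append(idx)
        tables.append((A, b, buckets))
    return tables

def query(q, tables, w):
    # Candidates are the union of the matching buckets over all tables;
    # true distances to the candidates are checked afterwards.
    candidates = set()
    for A, b, buckets in tables:
        key = tuple(np.floor((A @ q + b) / w).astype(int))
        candidates.update(buckets.get(key, []))
    return candidates

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 32))
tables = build_tables(X, num_tables=8, k=4, w=4.0, rng=rng)
print(len(query(X[0], tables, w=4.0)))   # a small candidate set that contains point 0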
B. Secure Outsourcing Computation
One purpose of secure outsourced computing is to enable resource-constrained devices to solve complex computing problems with the help of cloud servers without leaking the client's private data to the cloud server. There is already a lot of related research in this area [20], [34], [35]. Here, we adopt the properties that most secure outsourcing schemes need to satisfy, as summarized by Gao et al. [42]; these properties are:
Security: This property requires that when computing tasks are outsourced to the cloud service, the cloud server cannot obtain any private information from the input/output data.
Verifiability: To guarantee the correctness of the computing results returned by the cloud server, the client or a third-party authority needs to be able to check the correctness of the outsourced computing results.
Efficiency: Because the purpose of outsourced computing is to allow resource-constrained clients to perform complex computational tasks, the client-side cost of preparing, outsourcing, and verifying the task must be lower than the cost of performing the computation locally.
Proposed Scheme
In this section, we first propose an application model for the outsourced computation of LSH. Then, we describe the concept and basic principle of the scheme and present the detailed scheme.
A. Application Model
The proposed application model of outsourced LSH involves two parties: a resource-constrained client (e.g., an individual phone, edge device, or UAV) and a cloud server. The client obscures its local data with a lightweight secret key and outsources the expensive hash-table computation to the cloud server; the cloud server computes on the obscured data and returns the result, which the client then verifies and recovers.
B. Concept and Basic Principle
The basic principle of our scheme is to design a system that enables resource-constrained devices, such as individual phones, edge devices, and UAVs, to perform fast ANN search. To achieve this goal, we utilize a secure outsourced computing mechanism to complete the computation task of LSH: the client obscures its data with lightweight matrix transformations, the cloud server performs the heavy hash-table computation on the obscured data, and the client verifies the returned result and recovers the true hash table.
C. Detailed Scheme
We express our detailed algorithm in Algorithm 1. In the proposed algorithm, the projection matrix $W$ is first formed from $t$ projection vectors that conform to p-stable distributions; the client then generates the secret key $SK$, obscures the data, and outsources the hash-table computation to the cloud server, and finally verifies and recovers the returned result.
Algorithm 1 The Proposed Algorithm
Input: $t$ projection vectors that conform to p-stable distributions.
Output: the verification result and the recovered HashTable.
The client chooses a random real number $\alpha$ and $2l$ elementary rotation matrices $P_{1},P_{2},\cdots,P_{l},Q_{1},Q_{2},\cdots,Q_{l}$ as the secret key $SK$: \begin{equation*} \mathrm {SK}=\{\alpha,P_{1},P_{2},\cdots P_{l},Q_{1},Q_{2},\cdots Q_{l}\} \tag{3}\end{equation*}
The client computes $X^{\prime}$ and $W^{\prime}$ as: \begin{align*} X^{\prime }&=P_{1}P_{2}\cdots P_{l}(\alpha X)Q_{1}Q_{2}\cdots Q_{l} \tag{4}\\ W^{\prime }&=(\alpha W)Q_{1}Q_{2}\cdots Q_{l} \tag{5}\end{align*}
Cloud chooses $t$ projection vectors that conform to p-stable distributions to form the projection matrix $W$, and computes $\text{HashTable}^{\prime}$ as follows: \begin{equation*} \text {HashTable}^{\prime }=W^{\prime }X^{\prime T},\quad {W} \in \mathbb {R}^{t\times d} \tag{6}\end{equation*}
The client randomly generates a vector $r\in \mathbb {R}^{1\times t}$ and calculates $V_{1}$ locally as: \begin{equation*} V_{1}=(rW^{\prime }){X^{\prime }}^{T} \tag{7}\end{equation*}
The client calculates $V_{2}$ from $r$ and $\text{HashTable}^{\prime}$, as shown in eq. (8). Then, the client checks whether $V_{1}$ is equal to $V_{2}$. If they are equal, the client accepts $\text{HashTable}^{\prime}$; otherwise, it rejects $\text{HashTable}^{\prime}$. \begin{equation*} V_{2}=r\,\text {HashTable}^{\prime } \tag{8}\end{equation*}
Finally, the client recovers the true HashTable from the accepted $\text{HashTable}^{\prime}$ as: \begin{equation*} \text {HashTable}=\left \lfloor{ \frac {\frac {1}{\alpha ^{2}}(\text {HashTable}^{\prime }P_{1}P_{2}\cdots P_{l})+b}{\mathrm{num\_hash}} }\right \rfloor \tag{9}\end{equation*}
The elementary rotation matrix (also known as the Givens matrix) is chosen as the way of obscuring the data because it is a sparse matrix with most elements equal to 0, so the computational burden of multiplying by it is much lower than that of a general matrix multiplication. Its structure is shown in eq. (2), and it is constructed as follows:
Let $c=\cos\theta$ and $s=\sin\theta$.
Set the $i$-th and $j$-th main diagonal elements to $c$, and set the other main diagonal elements to 1.
Set the element in the $i$-th row and $j$-th column to $s$.
Set the element in the $j$-th row and $i$-th column to $-s$.
Set all the other elements to 0.
\begin{equation*} T\left ({i,j }\right)=\left [{\begin{array}{ccccccc} 1 & & & & & & \\ & \ddots & & & & & \\ & & c & \cdots & s & & \\ & & \vdots & \ddots & \vdots & & \\ & & -s & \cdots & c & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{array}}\right] \tag{2}\end{equation*}
Multiplication: a multiplication operation multiplies the $i$-th row (resp. column) of a matrix by a non-zero scalar.
Permutation: a permutation operation swaps two rows (resp. columns) of a matrix.
Addition: an addition operation adds the $j$-th row (resp. column), multiplied by a non-zero scalar, to the $i$-th row (resp. column).
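To make the construction above concrete, the following Python sketch (illustrative only; the function name givens_matrix and the chosen parameters are our own, not from the paper) builds $T(i,j)$ for a given angle and checks that it is orthogonal, which is the property used later in the recovery step:

import numpy as np

def givens_matrix(n, i, j, theta):
    # Elementary rotation (Givens) matrix T(i, j) of size n x n, as in eq. (2)
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(n)
    T[i, i] = c
    T[j, j] = c
    T[i, j] = s
    T[j, i] = -s
    return T

T = givens_matrix(5, 1, 3, theta=0.7)
print(np.allclose(T @ T.T, np.eye(5)))   # True: T is orthogonal, so T^T undoes the rotation

Because multiplying by $T(i,j)$ changes only two rows (or columns), obscuring with $l$ such rotations costs far less than a general matrix multiplication.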
In the actual operation of the proposed scheme, the client first uses the secret key $SK$ to obscure the local data $X$ and the projection matrix $W$, obtaining $X^{\prime}$ and $W^{\prime}$, and sends them to the cloud server. The cloud server computes the obscured hash table $\text{HashTable}^{\prime}$ and returns it to the client. Finally, the client verifies the returned result with the random vector $r$ and, if the verification passes, recovers the true HashTable with eq. (9).
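The end-to-end flow can be summarized by the following minimal Python sketch assembled from eqs. (3)-(9). It is an illustration, not the authors' implementation: the sizes, the bucket width w used as the divisor in the final rounding (written as num_hash in eq. (9)), and the client-side generation of the projection matrix W are all our own simplifying assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n points of dimension d, t hash functions, l rotations
n, d, t, l = 200, 32, 8, 4
w = 4.0                                   # bucket width used for the final rounding
X = rng.standard_normal((n, d))           # the client's local dataset
W = rng.standard_normal((t, d))           # t Gaussian (2-stable) projection vectors
b = rng.uniform(0.0, w, size=(t, 1))      # offsets as in eq. (1)

def givens(size, i, j, theta):
    # Elementary rotation matrix T(i, j), eq. (2)
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(size)
    T[i, i] = T[j, j] = c
    T[i, j], T[j, i] = s, -s
    return T

# KeyGen (client), eq. (3): random alpha and 2l secret rotation matrices
alpha = rng.uniform(1.0, 10.0)
Ps = [givens(n, *sorted(rng.choice(n, 2, replace=False)), rng.uniform(0, np.pi)) for _ in range(l)]
Qs = [givens(d, *sorted(rng.choice(d, 2, replace=False)), rng.uniform(0, np.pi)) for _ in range(l)]
P, Q = np.linalg.multi_dot(Ps), np.linalg.multi_dot(Qs)

# ProbGen (client), eqs. (4)-(5): obscure X and W before sending them out
X_prime = P @ (alpha * X) @ Q
W_prime = (alpha * W) @ Q

# Compute (cloud), eq. (6): the only O(n*t*d) step
HashTable_prime = W_prime @ X_prime.T

# Verify (client), eqs. (7)-(8): cheap check with a secret random vector r
r = rng.standard_normal((1, t))
V1 = (r @ W_prime) @ X_prime.T
V2 = r @ HashTable_prime
assert np.allclose(V1, V2), "result rejected"

# Recover (client), eq. (9): undo the rotation P and the factor alpha^2
recovered = (HashTable_prime @ P) / alpha**2
assert np.allclose(recovered, W @ X.T)     # identical to the plaintext projections
HashTable = np.floor((recovered + b) / w)  # same buckets as the original LSH

The two assertions mirror the Verify step and the correctness analysis given below.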
Theoretical Performance Analysis
A. Correctness Analysis
The secret key generated by the client is $\mathrm{SK}=\{\alpha,P_{1},P_{2},\cdots,P_{l},Q_{1},Q_{2},\cdots,Q_{l}\}$, as in eq. (3). Writing $P=P_{1}P_{2}\cdots P_{l}$ and $Q=Q_{1}Q_{2}\cdots Q_{l}$, the HashTable recovered by eq. (9) satisfies \begin{align*} \mathrm {HashTable}&=\left \lfloor{ \frac {\frac {1}{\alpha ^{2}}(\mathrm {HashTable}^{\prime} P)+b}{\mathrm {num\_{}hash}} }\right \rfloor \\ &=\left \lfloor{ \frac {\frac {1}{\alpha ^{2}}\left ({W^{\prime} X^{\prime T}P }\right)+b}{\mathrm {num\_{}hash}} }\right \rfloor \\ &=\left \lfloor{ \frac {\frac {1}{\alpha ^{2}}\left ({\left ({\alpha W }\right)Q\left ({P\left ({\alpha X }\right)Q }\right)^{T}P }\right)+b}{\mathrm {num\_{}hash}} }\right \rfloor \\ &=\left \lfloor{ \frac {\frac {1}{\alpha ^{2}}\left ({\left ({\alpha W }\right)Q{Q^{T}\left ({\alpha X }\right)^{T}P}^{T}P }\right)+b}{\mathrm {num\_{}hash}} }\right \rfloor \\ &=\left \lfloor{ \frac {\frac {1}{\alpha ^{2}}\left ({\alpha ^{2}WX^{T} }\right)+b}{\mathrm {num\_{}hash}} }\right \rfloor \\ &=\left \lfloor{ \frac {WX^{T}+b}{\mathrm {num\_{}hash}} }\right \rfloor,\end{align*} where the last steps use the orthogonality of the rotation matrices, i.e., $QQ^{T}=I$ and $P^{T}P=I$. Hence the client recovers exactly the HashTable that would have been computed locally.
Similarly, if the cloud server computes $\text{HashTable}^{\prime}$ honestly, the verification values satisfy \begin{align*} V_{1}&=(rW^{\prime}){X^{\prime }}^{T} \\ &=rW^{\prime} X^{\prime T} \\ &=r\,{\mathrm {HashTable}}^{\prime} \\ &=V_{2},\end{align*} so a correct result always passes the check.
B. Security and Verification Analysis
We analyze the security and verification performance of the proposed outsourced LSH scheme from two aspects: data privacy and verifiability.
1) Data Privacy
In the proposed outsourced LSH scheme, the cloud server cannot obtain the client's original data from the information it receives.
Proof:
To protect the input data of the local client, that is, the dataset $X$ and the projection matrix $W$, the client obscures them with the random scalar $\alpha$ and the secret rotation matrices as in eqs. (4) and (5). The cloud server only receives $X^{\prime}$ and $W^{\prime}$ and, without the secret key $SK$, cannot recover $X$, $W$, or the true HashTable.
2) Verification Performance
The correctness of results from the cloud server can be verified with 100% probability.
Proof:
Based on the verifiability requirement of outsourcing computation described in Section III-B, an outsourcing scheme must verify the computing results returned by the cloud server. Therefore, in the proposed scheme, assuming the cloud server may behave improperly, the client needs a verification mechanism to detect such behavior and reject incorrect results. We use the check of whether $V_{1}$ equals $V_{2}$ for this purpose. If the cloud server wants to forge a result $\text{HashTable}^{\ast}\ne \text{HashTable}^{\prime}$ that passes the check, it must ensure that $r\,\text{HashTable}^{\ast}=(rW^{\prime})X^{\prime T}$, i.e., that the random vector $r$ is orthogonal to every non-zero column of $\text{HashTable}^{\ast}-\text{HashTable}^{\prime}$. Since $r$ is chosen by the client at random and is never revealed to the server, this happens with probability 0, so any incorrect result is detected and the correctness of accepted results is guaranteed with probability 1.
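As a quick numerical illustration of this check (a self-contained toy example with arbitrary sizes, not the paper's code), tampering with even a single entry of the returned table is caught by the random-vector test:

import numpy as np

rng = np.random.default_rng(2)
t, n, d = 6, 50, 16
W_prime = rng.standard_normal((t, d))   # obscured projection matrix known to both parties
X_prime = rng.standard_normal((n, d))   # obscured dataset

honest = W_prime @ X_prime.T            # what an honest cloud server returns
forged = honest.copy()
forged[3, 7] += 1.0                     # a single tampered entry

r = rng.standard_normal((1, t))         # the client's secret random vector
V1 = (r @ W_prime) @ X_prime.T          # eq. (7)
print(np.allclose(V1, r @ honest))      # True: the correct result passes the check
print(np.allclose(V1, r @ forged))      # False: the forged result is rejected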
C. Efficiency Analysis
We analyze the efficiency of the proposed scheme through detailed theoretical analysis in terms of computation efficiency and communication consumption.
1) Computation Efficiency
The proposed algorithm is efficient for the resource-constrained client: the client-side cost grows only linearly with the size of the data, while the dominant hash-table computation is performed by the cloud server.
Proof:
We denote a scalar multiplication operation by SM and neglect less significant operations in the analysis. When one hash table is generated, in the ProbGen step the client performs only the scalings $\alpha X$ and $\alpha W$ together with the sparse rotation-matrix multiplications by $P_{1},\cdots,P_{l}$ and $Q_{1},\cdots,Q_{l}$; because each Givens matrix affects only two rows or columns, this cost grows linearly with the data size. In the Verify step the client performs three vector-matrix products, and in the Recover step it performs $l$ sparse rotation-matrix multiplications and one scaling. In contrast, the hash-table computation $W^{\prime}X^{\prime T}$ carried out by the cloud server requires on the order of $n\times t\times d$ SM, which dominates the overall cost. Therefore the client's burden is much lower than computing the hash table locally.
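For reference, a rough operation count derived from eqs. (4)-(9) (our own estimate, under the assumption that multiplying a matrix by one Givens rotation touches only two rows or columns; the paper's exact accounting may differ) is:

\begin{align*}
\text{ProbGen (client):}\quad & (n+t)d+4l(n+d+t)~\text{SM},\\
\text{Compute (cloud):}\quad & ntd~\text{SM},\\
\text{Verify (client):}\quad & td+nd+tn~\text{SM},\\
\text{Recover (client):}\quad & 4lt+tn~\text{SM}.
\end{align*}

That is, the client-side cost grows only linearly in $n$, $d$, and $t$, whereas the dominant $ntd$-SM Compute step is carried out by the cloud server.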
2) Communication Consumption
We show the communication consumption of the proposed scheme in Table 2. Among the five phases, only the ComDis phase and the Compute phase incur communication. Because the KeyGen, Verify, and Recover phases are all carried out on the client side, these three phases have no communication consumption. Therefore, the total communication consumption consists of two parts. The first is in the ComDis phase, in which the client sends the obscured matrices $X^{\prime}\in\mathbb{R}^{n\times d}$ and $W^{\prime}\in\mathbb{R}^{t\times d}$ to the cloud server; the second is in the Compute phase, in which the cloud server returns $\text{HashTable}^{\prime}\in\mathbb{R}^{t\times n}$ to the client. Hence the total communication amounts to $(n+t)d+tn$ matrix elements.
Experimental Performance Evaluation
In this section, we verify the correctness and efficiency of the proposed scheme through experiments. The experimental results show our proposed scheme is efficient and correct.
A. Evaluation Methodology
In our experiments, we used a computer with Windows Server 2012, an Intel Xeon Bronze 3106 CPU at 1.7 GHz, and 512 GB of memory as the testbed. We deployed both the original LSH algorithm and the proposed outsourced LSH scheme on this testbed so that their time consumption could be compared under the same conditions.
In terms of the efficiency metric, we tested the time consumption of the different phases, the time consumption on the cloud server side and on the client side, and the time consumption of the outsourced LSH scheme compared with that of the original LSH algorithm.
We use the CIFAR-10 benchmark dataset as the experimental dataset, as shown in Table 3. The characteristics of the dataset are summarized as follows:
CIFAR-10 is an image dataset that contains 60,000 color images in 10 classes. Each class contains 6,000 images, and the size of each image is $32\times 32$.
In the experiment, we randomly select 1,000 images (10 categories, 100 images per category) as the query dataset and 1,000 to 9,500 images (10 categories, 100 to 950 images per category) as the dataset to be hashed and searched.
B. Evaluation Demonstration
In Fig. 2, we briefly describe the experimental process of the proposed outsourced LSH scheme.
C. Evaluation Results
We analyze the efficiency of the proposed scheme by comparing the time consumption shown in Fig. 3. The amount of data to be hashed varies from 1,000 to 9,500 images, as described above.
To verify the correctness of the proposed scheme, we compare the results produced by the outsourced LSH scheme with those produced by the original LSH algorithm.
Applications
In this section, we present some potential practical applications of the proposed scheme. Since LSH is widely used for fast similarity search on high-dimensional data, fields such as image retrieval, product recommendation, and biometric identification can adopt the proposed scheme whenever the data owner is a resource-constrained device.
In 2011, Cappelli et al. proposed a fingerprint indexing approach based on the Minutia Cylinder-Code (MCC), which relies on LSH for fast search. In such fingerprint identification systems, the LSH computations could be outsourced with our scheme, allowing resource-constrained terminals to perform the search efficiently while keeping the biometric data obscured from the cloud server.
Conclusion and Future Work
In this study, we proposed a secure and verifiable scheme to outsource the classical LSH algorithm to a cloud server. The scheme protects the privacy of the client's data, detects improper behavior of the cloud server with probability 1, and keeps the client-side cost low enough for resource-constrained devices.
With the rise of machine learning, more and more devices, such as individual phones and edge devices, need to perform complex machine learning operations. Although the performance of personal and edge devices keeps improving, the volume and dimensionality of the data they process are also increasing. Cloud-aided machine learning is therefore a natural way to address this problem: part of the complex computations of machine learning algorithms can be outsourced so that resource-constrained devices can complete complex machine learning tasks while the privacy of the client's data is preserved. This requires not only that cloud servers run part of the algorithms' computations but also that those computations run on encrypted or obscured data. There are already many studies on this problem, such as homomorphic encryption techniques. However, since part of the processing is done on resource-constrained devices, the encryption algorithm should not be too complex, and the results returned by the cloud server must allow the client to recover the correct data. Based on these considerations and on prior research, we designed the outsourced LSH scheme presented in this paper, which relies only on lightweight matrix operations for obscuring, verification, and recovery.
In future work, we plan to add federated learning and multi-party computation mechanisms to the proposed scheme, bring more devices into the scope of fast ANN queries, and explore a more secure, efficient, practical, and multi-party outsourced ANN scheme.