Journals & Magazines >IEEE Open Journal of the Comm... >Volume: 6

FLSN-MVO: Edge Computing and Privacy Protection Based on Federated Learning Siamese Network With Multi-Verse Optimization Algorithm for Industry 5.0

Abstract:

With the development of deep learning technology, artificial intelligence has important applications in all aspects of society, but the lack of data has become a vital fa...Show More

Topic: Special Issue on Industrial Communication Networks (ICNets) for Industry 5.0

Metadata

Abstract:

With the development of deep learning technology, artificial intelligence has important applications in all aspects of society, but the lack of data has become a vital factor restricting the further evolution of artificial intelligence in industry 5.0. Federated learning can effectively use the edge node data and solve the data problem of artificial intelligence model training by sharing gradient. In federated learning, the terminal transmits the updated model parameter values instead of the primordial data to the server, thus becoming a key technology to ensure data privacy and security in edge computing. However, since attackers can use the shared gradient to launch malicious attacks to steal users’ privacy, how to securely upload the gradient and aggregate it has become an important topic to ensure privacy security in federated learning. Therefore, this paper proposes an edge computing and privacy protection based on federated learning Siamese network with multi-verse optimization algorithm for industry 5.0. This scheme can reduce the expenditure of endpoint participation in federated learning while protecting user privacy. In the federated learning structure, the feature information of the input samples is mapped to a new output vector through the subnetwork of the Siamese network, and the approximation degree between the input samples is judged by comparing the similarity degree between the output vectors. The parameters of the network are optimized by multi-verse optimization algorithm to reduce the convergence time. Meanwhile, an adaptive weight aggregation algorithm is designed to reduce the degradation of model performance and stability caused by data quality differences, so as to improve the accuracy of the model and accelerate the model to reach the optimal value. Finally, comprehensive experiments on three public standard data sets show that the proposed method achieves higher model accuracy and faster model convergence than the most advanced methods.

Topic: Special Issue on Industrial Communication Networks (ICNets) for Industry 5.0

Published in: IEEE Open Journal of the Communications Society ( Volume: 6)

Page(s): 3443 - 3458

Date of Publication: 19 December 2024

Electronic ISSN: 2644-125X

DOI: 10.1109/OJCOMS.2024.3520562

Funding Agency:

No metrics found for this document.

Contents

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

I he relationship between Industry 5.0 and sustainability is receiving increasing attention, especially in the context of climate change and increasing global environmental challenges. It emphasizes the integration of sustainability principles into the latest industrial revolution for green, equitable manufacturing, as well as balancing economic growth and environmental stewardship through innovative practices for a sustainable future. Research for Industry 5.0 has gone through three phases since 2018, initially separated from Industry 4.0, with the latest priority being the deployment of circular manufacturing strategies supported by human-friendly digitalization that can anticipate and proactively respond to impacts. As such, Industry 5.0 is future-oriented and cross-cutting, interactively diverting from the original configuration of Industry 4.0.

The technology foundation of Industry 5.0 includes artificial intelligence (AI), Internet of Things (IoT), blockchain, etc. These technologies drive sustainable Industry 5.0 development. It provides a balance between growth and sustainability and highlights the upskilling and support needed to adapt to Industry 5.0.

In addition, Industry 5.0 highlights the vision of a more sustainable, resilient and people-centred industry, which includes respecting the boundaries of our planet and placing the well-being of industrial workers at the centre of the production process.

Although Industry 5.0 brings many bright visions of the new industrial age, it also faces some problems in the evolution of the Industrial Revolution. These challenges include security, privacy, human-machine collaboration, scalability, and skilled labor

In recent years, artificial intelligence technology has advanced quickly and has slowly become a part of every facet of the national economy for industry 5.0, such as smart factories, weather forecasting, intelligent driving and so on. The widespread use of AI has improved productivity and greatly facilitated People’s Daily lives, but the ubiquity of AI algorithms has also increased the risk of personal privacy breaches. For example, the smart assistant in the mobile phone can finish a series of tasks through the user’s voice command, but these operations may reveal the user’s hobbies, voice print and other sensitive information [1]. In addition, the illegal use of user data by service providers to train AI models can also lead to the disclosure of user privacy in industry 5.0 [2], [3].

To meet the privacy protection requirements in the use of artificial intelligence, Google proposed the concept of federal learning (FL) in 2017. The participants in federated learning include two roles, client and server. During each model training session, the client trains a sub-model locally and then uploads the sub-model to the client instead of the data [4]. The client aggregates the sub-models of all clients to get the final global model. This process not only uses all the local data, but also ensures that the user’s private data is not out of the domain. By sharing the gradient, the data can be “available and invisible” to ensure the compliance of the process of data use, so as to encourage more data holders to participate in model training, expand the data scale, and improve the model performance. After the concept of federal learning is proposed, it has been widely used in medical, financial and other fields, and to a certain extent, it solves the problem of less private domain data in artificial intelligence technology and privacy protection in the process of data use [5], [6].

Within the framework of edge computing, servers are positioned at the network’s edge, which substantially reduces the latency for transmitting computational tasks, making edge servers an optimal solution for task processing [7]. Nevertheless, edge servers have limited resources. As the mobile terminal count grows and application functionalities become more intricate, the waiting time for computational tasks queued up at edge servers also rises proportionally, which undoubtedly prolongs the overall computing delay [8]. Meanwhile, the process of transferring computing tasks from the mobile terminal to the server inevitably contains personal privacy information. When this sensitive data is stored by the server, users lose control of their own data. In short, edge computing optimizes task processing efficiency by deploying computing resources close to users, but the resource limitations of edge servers and the storage of users’ private data on servers together pose challenges to improve computing performance.

In a federated learning framework, all participants jointly run the same machine learning model, which means that the end devices work together to complete the model training task. This strategy significantly reduces the learning cost of terminal equipment, while improving the learning efficiency. Participants upgrade the parameters derived from the computational model without the need to transfer raw data to the server for consolidation, ensuring that user information is maintained exclusively on the local device, avoiding data backup on the server, and effectively protecting user privacy. Federated learning not only reduces the cost of learning for end devices and improves efficiency by allowing all participants to share the same model, but also effectively protects user privacy through local data storage and model parameter update mechanisms. At the same time, the framework addresses the challenges of data fragmentation and network instability through collaboration between servers and participants, as well as interaction between participants.

At present, federated learning has become one of the most popular directions in the field of artificial intelligence, and the research and application of federated learning have emerged in an endless stream. Wei et al. [11] proposed a personalized federated learning scheme to solve the problem of slow convergence caused by data heterogeneity in federated learning. Qin et al. [12] utilized federated learning to implement a recommendation system that guaranteed user privacy. Liu et al. [13] used quantitative technology in neural networks to reduce the model scale of federated learning and improve communication efficiency. Zhu et al. [14] proposed a non-interactive federated learning algorithm for privacy protection. This approach ensured that privacy was not compromised even when multiple federal learning participated collude. Lu et al. [15] discussed how to verify the correctness of server aggregate data on the premise of protecting user privacy, and proposed a verifiable federal learning privacy protection framework named VerifyNet based on the analysis of existing problem solving methods.

Conversely, enhancing communication efficiency and minimizing communication expenses have increasingly become focal points in contemporary research. Yin et al. [16] introduced an innovative mobile terminal that validates the current parameter updates based on the feedback from the central server, subsequently deciding whether to upload its updates to the server. However, in practice, the mobile terminal needs additional computation to assess if the extent of local updates correlates with global convergence. Li et al. [17] crafted a control mechanism that modulates the frequency of local updates and the instances of global aggregation. Nonetheless, this algorithm presupposes that each local update utilizes a uniform amount of resources, which is not the case in reality due to variations among terminals and data scales, leading to potential discrepancies in resource consumption for local updates and thus impacting their frequency. Li et al. [18] enhanced communication efficiency by accelerating the aggregation process of the FedAvg algorithm.

Although the original intention of federated learning is to solve the privacy problem in the process of multi-party participation in the training of artificial intelligence models, there are many literatures pointing out that there are still security vulnerabilities and privacy disclosure risks in federated learning. Federated learning is expected to ensure the data privacy of participants by uploading gradients instead of directly uploading data. However, subsequent studies [19] found that attackers could restore training data by reversing gradients, which increased the risk of privacy disclosure in federated learning. In addition, attackers can also pretend to be federated learning participants and conduct inferred attacks on the privacy of other participants through local and global models [20]. Therefore, privacy enhancement for federated learning has become an important research content of federated learning security.

The research on collaborative computing and privacy protection of edge intelligence based on federated learning applies federated learning mechanism to edge computing to alleviate the privacy leakage caused by edge computing for industry 5.0. And, the model also considers how to ensure the high quality of private data analysis, reduce the communication consumption of model aggregation, and improve the parameter updating efficiency in FL.

Our main contributions are as follows.

In the federated learning structure, the feature information of the input samples is mapped to a new output vector through the subnetwork of the Siamese network, and the approximation degree between the input samples is judged by comparing the similarity degree between the output vectors.
The parameters of the network are optimized by multi-verse optimization algorithm to reduce the convergence time.
Meanwhile, an adaptive weight aggregation algorithm is designed to reduce the degradation of model performance and stability caused by data quality differences, so as to improve the accuracy of the model and accelerate the model to reach the optimal value.

SECTION II.

Related Works

For artificial intelligence models, the size of training samples directly influences the accuracy of the final model, while most data holders do not directly provide data due to privacy protection and other considerations. This phenomenon is also referred to as “data silos”, where data cannot be effectively circulated to generate value due to privacy requirements. Federated learning uses participant data to complete training locally, upload models and aggregate to achieve collaborative AI training with “data available invisible”, accelerating the effective circulation of data elements [21], [22].

A. Federal Learning Principle and Classification

FL also was also named FedAvg algorithm. In FedAvg algorithm, there are two main roles: client and server. The client provides data to train the sub-model, and the server aggregates all the sub-models of the client to generate the global model. A typical FL process consists of the following steps:

Each client uses local data to train the submodel, and uploads the trained submodel to the server.
The server collects all the submodels sent by the client and aggregates them to produce the global model.
The server distributes the global model to all participants.

In this process, the data of the client is not transmitted, but the data privacy of the participants is guaranteed through the upload gradient. In the aggregation process, the FedAvg algorithm weights the average of the uploaded gradient according to the data proportion of each client, and finally gets the global model. Record the parameter from client $C_{K}$ as $w^{k}$ , where $k\in C_{t}$ . $C_{t}$ is a subset of the $m_{t}$ participating clients in $t-th$ round. For client $C_{K}$ , let its local training data set have $n_{k}$ data points, where $n_{k}=|P_{K}|$ . Therefore, the optimization problem in a federated learning environment can be redefined as: $\begin{align*} \min _{w\in R^{d}}f(w), f(w)\triangleq & \frac {1}{n}\sum _{k=1}^{m_{t}}\frac {n_{K}}{n}F_{K}(w). \tag {1}\\ F_{K}(w)=\triangleq & \frac {1}{n_{K}}\sum _{i\in P_{K}}f_{i}(w). \tag {2}\end{align*}$ View Source

Based on the heterogeneity of participants, in the field of FL transfer FL to reflect the characteristic differences of data. For these three different federated learning styles, it is mainly determined by the alignment position of data and features. If the data features involved in federated learning are consistent and the data items are inconsistent, that is, the precision of model training is increased by expanding the number of samples, then it is called horizontal federation. For example, banks in different regions conduct federation learning. Since the banking business is the same but the region is different, the samples are different but the characteristics are the same. If the data items participating in federated learning are consistent but the features are inconsistent, that is, the feature space of existing data is extended through federated learning, it is called vertical federation. For example, banks and insurance companies in the same region carry out federal learning, because the business is different but the region is the same, so the samples are the same but the characteristics are different. If the data items and features involved in federated learning are inconsistent, it is called transfer federation. For example, banks and insurance companies in different regions participate in federal learning, with different characteristics and data.

B. Safety Issues in Federal Learning

In deep learning, especially distributed deep learning, direct uploading of data for training will lead to the privacy of participants, so federated learning ensures the privacy of participants by uploading gradients. But it has since been shown that gradients can also lead to privacy breaches among participants for industry 5.0. In addition, due to the characteristics of multi-participant and multi-round communication in federated learning, federated learning faces greater security and privacy risks: the legitimacy of participants cannot be guaranteed, and attackers can masquerade as legitimate participants or launch attacks through monitoring channels.

There are two kinds of security problems for federated learning, one is caused by multiple rounds of communication in federated learning, and the other is caused by the incomplete trust of the identity of each participant in federated learning. The security problems caused by multi-round communication are mainly based on the traditional security perspective: in the transmission process of gradient, it is easy to be monitored, stolen or even modified by attackers. The security problems caused by the identity of the participants are related to the security native to deep learning, for example, the attacker masquerades as a legitimate participant and sends malicious gradients to destroy the model performance. In addition, most of the current research on the security of federated learning also specifies the nature of the server. It is generally believed that the aggregation server in federated learning is “honest and curious”, that is, the server will “honestly” execute the pre-set program, but will be “curious” about the content of the execution. For security issues in federation learning, this section introduces typical poisoning attacks and backdoor attacks.

Poisoning attack. Poison attack in the field of machine learning was first proposed by Biggo et al. Its attack mode is mainly implemented in the form of flipping data labels to destroy the performance of support vector machines. In federal learning, due to the large number of participants and the identity cannot be guaranteed, attackers can pretend to be legitimate participants and tamper with the uploaded gradient to achieve the effect of attack. In general, attack methods can be broadly divided into two categories: data poisoning and model poisoning, the main difference between them is the way the attack is carried out. Data poisoning mainly reduces the performance of the global model by modifying data information. For example, Gupta et al. [23] proposed a method of data poisoning by adding noise to an existing data set. Liu et al. [24] used Generative Adversarial Networks (GAN) to generate poison data for attacks. For model poisoning, attackers mainly tamper with the uploaded gradient to achieve the purpose of global model performance loss. Zhang et al. [25] proposed a method for the federal recommendation system to use public data to approximate the feature vector and further design a more covert poisoning gradient, so as to achieve the effect of poisoning attacks. Zhang et al. [26] used GAN to learn the characteristics of other benign gradients and generate a toxic gradient similar to the benign gradient. There is a special form of poisoning attacks in which the number of attackers is more than half. In this case, the more common defenses against poison attacks are not effective, and this situation is also known as the Byzantine problem in federation learning [27].
Backdoor attack. The main goal of a poison attack is to degrade the accuracy of the global model through malicious gradient or malicious data. Backdoor attacks, on the other hand, are specifically designed to degrade model accuracy on specific types of data without interfering with the overall performance of the global model. This type of attack can also be achieved by means of data poisoning or model poisoning. Therefore, in federation learning, many researchers classify backdoor attacks as special poisoning attacks. Backdoor attacks in deep learning were first proposed by Chen et al. [28], which could be implemented by injecting a small number of poisoning instances into neural networks. In the field of federated learning, Andreina et al. [29] discussed the possibility and potential defense means of backdoor attacks in federated learning, and proved the effectiveness of the attacks by experiments. Mouri et al. [30] proposed a poisoning attack scheme in the Byzantine style, which could still be effective under mainstream defense schemes such as Krum and Trimmed mean.

C. Privacy Issues in Federal Learning

The above is a simple summary of the security issues faced by federal learning. In fact, the problems faced by federal learning include privacy issues in addition to security issues. The security problem is mainly aimed at destroying the accuracy and other performance of the federal learning model, while the privacy problem is mainly aimed at obtaining all kinds of private information of the participants without destroying the accuracy of the model. The privacy problem of federated learning mainly comes from model inversion attacks [31], which can restore training data through gradient reverse training models for industry 5.0. However, in federated learning, the training process is completed on the client side, and the client completes the aggregation through the gradient uplink. This setting greatly increases the possibility of federated learning being subjected to model flipping attacks, resulting in a sharp increase in the privacy disclosure risk of federated learning. Attacks on data privacy in federation learning mainly include inference attacks and reconstruction attacks [32].

Inference attack. Inference attacks are attacks in which an attacker uses the model’s intermediate parameters or other model-related information to infer sensitive attributes about the user and the model. For example, Wen et al. [33] can infer whether a certain piece of data exists in the training set through inference attacks. As mentioned above, since the gradient information of federated learning is fully public, the possibility of inference attacks is increased. Xiong et al. [34] implemented inference attacks in federated learning and proved the potential privacy leakage risk of shared gradient.
Reconstruction attack. Reconstruction attack refers to the attacker using the model to obtain intermediate parameters or other model-related information to reconstruct training data. The idea of reconstruction attack comes from model inversion attack. The gradient leakage attack proposed by Wei and Liu [35] did not require any auxiliary data and additional training, and used an optimized way to directly recover training data from gradient data. Then Wang et al. [36] proposed a federation learning reconstruction attack method using GAN to reconstruct image data.

SECTION III.

Proposed Edge Computing System

A. System Architecture

In the coverage area of an edge computing server, a mobile terminal is used by each of M users. $D={d_{1},d_{2},\ldots , d_{M}}$ indicates the set of M mobile terminals. Where the $d_{m}$ indicates the $m-th$ mobile terminal. To fulfill the federal learning criteria or to achieve model convergence, a total of N global update cycles are necessary. The notation $R_{n}(1\leq n\leq N)$ denotes the collection of mobile devices that are engaged in the aggregation process during the $n-th$ round. $IMAX_{m,n}$ represents the maximum number of times that the $d_{m}$ iteratively runs the learning model locally when $d_{m}\in R_{n}$ .

The edge computing system architecture proposed in this study is shown in Figure 1. Commencing each global update cycle, the edge server identifies the terminals that will participate in that round and disseminates the model parameters to them. Concurrently, the server relays the list of participants to a third-party key generation service. This service generates a unique key for the participants and dispatches it to the respective terminals. As each participant undergoes the local update phase, they utilize their local data for learning. Upon completion of the learning process, they encrypt the updated model parameters locally and transmit them back to the edge server, utilizing the provided key. The server then consolidates the contributed data from all participants to formulate the model parameters for the subsequent learning round. This iterative process is repeated until either the predetermined limit of global aggregation rounds is exhausted or the federated learning model achieves convergence.

FIGURE 1.

The proposed edge computing system architecture.

Show All

B. Privacy Protection Mechanism Design

This research posits that the key generation platform functions as a trustworthy entity, with all devices, including terminals and edge servers, adhering to the federated learning protocol in their program execution. However, the study acknowledges the potential for devices to attempt unauthorized access to private data from other users, a scenario that is commonly accepted as a fundamental aspect of research into privacy preservation in federated learning [37]. Consequently, to safeguard against the upload of model parameters based on terminal data, other devices must be prevented from operating on local data using this uploaded information, and terminals should not be able to directly transmit locally updated parameters to the edge server. In essence, to uphold user privacy, it is crucial to ensure that no devices, other than the terminal itself, can access the local data or the parameters that have been updated locally. Simultaneously, privacy protection mechanisms must ensure that edge servers can still aggregate authentic model parameters. This section details the process through which the terminal encrypts the data it uploads using the provided key.

During the $n-th$ iteration of global updates, let $\sigma _{m,n}$ denote the data that the $m-th$ terminal intends to upload following its local update process. In the context of federated learning, the server employs a global aggregation algorithm defined as $f\left ({{\sum _{d_{m}\in R_{n}}\sigma _{m,n}}}\right)$ , which essentially means that the aggregation is a function of the collective data uploaded by all participating devices. Upon receiving the participant details from the edge server, the key generation service generates $|R_{n}|$ (where $|R_{n|}\gt 1$ ) distinct random values, assigning one to each participant. During the key distribution phase, the service not only provides the participant with their designated random value but also communicates the random value assigned to another participant. It is crucial to ensure that all random values are uniquely selected. These two random values constitute the cryptographic keys that the key generation service provides to the terminal. Utilizing $\sigma _{m,n}$ , the terminal incorporates its own random value and subtracts the other to compute the actual data, denoted as $\hat {\sigma }_{m,n}$ , that is sent to the edge server. Consequently, while the edge server can achieve accurate aggregation outcomes, it does not obtain the user’s update data directly.

SECTION IV.

Model Construction and Problem Definition

A. Model Analysis

Suppose $flag_{m,n}$ denotes that whether $d_{m}$ participates in $n-th$ round aggregation, which is calculated as follows: $\begin{align*} flag_{m,n}= \begin{cases} 1 & d_{m}\in R_{n} \\ 0 & otherwise \end{cases} \tag {3}\end{align*}$ View Source

Suppose that in $n-th$ round global aggregation, $d_{m}$ processes a sample with a period of $c_{m,n}$ . Terminal $d_{m}$ local data set has $s_{m}$ samples. The calculation time $d_{m}$ sample is: $\begin{equation*} pt_{m,n}=\frac {I_{m,n}c_{m,n}s_{m}}{f_{m,n}}. \tag {4}\end{equation*}$ View Sourcewhere $f_{m,n}$ represents the main frequency of $d_{m}$ , $I_{m,n}$ represents the number of actual cycles of $d_{m}$ in the $n-th$ round, and $I_{m,n}\leq IMAX_{m,n}$ ,

The energy consumption generated by running the model is: $\begin{equation*} pe_{m,n}=\frac {\alpha _{m,n}}{2}\times I_{m,n}\times c_{m,n}\times s_{m}\times f^{2}_{m,n}. \tag {5}\end{equation*}$ View Sourcewhere ${}\frac {\alpha _{m,n}}{2}$ represents the effective capacitance coefficient of the chip calculated by $d_{m}$ in the $n-th$ round [15].

Suppose that in the $n-th$ round, the size of the data transmitted between $d_{m}$ and the server is $l_{m,n}$ , and the channel bandwidth is $B_{m,n}$ , then the transmission time of $d_{m}$ is: $\begin{equation*} mt_{m,n}=\frac {l_{m,n}}{B_{m,n}}. \tag {6}\end{equation*}$ View Source

The transmission energy consumption generated by $d_{m}$ in $n-th$ round is: $\begin{equation*} me_{m,n}=\frac {l_{m,n}p_{m,n}}{B_{m,n}}. \tag {7}\end{equation*}$ View Sourcewhere $p_{m,n}$ represents the transmission power of $d_{m,n}$ in the $n-th$ round.

The time overhead of all terminals during the entire model is: $\begin{equation*} T=\sum _{n=1}^{N}\sum _{m=1}^{M}flag_{m,n}\left ({{pt_{m,n}+nt_{m,n}}}\right ). \tag {8}\end{equation*}$ View Source

During the communication exchange between the terminal and the server, the time taken for signal propagation through the channel is relatively brief; hence, this component of the timing can be disregarded for the purposes of time cost analysis in this research.

In the federated learning process, the energy consumption of all terminals can be expressed as: $\begin{equation*} E=\sum _{n=1}^{N}\sum _{m=1}^{M}flag_{m,n}\left ({{pe_{m,n}+me_{m,n}}}\right ). \tag {9}\end{equation*}$ View Source

B. Problem Definition

This research aims to protect the privacy of users in edge computing by means of federated learning, and reduce the time cost and energy consumption of mobile terminals on the premise of ensuring the learning accuracy, which is formalized as follows: $\begin{align*} min& E,T. \tag {10}\\ s.t.& \lambda _{m}\geq \Lambda _{m},m=1,2,\ldots , M. \tag {11}\end{align*}$ View Sourcewhere $\lambda _{m}$ represents the learning accuracy of $d_{m}$ after federated learning, that is, the sample accuracy of $d_{m}$ learning; $\Lambda _{m}$ represents the target learning accuracy of $d_{m}$ . Formula (11) indicates that the precision of each model after federated learning should not be less than the target learning precision, so as to ensure the learning precision of federated learning.

SECTION V.

Algorithm Design

As can be seen from Figure 1, the performance and efficiency of federated learning are significantly influenced by several key processes, which include the selection of participants, the execution of local updates, and the aggregation of these updates globally. Among them, participant selection determines which terminals are selected as participants, and decides whether a certain terminal should participate in the current round of federal learning. Local update determines the way the terminal runs the learning model and has a direct impact on its analysis accuracy. Global aggregation determines the learning model of participants in the next round, and the quality of aggregation directly affects the performance of participants such as time, energy consumption and learning accuracy.

A. Federated Learning Siamese Network

1) Feature Mapping

In conventional twin network algorithms, the training process necessitates the creation of data pairs that are both similar and dissimilar. The goal is to ensure that the output vectors for similar samples are closely aligned, whereas the output vectors for dissimilar samples are distinctly separated [38]. As the number of training samples increases, the number of similar and non-similar pairings that need to be constructed also increases significantly. As shown in Figure 1, in a complete graph (that is, any two samples have an edge between them to describe the similarity relationship between them), the correlation between the quantity of samples, denoted as nSam, and the time complexity $T(nSam)$ required to establish all analogous pairs of data adheres to the subsequent mathematical expression: $\begin{equation*} T(nSam)=nSam(nSam-1)/2=O\left ({{nSam^{2}}}\right ). \tag {12}\end{equation*}$ View Source

Here, the square relation is approximately satisfied between the number of samples nSam and the time complexity $T(nSam)$ of constructing all similar pairs. Since the sample data in practical application usually does not directly label the similar pair information, it often needs to construct the similar pair information for training before training. When the number of samples nSam is large, if the information training for any two sample data structures is similar, the required time and computing resource overhead will increase significantly. This does not match with the requirement of edge computing applications to reduce processing latency and meet the requirements of terminal device resource constraints. Although it is possible to train the information by sampling, only some samples are selected to construct similarity. However, this method will lose part of the sample information during training, which will affect the final training effect. Obviously, the sampling strategy selected for training will affect the final performance of the algorithm. In the face of practical problems, people need to consider the selection of appropriate sampling strategies for training. To solve this problem, we propose a feature mapping strategy for training.

First, we discuss the reasons why traditional twin network methods need to introduce similarity information for training. Normally, when twin networks are used for training, the subnetwork only extracts part of the features of the sample data for further processing and judgment. These features are not directly related to the category information. Take facial expression recognition as an example, the features extracted by the subnetwork may only be related to the specific data of the facial organ, so we need to help adjust the relevant parameters of the subnetwork through similar pairing information, so that the feature differences extracted from the sample data of the same category are as small as possible, and the feature differences extracted from the sample data of different categories are as obvious as possible.

In order to reduce the dependence on similar pairing information and realize the training of twin networks without constructing similar pairing information, we propose a strategy to add category information when mapping features. By adopting appropriate key feature mapping strategies, the features obtained after mapping samples belonging to the same category are as close as possible, while the differences between the features obtained after mapping samples of different categories are as significant as possible. In this way, even if there is no clear similar pairing information, the training effect and performance of the twin network can be effectively improved.

For instance, the mapping process can be followed by the application of One-hot encoding to generate the output feature vector. With One-hot encoding, given that there are C categories for the sample data, each sample will be assigned an output vector T of length C post-encoding. Ideally, during training, we aim for the vector T to exhibit a value of 1 at the position corresponding to the sample’s class, signifying a probability of 1 for the sample to belong to that class. Conversely, a 0 at any other position in the vector T indicates a probability of 0 for the sample to be associated with a different class. Suppose the two samples are $T_{1}$ and $T_{2}$ , and the Euclidean distance between them is Eu. The length of $T_{1}$ and $T_{2}$ is tn, and the $i-th$ element in $T_{i}$ is denoted by $T_{1}(i),i=1,2,\ldots , tn$ . We can calculate that $Eu=0$ when two samples belong to the same class and $Eu=1$ when two samples do not belong to the same class. If we use other metrics, such as Manhattan distance, in the similarity measurement stage, we can still get similar conclusions.

While the aforementioned feature mapping process appears to incorporate the class information of the sample, in reality, this class information is derived through the model’s processing of the sample’s feature vector. Consequently, the outcome of the feature mapping can, to a certain degree, mirror the intrinsic characteristics of the sample’s feature data. When doing similarity calculation, we should not only compare the case of the highest value, but also consider the value of other positions. In this paper, in order to simplify the similarity calculation process and facilitate the explanation, we only consider the case of the highest two digits of the mapping result. The specific similarity calculation process will be calculated in the next section. In the future, we can make further improvements to the above similarity calculation scheme, such as considering more numerical values at more locations comprehensively to further improve the accuracy of classification.

2) Similarity Measure

During the phase of similarity assessment, it is essential to evaluate the output vectors derived from the feature mapping phase to ascertain the degree of similarity between them. Initially, we examine the composition of the output vector to ascertain the most suitable method for measuring similarity. In the context of One-hot encoding, each bit within the output vector T signifies the likelihood of the sample’s association with a particular class. Typically, we focus on the highest value bit to denote the class information of the sample, but the values in other places in the code also contain the information of the sample to some extent. When calculating similarity, the values of these other locations should be properly considered to obtain more accurate results. In the commonly used Euclidean distance formula, we need to consider the influence of each bit value on the similarity measurement result, which will affect the final similarity judgment result to a certain extent. But this approach is susceptible to extreme values. Considering that the higher digits in the code tend to be more closely related to the sample, we design a new similarity measure. In this method, we only compare the higher values of the output vector T to get the final similarity measure. We first assume that the two output vectors that need to be compared in similarity are respectively output vectors $T_{1}$ and $T_{2}$ , and assume that the positions with the largest values in the output vectors $T_{1}$ and $T_{2}$ are respectively the $M_{1}$ and $M_{2}$ positions, and the corresponding positions are $T_{1}(M_{1})$ and $T_{2}(M_{2})$ . If $M_{1}=M_{2}$ , we specify that the final similarity measure is: $\begin{equation*} D=\sqrt {\left ({{T_{1}(M_{1})-T_{2}(M_{2})}}\right )^{2}}. \tag {13}\end{equation*}$ View Source

If $M_{1}\neq M_{2}$ , we stipulate that: $\begin{equation*} D=\sqrt {\left ({{T_{1}(M_{1})-T_{2}(M_{1})}}\right )^{2}+\left ({{T_{1}(M_{2})-T_{2}(M_{2})}}\right )^{2}}. \tag {14}\end{equation*}$ View Source

The obtained D is the similarity between the output vector $T_{1}$ and $T_{2}$ , which is used to obtain the final discrimination result in the decision discrimination stage.

B. Participant Determination Algorithm

Any $j-th~(1\leq j\leq s_{m})$ sample in the $d_{m}$ data set is composed of $x_{m,j}$ and $y_{m,j}$ . Where $x_{m,j}$ describes the characteristics of the sample, which serves as the input to the model. $y_{m,j}\in {-1,+1}$ represents the label of the sample and is the expected model output. The sample is considered to be linearly separable, and the Support Vector Machine (SVM) model is used to solve the typical binary classification problem. Therefore, the goal of each local update is: $\begin{align*} \arg \min & \frac {1}{2}||W_{m}||^{2}+C\sum _{j=1}^{s_{m}}\xi _{m,j}. \tag {15}\\ s.t.& \xi _{m,j}\geq 1-y_{m,j}\left ({{W^{T}_{m}x_{m,j}+b_{m}}}\right ). \tag {16}\\& \xi _{m,j}\gt 0. \tag {17}\end{align*}$ View Sourcewhere $W_{m}$ and $b_{m}$ represent current model parameters. $\xi _{m,j}$ is the introduced relaxation variable. Further converting the above formula as: $\begin{align*} \arg \min & J_{m}\left ({{W_{m}}}\right )=\frac {1}{2}||W_{m}||^{2} \\& {}+C\sum _{j=1}^{s_{m}}\max \left ({{0,1-y_{m,j}(W^{T}_{m}x_{m,j}+b_{m})}}\right ). \tag {18}\end{align*}$ View Sourcewhere $J_{m}(W_{m})$ is the final loss function [16]. Prior to each selection round, the edge server evaluates the analytical accuracy and the target accuracy for each terminal, opting to include only those that fail to meet the set accuracy benchmarks. When an endpoint’s analytical accuracy reaches the target accuracy, it will no longer be selected for future global updates. This allows each terminal to be as accurate as possible and reduces interference with other terminals.

C. Local Update Algorithm

For $d_{m}$ , if after each learning, its accuracy $\lambda _{m}$ does not exceed the target accuracy and does not reach the maximum number of iterations, $d_{m}$ needs to carry out the next learning. Let the model parameter of $d_{m}$ in the $n-th$ round be $w_{m,n}$ , and the determination of the parameter $w_{m,n}$ of the next learning is related to the learning parameters and learning accuracy of this time: $\begin{equation*} \hat {w}_{m,n}=w_{m,n}-\eta _{m}\nabla J_{m}\left ({{w_{m,n}}}\right ). \tag {19}\end{equation*}$ View Source

$\nabla J_{m}(w_{m,n})$ represents the decline gradient of the loss function when the parameter is $w_{m,n}$ . $\eta _{m}$ indicates the learning rate of $d_{m}$ in this learning. $\begin{equation*} \eta _{m}=2\eta _{m,max}\left ({{\frac {1}{1+e^{-\theta (\Lambda _{m}-\lambda _{m})}}-0.5}}\right ). \tag {20}\end{equation*}$ View Source

In this context, $\theta$ represents a constant value. The symbol $\eta _{m,max}$ denotes the maximum learning rate for the terminal $d_{m}$ . This approach to calculating the learning rate allows it to automatically adjust in response to the current accuracy of learning, thereby mitigating the adverse effects that can arise from either excessively high or low learning rates.

D. Parameter Optimization Based on MVO

1) Multi-Verse Optimization Algorithm (MVO)

Each universe in the MVO algorithm [39], [40] represents a feasible solution to the problem, and the variables in each solution are represented by each object in the universe. The expansion rate of the universe represents the fitness value of the solution. White holes and black holes have higher and lower expansion rates respectively. Matter in the universe moves between black holes and white holes with a certain probability through medium wormholes. The algorithm is described as follows: $\begin{align*} U= \begin{bmatrix} x_{1}^{1} & ~~ x_{1}^{2} & ~~ \cdots & ~~ x_{1}^{n} \\ x_{2}^{1} & ~~ x_{2}^{2} & ~~ \cdots & ~~ x_{2}^{n} \\ \vdots & ~~ \vdots & ~~ \cdots & ~~ \vdots \\ x_{n}^{1} & ~~ x_{n}^{2} & ~~ \cdots & ~~ x_{n}^{q} \end{bmatrix} \tag {21}\end{align*}$ View Sourcewhere q is the number of variables. n is the number of universes. $x_{d}^{b}$ is the $d-th$ cosmic b-dimensional component. $\begin{align*} x_{v}^{u}=\begin{cases} x_{k}^{u}, r_{1}\lt NI\left ({{U_{v}}}\right ) \\ x_{v}^{u}, r_{1}\geq NI\left ({{U_{v}}}\right ) \end{cases} \tag {22}\end{align*}$ View Sourcewhere $NI(U_{v})$ represents the normalized expansion rate of the $v-th$ universe $U_{v}$ . $x_{k}^{u}$ represents the u-dimensional component of the universe. $r_{1}$ is the random number of $[{0,1}]$ .

The black hole position update formula in the universe is as follows in (23), shown at the bottom of the page, $\begin{align*} x_{v}^{u}=\begin{cases} X_{u}+TDR\times \left ({{(ub_{u}-lb_{u})\times r_{4}+lb_{u}}}\right ), r_{3}\lt 0.5, r_{2}\lt WEP \\ X_{u}-TDR\times \left ({{(ub_{u}-lb_{u})\times r_{4}+lb_{u}}}\right ), r_{3}\geq 0.5, r_{2}\lt WEP \\ x_{v}^{u}, r_{2}\geq WEP \end{cases} \tag {23}\end{align*}$ View Source where $ub_{u}$ and $lb_{u}$ are the upper and lower bounds of variables respectively. $r_{2}$ , $r_{3}$ , and $r_{4}$ are random numbers of [0,1]. WEP represents the presence rate of wormholes. TDR represents the proportion of distance an object travels around the current optimal universe, calculated by the formula. $\begin{align*} WEP=& WEP_{min}+m\left ({{WEP_{max}-WEP_{min}}}\right )/M. \tag {24}\\ TDR=& 1-\frac {m^{1/r}}{M^{1/r}}. \tag {25}\end{align*}$ View Source $WEP_{max}$ and $WEP_{min}$ represent the maximum and minimum values respectively. m and M represent the current and maximum iterations, respectively. r represents the mining accuracy of the algorithm.

2) Modified MVO (MMVO)

The MMVO strategy in this paper includes subpopulation fusion initialization strategy, encoding and decoding strategy, NSGA-II mutation strategy and external file perturbation strategy. The MMVO process is shown in Figure 2.

Subpopulation fusion initialization strategy: In order to improve the optimization effect of the population and develop more high-quality search space, the subpopulation fusion strategy is designed to generate the initial population. The initial population is divided into four subpopulations of $N/4$ , which are represented by the symbols $pop1$ , $pop2$ , $pop3$ and $pop4$ , respectively. popl uses Monte Carlo method to generate population individuals, $pop2$ uses chaotic mapping method to generate population individuals, $pop3$ and $pop4$ use Latin hypercube sampling and random generation methods to generate population individuals, respectively. The four subpopulations are merged, crowded distance calculated and fast non-dominated sorted successively. Finally, the first N individuals are selected as the initial population, and the initial external file is formed.
Encoding and decoding strategies: In this paper, a three-layer segment encoding method is adopted for encoding operation: the upper layer encoding represents the feature order of parameter selection to be optimized, while the middle layer and the lower layer encoding represent the optimization order of parameters and the corresponding optimization network layer respectively. In order to more clearly represent the encoding strategy proposed in this paper, Figure 3 shows a specific encoding diagram. The parameter has four feature layers, and the processing sequence is F1-F3-F4-F2. F1 has three optimizations: 01, 03 and 04. F2 has two steps: 09 and 010. F3 has two operations, 05 and 06. F4 has a step of 07. The first 1 in the middle layer encode represents the first step in F1, the second 1 represents the second step in F1, and so on. The first 2 in the lower encode indicates that the first operation in F1 is processed on the layer 2 network, and so on. When decoding, the feature optimization sequence of parameters is extracted from the upper coding, the optimization sequence is obtained from the middle encoding, and the corresponding optimization layer is obtained from the lower encoding.
Embedded NSGA-II mutation strategy: In order to avoid the population falling into the local optimal solution, this paper embeds the variation mechanism of NSGA-II algorithm into the iterative population optimization on the basis of MVO algorithm operation. In order to balance the time efficiency and solution quality of the algorithm, a random number rand is set. When $rand\lt 0.5$ , NSGA-II mutation operation will be performed on the population updated by the MVO algorithm; otherwise, it will not be performed. The variation strategy is designed as follows:
- For index $C_{max}$ , select an individual sequence with the largest value of $C_{max}$ , randomly select $n/2$ parameter, and insert them into any position in the remaining parameter sequence in order to obtain a new individual sequence $\pi$ .
- For the TC index, select an individual sequence with the largest TC value, select the element in the singular position, and insert it into any position in the remaining parameter sequence after random order, to obtain a new individual sequence $\pi ^{\prime }$ .
- For the ZC index, select an individual sequence with the largest ZC value, select elements in the even position, and insert them into any position in the remaining parameter sequence in random order to obtain a new individual sequence $\pi ^{\prime \prime }$ .

FIGURE 2.

MMVO process.

Show All

FIGURE 3.

Three-layer encoding process.

Show All

Generating a random number r, if $r\lt 1/4$ , execute strategy a); Otherwise, if $1/4\lt r\lt 2/4$ , execute strategy b). If $2/4\lt r\lt 3/4$ , perform strategy c). Taking a) operation as an example, the variation diagram is shown in Figure 4.

FIGURE 4.

Schematic of mutation strategy (a).

Show All

E. External File Perturbation Strategy

After several iterations, the optimal solution set is stored in an external file. On this basis, the perturbation mechanism of external Archive is designed to search and develop new population individual sequences on the optimal solution set, and further improve the quality of the algorithm. For individuals in Archive, 1/2 are randomly selected to execute strategy (a) and the remaining individuals to execute strategy (b) in order to improve the degree of new search space development.

Embedded NSGA-II cross perturbation mechanism. This paper designed a combination sort delete cross policy (CSDC) to randomly select two individuals from the Archive and form a new population through CSDC operation. CSDC is divided into the following steps: First, two individual sequences are connected from end to end, and then random ordering is carried out on this basis; Secondly, duplicate elements that appear for the second time in each element are deleted in the sequence of new individuals after ordering updates, forming the final cross-over new individuals.
Mechanism of individual position disturbance in population. Perturbations were made to the individual positions of the obtained optimal solutions, and the perturbation method in [41] was adopted to update the location information to help the algorithm jump out of the local optimal solutions and expand the population diversity. The position update formula is: $\begin{equation*} P_{new}=P_{old}+sgn\left ({{r_{0}}}\right )r^{\prime }\left ({{x_{max}-x_{min}}}\right ). \tag {26}\end{equation*}$ View Sourcewhere $P_{old}$ and $P_{new}$ are the positions before and after individual renewal respectively. sgn is a function symbol. $r_{0}$ is the random number of [0,1]. $r^{\prime }$ is a random number generated by the normal distribution $N(0,R_{2})$ . $\begin{equation*} R=R_{max}-\left ({{R_{max}-R_{min}}}\right )(it/IT)^{2}. \tag {27}\end{equation*}$ View Sourcewhere $R_{max}$ and $R_{max}$ are the maximum and minimum values of the disturbance range respectively. it and IT are the current and maximum iterations, respectively.

F. Adaptive Weight Aggregation Algorithm

AW (adaptive weight) aggregation mechanism is an update strategy that dynamically adjusts model weights based on the model quality of each federated learning participant. In the AW aggregation algorithm, the historical accuracy rate of each participant’s model in the last iteration recorded by the blockchain is used as the basis for evaluating the data quality of the participant, so as to adaptively calculate the aggregation weight of the current iteration. At the same time, in order to ensure the stability of model training process and prevent gradient explosion and other problems, sigmoid activation function is introduced to improve the stability of gradient descent optimization algorithm. The expression of the sigmoid function is as follows: $\begin{equation*} sig(x)=\frac {1}{1+exp(-x)}. \tag {28}\end{equation*}$ View Source

After the completion of the t-round training, the aggregation server will call the smart contract to query $Acc_{i}^{t-1}$ by Algorithm 1, and normalize $Acc_{i}^{t-1}$ through the sigmoid function to get $\alpha _{i}^{t}$ , so as to determine the weight of each model in the round to participate in the aggregation. The normalized calculation formula of $\alpha _{i}^{t}$ is as follows: $\begin{equation*} \alpha _{i}^{t}=sig\left ({{\frac {Acc_{i}^{t}}{\sum _{i=1}^{N}Acc_{i}^{t}}}}\right ). \tag {29}\end{equation*}$ View Sourcewhere $\alpha _{i}^{t}$ is the aggregate weight of $t-th$ round of the $i-th$ participant.

When the adaptive weighting of each participant is completed, the aggregation server will complete the update aggregation of the parameters of the current round of model. The flow of an adaptive weight aggregate smart contract is shown in Algorithm 1.

Algorithm 1 Adaptive Weighted Aggregation Smart Contracts

Input:

Learning task id, t;

Output:

Parameters update aggregation results;

for $i=1,2,\ldots ,N$ do

Query the accuracy of each participant in the last round $Acc_{i}^{t-1}$ ;

$\alpha _{i}^{t}\leftarrow sig(Acc_{i}^{t})$ ;

end for

Update ${w_{i}^{t+1}}_{i=1}^{N}$ ;

Return ${w_{i}^{t+1}}_{i=1}^{N}$ ;

After the current round of model aggregation is completed, the aggregation server will send the updated model parameters to the participants participating in the aggregation, so as to complete the update of all local models.

G. Proposed FLSN-MVO Process

Figure 5 shows the flow of FLSN-MVO algorithm. The process begins with the edge server selecting participants for each round and then sending model parameters to those participants for local updates. In the course of the local update phase, the process is halted if the learning accuracy surpasses the predefined objective. Should the objective not be met, the necessary data is transmitted to the edge server. Once all participants have concluded their local updates, the edge server consolidates the parameters in anticipation of the subsequent round of global updates. This cycle persists until the designated limit of global update rounds is reached or until all terminals achieve the desired learning accuracy.

FIGURE 5.

FLSN-MVO process.

Show All

SECTION VI.

Experiments and Analysis

In this section, we have divided the experiment into two cases. In this experiment, the server used is configured with an i7-7800X CPU, NVIDIA GeForce RTX 2080 Ti GPU, 12-G RAM, and Linux operating system.

A. Experiment: Case 1

In the experiment, the data set used is the Vehicle data set, which includes acoustic, seismic and infrared sensor data collected by the sensor network. In order to achieve comparison, two additional federated learning algorithms are compared, as follows:

FedAvg: Each round of the server considers all terminals as participants, and when local updates are made, participants learn through SVM. In the parameter aggregation stage, the aggregation technique employed is a straightforward averaging approach. As a quintessential algorithm within the realm of federated learning, this method has frequently served as a benchmark in numerous scholarly works pertaining to federated learning.
q-FedSGD: The server randomly selects 15 terminals as participants in each round, and when local updates are made, participants only learn through SVM. In the parameter aggregation stage, the improved parameter aggregation algorithm is used. The purpose of this method is to ensure the accuracy of federated learning and reduce the error of learning accuracy of each terminal.
This section compares the target learning accuracy, the time of terminal participation in federated learning, and the energy consumption of the three algorithms (FLSN-MVO, FedAvg, and q-FedSGD) after federated learning. The efficiency of FLSN-MVO method is verified from these three aspects.
1. Comparative analysis of terminal learning accuracy: Referencing the data presented in Table 1 and Figure 6, one can discern the discrepancies in learning accuracy between each terminal and the target accuracy under the three distinct algorithms, thereby gauging the efficacy of each algorithm in maintaining the desired learning accuracy. Given that FedAvg is capable of ensuring learning accuracy in federated learning scenarios, this study adopts 99% of the learning accuracy achieved by FedAvg as the target accuracy for each terminal in the FLSN-MVO algorithm. Figure 6 illustrates that the majority of terminals in q-FedSGD fall short of the target accuracy. Conversely, in FLSN-MVO, only terminal $d_{5}$ does not meet the target accuracy, which suggests that the FLSN-MVO algorithm is effective in ensuring the learning accuracy of the terminals.
2. Comparative analysis of terminal learning time: This part compares and analyzes the time of each terminal participating in federated learning when FedAvg, q-FedSGD and FLSN-MVO algorithms are running. From the above analysis, it can be seen that the time of terminal participation in federated learning is composed of the computing time of running the machine learning model locally and the transmission time of uploading data. The computation time is proportional to the number of times the machine learning model is run, and the transfer time is proportional to the number of rounds involved in the global update. As shown in Figure 7 and Table 2, the FedAvg algorithm takes the longest time for all terminals because in this algorithm, all terminals participate in each round of global updates. Since q-FedSGD only selects some terminals as participants in each round, the learning time of terminals is smaller than that of FedAvg algorithm. At the same time, the algorithm randomly selects the terminals, resulting in the difference between the learning time of each terminal and that of FedAvg algorithm. In the FLSN-MVO algorithm, when the terminal reaches the target accuracy, it is no longer selected by the server. Therefore, overall, the learning time of the terminal is the least among the three algorithms. There are exceptions, however, where $d_{5}$ takes the same amount of time under FLSN-MVO as it does under FedAvg. Because $\lambda _{5}$ is always less than $\Lambda _{5}$ in all local updates, $d_{5}$ is selected in all federated learning rounds.
3. Comparative analysis of terminal energy consumption: In this section, we will evaluate the overall energy expenditure for each terminal engaged in federated learning across the FedAvg, q-FedSGD, and FLSN-MVO algorithms. Energy consumption is indicative of the participation cost in federated learning. For mobile terminals, energy consumption is a critical factor due to the constraints imposed by their physical dimensions and computational capabilities. Even if a federated learning algorithm can achieve superior learning accuracy, its practical adoption may be hindered if it is excessively energy-intensive.

TABLE 1 Accuracy error

TABLE 2 Time comparison/s

FIGURE 6.

Error between Learning accuracy and target accuracy.

Show All

FIGURE 7.

Time consumption.

Show All

As depicted in Table 3 and Figure 8, the cumulative energy consumption for the learning process of each terminal under the three algorithmsalgorithmsłFedAvg, q-FedSGD, and FLSN-MVOłis presented. This total energy consumption is categorized into two components: transmission energy and computational energy. Based on the established formulae, the computational energy is directly related to the computational time, while the transmission energy is directly related to the time spent on data transmission. Consequently, the distribution trends of energy consumption and the patterns of time allocation among the terminals under these three algorithms exhibit a consistent alignment.

TABLE 3 Terminal energy consumption comparison

FIGURE 8.

Terminal energy consumption.

Show All

B. Experiment: Case 2

The experiments in this section use the UNSW-NB15 dataset and the NSL-KDD dataset. The UNSW-NB15 dataset includes both real benign traffic data and complex traffic data containing a variety of novel attack methods. The dataset, created by Security LABS Australia using the IXIA PerfectStrom tool, consists of 43 category-tagged features, including one normal record and nine attack categories. The NSL-KDD dataset is an optimization of the KDD99 dataset, removing some of the redundant data and making the data more balanced so that different technologies can be evaluated more accurately. The dataset contains 4 exception types and 39 attack types. Each record contains 41 features and 1 category identifier.

In this section, three asynchronous federated learning schemes are compared, including weighted K-async federated learning (WKAFL) [42], time-weighted asynchronous federated learning (TWAFL) [43], and gradient scheduling with global momentum (GSGM) [44]. These schemes improve prediction accuracy by mitigating the effects of stale data and non-independent codistributed data.

This section evaluates the performance of the proposed schemes in different stale scenarios on the UNSW-NB15 and NSL-KDD datasets. Specifically, T (total number of clients) and K (number of clients) can be used to abstract the heterogeneous strength of the system. This section uses $T/K$ to measure the overall obsolescence of the gradient in the system. In the experiment, three obsolescence levels are considered: $10~(T/K {=}1000$ /100), $20~(T/K {=}1000$ /50), and $100~(T/K {=}1000$ /10). The experimental results are shown in Figure 9 and Figure 10, and the final prediction accuracy is summarized in Table 4 and Table 5.

Speed of convergence: It can be seen from the figure that FLSN-MVO converges faster than the other three schemes. As shown in Figure 9, in the UNSWNB15 data set, FLSN-MVO has obvious advantages in convergence speed compared with other schemes when the heterogeneity of $T/K=10$ and $T/K=20$ is slightly weaker, which is mainly due to the hierarchical scheme used in text. In the initial stage, the FLSN-MVO scheme uses the method of adaptive weight aggregation algorithm, so the features of the intrusion detection data set can be extracted quickly in the initial stage. The other three schemes will cut the gradient or screen and weight aggregate according to the gradient quality at the initial stage. Although the scheme will converge again when it enters the second level after the first-level training is stable, which will affect the final convergence speed, the initial rapid convergence will also help the intrusion detection system to quickly enter the working state in the complex environment of the Internet of Things. In the case of low isomerism, more effective features can be extracted because of more polymerization gradients, so better convergence can be achieved compared with other schemes. In the case of high isomerism, even if the gradient is not screened, due to the limited gradient of each round of polymerization, fewer features are extracted, so the advantages will not be particularly obvious in the strong isomerism scenario where $T/K=100$ . As shown in Figure 10, the above analysis can also be supported in the NSL-KDD data set, because the NSL-KDD data set has fewer features than the UNSW-NB15 data set, so this scheme presents less advantages in convergence.
Accuracy rate of training: As can be seen from the figure, FLSN-MVO has a better training effect than the other three schemes. As shown in Figure 9, when training with UNSW-NB15 data set, FLSN-MVO can achieve a high accuracy after convergence at the first level. After second level convergence, as shown in Table 4, the accuracy of FLSN-MVO can be increased by more than 0.096, 0.123 and 0.121 respectively when $T/K {=}10$ , 20 and 100. One of the main reasons is that at the second level, more factors are considered in gradient selection, others in gradient screening and weight calculation, some only consider the gradient quality, while the obsolescence is only used for learning rate, and some only consider the gradient obsolescence sum, and the learning rate does not change dynamically. FLSN-MVO not only considers the gradient quality, but also considers the obsolescence of the gradient, which can extract more accurate data features, which makes the model training direction will not produce a large deviation. At the same time, because dynamic weights are used in the second stage of aggregation, the influence of high-quality gradient is amplified and the training effect is better. Second, because the model of the other three federative learning schemes is mainly used for image classification, the feature dimension of image data is higher, and their schemes are more suitable for image tasks with high-dimensional features. This scheme is mainly used for intrusion detection, and the feature dimension of the data set is low, so the overall accuracy is higher than the other three schemes. In addition, as shown in Figure 10, when training with NSL-KDD dataset, the accuracy rate of FLSN-MVO can be increased by more than 0.074, 0098 and 0.117 respectively when $T/K {=}10$ , 20 and 100, which proves that this scheme has better performance in strong heterogeneous scenarios. In addition, FLSN-MVO can maintain its accuracy relatively stably after it converges at the first level, while other schemes will show varying degrees of accuracy drop, which is also because FLSN-MVO is in the second level of training

TABLE 4 Training Accuracy of Each Model With Different Staleness Settings For the UNSW-NB15

TABLE 5 Training Accuracy of Each Model With Different Staleness Settings for the NSL-KDD

FIGURE 9.

The accuracy of each model training under different obsolescence settings with the UNSW-NB15 dataset.

Show All

FIGURE 10.

The accuracy of each model training under different obsolescence settings with the NSL-KDD dataset.

Show All

SECTION VII.

Conclusion

In view of the low efficiency of terminal processing large amounts of data and the risk of privacy disclosure in edge computing, this study designed an edge intelligent collaborative computing and privacy protection mechanism based on federated learning Siamese network and multi-universe optimization algorithm for industry 5.0. At the same time, the proposed federated learning algorithm effectively reduces the terminal cost and improves the efficiency of federated learning. The experimental results show that compared with FedAvg method and q-FedSGD method, the FLSN-MVO method in this study significantly reduces the learning time and energy consumption of terminal federation learning, and at the same time ensures that the learning accuracy of each terminal reaches the preset standard.

In addition, considering the limitations of computing and storage resources at terminals, future research will further combine federated learning with edge computing to minimize the additional burden on terminals. At the same time, the FLSN-MVO method will be trained with additional data sets to verify its efficiency. In addition, the privacy protection mechanism proposed in this study will be further optimized, and a more perfect user privacy protection mechanism will be designed based on the federal learning framework.

Usage

Select a Year

View as

Total usage sinceDec 2024:98

Year Total:54

Data is updated monthly. Usage includes PDF downloads and HTML views.

Citations

Search for
Citations in
Google Scholar^®

References is not available for this document.

FLSN-MVO: Edge Computing and Privacy Protection Based on Federated Learning Siamese Network With Multi-Verse Optimization Algorithm for Industry 5.0

Abstract:

Metadata

Abstract:

Funding Agency:

Introduction

Related Works