Introduction
A. Preamble
Over the last decade, next-generation cellular networks (i.e., 5G and beyond) have been undergoing a major revolution, driven by advanced telecommunication technologies for high-speed data transmission, high cell capacity, and low latency. Each generation has its own focus: 5G delivers multi-Gbps peak data rates and ultra-low latency, while 6G embeds artificial intelligence into the network. NextG networks require substantial investment and research to meet their infrastructure, computing, security, and privacy requirements. These technologies will enable the next era of data communications and networking by connecting everyone to a world in which everything is connected. Their main goal is to support a wide range of new applications, such as augmented reality (AR), virtual reality (VR), the metaverse, telehealth, education, autonomous and flying vehicles, smart cities, smart grids, and advanced manufacturing. They will create new opportunities for industry to improve visibility, enhance operational efficiency, and accelerate automation [1]. Next-generation networks are expected to simultaneously provide high data rates, ultra-low latency, and high reliability to support services for these applications [2]. Artificial Intelligence (AI) plays a crucial role in achieving these requirements by being integrated into applications at all levels of the network. AI is one of the key drivers for next-generation wireless networks to improve the efficiency, latency, and reliability of network applications [3]. AI is also applied to channel estimation, one of the fundamental prerequisites in wireless networks. Traditional channel estimation methods are highly complex and often inaccurate due to the multi-dimensional data structure and the nonlinear characteristics of the channel. Therefore, DL-based channel estimation models have been adopted in next-generation networks to address these limitations. However, DL-based channel estimation models can be vulnerable to adversarial machine learning (ML) attacks, so a secure scheme is crucial for DL-based channel estimation models used in next-generation networks. DL-based models in next-generation wireless communication systems should be evaluated in terms of vulnerability, risk assessment, and security threats before they are deployed to production environments.
B. Related Works
The main goal of NextG networks is to provide very high data rates (Tbps) and extremely low latency (sub-millisecond) with a high cell capacity (10 million devices per square kilometer) [4], [5]. The key to next-generation networks is the use of new technologies, such as millimeter wave (mmWave), massive multiple-input multiple-output (massive MIMO), and AI. mmWave is essential for these networks, providing high capacity, high throughput, and very low latency in frequency bands above 24 GHz. Massive MIMO is an advanced version of MIMO that uses large groups of antennas at both the transmitter and receiver sides, providing better throughput and spectral efficiency in wireless communication. AI-based algorithms have been used to improve network performance and efficiency. This study focuses on DL-based channel estimation models in next-generation wireless networks and their vulnerabilities. In the literature, these topics have been studied both with and without vulnerability concerns [6], [7], [8], [9], [10], [11]. The authors in [6] reviewed AI-empowered wireless networks and the role of AI in deploying and optimizing next-generation architectures. They indicated that AI-based models have already been used to train the transmitter, receiver, and channel as an auto-encoder, which allows the transmitter and receiver to be optimized jointly. The study also indicated that next-generation networks will differ from current ones in network infrastructure, wireless access technologies, computing, application types, etc. The authors in [12] reviewed DL-based solutions in next-generation networks, focusing on physical layer applications of cellular networks, from massive MIMO and reconfigurable intelligent surfaces (RIS) to multi-carrier (MC) waveforms, and emphasized the contribution of AI-based solutions to improving network performance. The authors in [13] and [14] proposed robust channel estimation frameworks using the fast and flexible denoising convolutional neural network (FFDNet) and deep convolutional neural networks (CNNs) for mmWave MIMO. Both methods can handle a wide range of signal-to-noise ratio (SNR) levels with a flexible noise level map and offer better channel estimation accuracy. DL-based algorithms significantly improve the overall system performance of next-generation wireless networks. Several research groups in the wireless research community study the main potential security issue of AI-based algorithms, i.e., model poisoning [15], [16]. The authors in [17] and [18] provided a comprehensive review of NextG wireless networks in terms of opportunities, security and privacy challenges, and proposed solutions. Several studies also present robust frameworks focusing on accurately detecting adversarial attacks. The authors in [19] proposed DeSVig, a decentralized swift vigilance framework, to detect adversarial attacks against industrial artificial intelligence systems (IAISs). According to the results, the proposed framework can detect adversarial attacks, such as DeepFool and FGSM, with high accuracy and low delay, and it outperforms current state-of-the-art defense approaches in robustness, efficiency, and scalability.
C. Purpose and Contributions
Channel estimation is one of the most challenging problems in 5G and beyond networks because existing techniques have difficulty capturing the correlations among many resources, system parameters, and dynamic communication channel characteristics. Therefore, sophisticated AI-based algorithms can help model these highly nonlinear correlations and estimate the channel characteristics [20]. In our recent papers [21] and [22], adversarial attacks and mitigation methods were investigated, along with a proposed framework, for mmWave beamforming prediction models in next-generation networks. This study provides a comprehensive vulnerability analysis of deep learning (DL)-based channel estimation models, trained with a dataset obtained from MATLAB's 5G Toolbox, against adversarial attacks and a defensive distillation-based mitigation method. It implements widely used adversarial attacks, from the Fast Gradient Sign Method (FGSM), Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and Momentum Iterative Method (MIM) to Carlini & Wagner (C&W), as well as a defensive distillation-based mitigation method for DL-based models. The results show that the DL-based models used in these networks are vulnerable to adversarial attacks, while the proposed mitigation method makes the models more robust against such attacks. The source code is available from GitHub.1
The scope of this study is limited to one of the 5G physical layer applications, i.e., DL-based channel estimation, its vulnerability analysis under the selected adversarial attacks, and the proposed defensive distillation mitigation method. Other attack types also exist; the C&W attack, for example, is compute-intensive and requires more iterations than traditional methods. In this study, we use less compute-intensive and more efficient approaches to create adversarial examples.
Preliminaries
This section presents a brief overview of the channel estimation and the adversarial ML attacks, such as FGSM, BIM, PGD, MIM, and C&W, along with defensive distillation-based mitigation. Dataset description and scenarios are also given with the selected performance metrics to evaluate the models’ performance under normal and attack conditions.
A. Channel Estimation for Communication System
In a wireless communication system, the channel characteristics describe the properties of the communication link between the transmitter and the receiver; they are also known as channel state information (CSI). When the signal is transmitted through a communication channel, i.e., the medium, the received signal contains distortion and added noise. To decode the received signal, the unwanted components introduced by the channel, i.e., distortion and noise, must be removed. Identifying the channel characteristics is the first step in achieving this, and it is called the channel estimation process. In the simplest case, the received signal is an attenuated and delayed copy of the transmitted signal, with attenuation factor $h_{0}$ and delay $\tau_{0}$:\begin{equation*} y(t) = h_{0} \, x(t - \tau _{0})\tag{1}\end{equation*}
However, the received signal generally comprises several reflected and scattered paths, i.e., multiple paths, each with a different attenuation and delay. The composite received signal is:\begin{equation*} y(t)=\sum _{l = 0}^{L}h_{l} \, x(t - \tau _{l})\tag{2}\end{equation*} where $L$ is the number of paths.
Mobility causes a Doppler frequency shift, i.e., a change in the wavelength or frequency of the waves due to the observer being in motion with respect to the wave source. The Doppler effect plays an important role in telecommunications and in the computation of signal path loss and fading due to multi-path propagation. In addition, the channel characteristics, i.e., $h_{l}$ and $\tau_{l}$, vary with time, so the received signal becomes:\begin{equation*} y(t)=\sum _{l = 0}^{L}h_{l}^{t} \, x(t - \tau _{l}^{t})\tag{3}\end{equation*} where the superscript $t$ denotes the time dependence of the channel gains and delays.
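To make the multi-path model in (2) concrete, the short Python sketch below applies hypothetical tap gains and integer sample delays to a discrete-time signal; all values (taps, delays, and the test signal) are illustrative assumptions rather than parameters used elsewhere in this paper.

import numpy as np

# Hypothetical multi-path channel: complex tap gains h_l and integer sample delays tau_l.
h = np.array([1.0 + 0.0j, 0.6 - 0.2j, 0.3 + 0.1j])
tau = np.array([0, 3, 7])

def multipath_channel(x, h, tau):
    # Apply y[n] = sum_l h_l * x[n - tau_l] to a discrete-time signal x.
    y = np.zeros(len(x), dtype=complex)
    for h_l, t_l in zip(h, tau):
        y[t_l:] += h_l * x[:len(x) - t_l]
    return y

x = np.exp(1j * 2 * np.pi * 0.05 * np.arange(64))  # example transmitted signal
y = multipath_channel(x, h, tau)                   # received signal (noise-free)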
Channel estimation plays an important part in wireless communications for increasing the capacity and the overall system performance. There is a high demand for new wireless networks, higher data rates, better quality of service, and higher network capacity, so new promising technologies are needed to meet these requirements. A migration from Single Input Single Output (SISO) to Multiple Input Multiple Output (MIMO) antenna technology has started with NextG networks. Channel estimation is at the core of next-generation communication systems, i.e., 5G and beyond, and is performed in different ways for SISO and MIMO approaches at the receiver side. Channel estimation algorithms can be classified into three main categories, i.e., blind channel estimation, semi-blind channel estimation, and training-based estimation [23]. Among them, training-based estimation is widely used in communication systems. The general approach is to insert known reference symbols, i.e., pilots, into the transmitted signal and then interpolate the channel response based on these known pilot symbols. The process works in the following steps: (1) develop a mathematical model to correlate the transmitted and received signals using the channel characteristics, (2) embed a predefined signal, i.e., a pilot signal, into the transmitted signal, (3) transmit the signal through the channel, (4) receive the signal, distorted and/or with added noise, through the channel, (5) decode the pilot signal from the received signal, (6) compare the transmitted and received pilot signals, and (7) find the correlation between the transmitted and the received signals.
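As a concrete illustration of steps (2)-(7), the Python sketch below performs a simple least-squares (LS) estimate at known pilot subcarriers and interpolates it over the remaining subcarriers; the pilot positions, grid size, and channel values are hypothetical and do not correspond to the 5G numerology used later in this paper.

import numpy as np

num_subcarriers = 64
pilot_idx = np.arange(0, num_subcarriers, 8)        # hypothetical pilot positions
pilots_tx = np.ones(len(pilot_idx), dtype=complex)  # known pilot symbols

# Hypothetical "true" frequency-domain channel and noisy received pilots.
h_true = np.exp(-1j * 2 * np.pi * np.arange(num_subcarriers) / 32)
noise = 0.05 * (np.random.randn(len(pilot_idx)) + 1j * np.random.randn(len(pilot_idx)))
rx_pilots = h_true[pilot_idx] * pilots_tx + noise

# Steps (6)-(7): LS estimate at the pilots, then interpolate over all subcarriers.
h_ls = rx_pilots / pilots_tx
h_est = np.interp(np.arange(num_subcarriers), pilot_idx, h_ls.real) \
        + 1j * np.interp(np.arange(num_subcarriers), pilot_idx, h_ls.imag)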
There have been many efforts on channel estimation algorithms using different approaches in the literature. However, it remains a challenging problem due to the computational complexity of the algorithms, the enormous number of mathematical operations involved, and low channel estimation accuracy. An equalization method is typically used to reduce the complexity and recover the frequency response at the receiver side [24]. With the introduction of machine learning methods into 5G and beyond communication systems, channel estimation algorithms have improved in terms of lower computational complexity and higher estimation accuracy compared to conventional channel estimation algorithms [25]. In addition, the nature of deep learning-based algorithms can save significant computational power for the complex analysis needed in channel estimation [26]. However, the feasibility of using machine learning methods in channel estimation can still be questioned. The study in [27] presented several deep learning-based channel estimation algorithms, i.e., a fully-connected deep neural network (FDNN), a convolutional neural network (CNN), and bidirectional long short-term memory (bi-LSTM), with different scenarios of fading multi-path channel models for 5G networks. According to the results, all three deep learning-based algorithms reduced the channel estimation error and bit error ratio and were robust to changes in the Doppler frequency; among them, bi-LSTM provided the most significant reduction in channel estimation error. The authors in [28] also proposed a CNN combined with a projected gradient descent algorithm to demonstrate the feasibility of using machine learning methods in channel estimation.
A channel model is a representation of the channel through which a transmitted signal travels to the receiver. In the simulation environment, channel models are typically classified into two categories, i.e., the clustered delay line (CDL) model and the tapped delay line (TDL) model. A CDL model is used when the received signal consists of multiple delayed clusters, where each cluster contains multipath components with the same delay but slight variations in the angles of departure and arrival, i.e., MIMO. On the other hand, a TDL model is defined for simplified evaluations of CDL, i.e., non-MIMO or SISO evaluations. These channel models are well defined in the technical report released by 3GPP, i.e., the 3rd Generation Partnership Project [29]. According to this report, CDL/TDL models are defined for the frequency range from 0.5 GHz to 100 GHz with a maximum bandwidth of 2 GHz. For both CDL and TDL, five channel profiles are defined: A, B, and C for non-line-of-sight (NLOS) propagation, and D and E for line-of-sight (LOS) propagation. Power, delay, and angular information are used to define CDL models, while power, delay, and Doppler spectrum information are used for TDL models.
B. Convolutional Neural Networks
The convolutional neural network (CNN) is a neural network that has been shown to be very successful for image recognition [30], [31], [32]. Compared to a fully-connected neural network, a CNN can extract the relevant information with a smaller number of parameters. The main idea of the CNN is that the structure of an image can be captured by the convolution operation. Suppose the input image is $\mathbf{x}$ of width $W$ and height $H$, and $\mathbf{W}$ is a convolution filter. The convolution output is \begin{equation*} \mathbf {y} = \mathbf {W} \ast \mathbf {x} = \sum _{i=1}^{W} \sum _{j=1}^{H} \mathbf {W}_{i,j}\, \mathbf {x}_{i-s,j-s},\tag{4}\end{equation*} where $s$ denotes the offset of the filter over the image.
The CNN is composed of several types of layers. The convolution layer is the most critical layer of the CNN, consisting of several filters. Each filter extracts a particular type of feature from an input image. The pooling layer is a down-sampling layer, which reduces the size of the convolution output. Each pooling operation replaces several adjacent values with the maximal value or the mean value. The fully-connected layer is a standard neural network layer that combines all the features extracted by the convolution layer. The softmax layer is a classification layer to classify the input data.
The input image is a two-dimensional matrix. The filter in the convolution layer extracts a particular type of feature from the input image. For example, the leftmost filter extracts horizontal lines, and the middle filter extracts diagonal lines. The output of the convolution layer is then sent to the pooling layer, which reduces the size of the data. The output of the pooling layer is then sent to the fully-connected layer, which combines all the features extracted by the convolution layer. The output of the fully-connected layer is then sent to the softmax layer, which classifies the data.
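The minimal Keras sketch below mirrors the layer sequence described above (convolution, pooling, fully-connected, softmax); the input shape, filter counts, and number of classes are illustrative assumptions and do not correspond to the channel estimation architecture used later in this paper.

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, (3, 3), activation="relu", padding="same"),  # feature extraction
    layers.MaxPooling2D((2, 2)),                                   # down-sampling
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                           # combine extracted features
    layers.Dense(10, activation="softmax"),                        # classification layer
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()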
C. Adversarial Attacks
ML-based models are trained to automatically learn the underlying patterns and correlations in data. Once an ML-based model is trained, it can be used to predict the patterns in new data. The accuracy of the trained model on unseen data, also called generalization, is essential for high performance. However, a trained model can be manipulated by adding noise to the data, i.e., through targeted and non-targeted adversarial ML attacks. Adversarial ML attacks are generated by adding a perturbation to a legitimate data point, i.e., a craftily generated adversarial example with only a slight difference from the original input, to fool the ML-based model. In such attacks, the attacker does not change the training instances; instead, small perturbations are applied to the input instances during the model's inference period. Existing attacks and defenses developed for images can be applied to attack and defend models in other fields [33], [34], [35]. Cleverly designed adversarial examples can fool deep neural networks with high success rates on test images, and adversarial examples can also be transferred from one model to another. There are various kinds of adversarial ML attacks, such as evasion attacks, data poisoning attacks, and model inversion attacks [36]. An evasion attack aims to cause an ML-based model to improperly classify adversarial examples as legitimate data points and can be targeted or non-targeted: targeted attacks force the model to classify the adversarial example as a specific target class, whereas non-targeted attacks push the model to classify the adversarial example as any class other than the ground truth. Data poisoning aims to inject malicious data points into the training data so that the ML-based model produces the desired outcome. Model inversion aims to generate new data points close to the original data points in order to extract sensitive information about specific data points. In this study, we focus on evasion attacks. Taking the channel estimation CNN model as an example, let $\mathbf{x}$ denote the input, $\mathbf{y}$ the corresponding label, $\omega$ the model parameters, and $\ell(\omega, \mathbf{x}, \mathbf{y})$ the loss function. The adversarial perturbation $\sigma$ is obtained by solving \begin{equation*} \sigma ^{*} = \underset {\|\sigma \|_{p} \leq \epsilon }{\arg \max }\,\,\ell (\omega,\mathbf {x}+\sigma,\mathbf {y})\tag{5}\end{equation*} where $\epsilon$ is the attack budget that bounds the $p$-norm of the perturbation.
Figure 1 shows a typical adversarial sample generation procedure.
These adversarial attack types are given as follows.
1) FGSM
Fast Gradient Sign Method (FGSM): FGSM is one of the most popular and simplest approaches to constructing adversarial examples. It is a one-step gradient-based attack that computes the gradient of the loss function with respect to the input and perturbs the input along the sign of that gradient (see the sketch below):
1) Compute the gradient of the loss function, $\nabla _{\mathbf {x}}\ell (\mathbf {x},\mathbf {y})$.
2) Add the scaled sign of the gradient to the input data, $\mathbf {x}_{adv} = \mathbf {x} + \epsilon \times sign(\nabla _{\mathbf {x}}\ell)$.
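The sketch below shows a minimal TensorFlow implementation of this one-step attack against a regression-style model such as the channel estimator; the model, the mean-squared-error loss, and the value of $\epsilon$ are placeholders rather than the exact configuration used in our experiments.

import tensorflow as tf

def fgsm_attack(model, x, y, epsilon=0.1):
    # One-step FGSM: x_adv = x + epsilon * sign(grad_x loss(x, y)).
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.MeanSquaredError()(y, model(x))
    grad = tape.gradient(loss, x)
    return x + epsilon * tf.sign(grad)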
2) BIM
Basic Iterative Method (BIM): BIM is one of the most popular attacks and is an iterative gradient-based attack derived from FGSM [39]. It repeatedly computes the gradient of the loss function with respect to the current adversarial example and takes steps along its sign (see the sketch below):
1) Initialize the adversarial example as $\mathbf {x}_{adv} = \mathbf {x}$.
2) Iterate $N$ times, for $i=0, 1, 2, 3,\ldots, N$:
3) Compute the gradient of the loss function, $\nabla _{\mathbf {x}}\ell (\mathbf {x}_{adv},\mathbf {y})$.
4) Add the scaled sign of the gradient to the adversarial example, $\mathbf {x}_{adv} = \mathbf {x}_{adv} + \epsilon \times sign(\nabla _{\mathbf {x}}\ell)$.
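A minimal TensorFlow sketch of the iterative attack is shown below. It follows the common BIM formulation with a small step size and clipping of the accumulated perturbation to the $\epsilon$-ball; the step size, iteration count, and loss function are assumptions rather than the exact settings used in our experiments.

import tensorflow as tf

def bim_attack(model, x, y, epsilon=0.1, alpha=0.01, num_iter=10):
    # Iterative FGSM: repeat small signed-gradient steps, keeping the total
    # perturbation within the epsilon ball around the original input.
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    x_adv = tf.identity(x)
    mse = tf.keras.losses.MeanSquaredError()
    for _ in range(num_iter):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = mse(y, model(x_adv))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + alpha * tf.sign(grad)
        x_adv = tf.clip_by_value(x_adv, x - epsilon, x + epsilon)  # stay within the eps-ball
    return x_adv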
3) PGD
PGD is one of the most popular and powerful gradient-based attacks [40], [41]. Like BIM, it iteratively computes the gradient of the loss function with respect to the input, but it adds randomness to the iterations (see the sketch below):
1) Initialize the adversarial example as $\mathbf {x}_{adv} = \mathbf {x}$.
2) Iterate $N$ times, for $i=0, 1, 2, 3,\ldots, N$:
3) Compute the gradient of the loss function, $\nabla _{\mathbf {x}}\ell (\mathbf {x}_{adv},\mathbf {y})$.
4) Add random noise to the gradient, $\hat {\nabla }_{\mathbf {x}}\ell (\mathbf {x}_{adv},\mathbf {y}) = \nabla _{\mathbf {x}}\ell (\mathbf {x}_{adv},\mathbf {y}) + \mathcal {U}(\epsilon)$.
5) Add the scaled sign of the perturbed gradient to the adversarial example, $\mathbf {x}_{adv} = \mathbf {x}_{adv} + \alpha \times sign(\hat {\nabla }_{\mathbf {x}}\ell)$.
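The sketch below follows the common PGD formulation with a random starting point inside the $\epsilon$-ball and projection back onto the ball after every step; the step size $\alpha$, iteration count, and loss function are assumptions, and practical implementations differ in where randomness is injected.

import tensorflow as tf

def pgd_attack(model, x, y, epsilon=0.1, alpha=0.01, num_iter=10):
    # PGD: random start inside the epsilon ball, signed-gradient steps,
    # and projection back onto the ball after every step.
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    x_adv = x + tf.random.uniform(tf.shape(x), -epsilon, epsilon)
    mse = tf.keras.losses.MeanSquaredError()
    for _ in range(num_iter):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = mse(y, model(x_adv))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + alpha * tf.sign(grad)
        x_adv = tf.clip_by_value(x_adv, x - epsilon, x + epsilon)  # projection step
    return x_adv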
4) MIM
Momentum Iterative Method (MIM): MIM is a variant of the BIM attack that introduces a momentum term and integrates it into the iterative attack [42]. It accumulates the gradients of the loss function with respect to the input across iterations (see the sketch below):
1) Initialize the adversarial example $\mathbf {x}_{adv} = \mathbf {x}$ and the momentum $\mu = 0$.
2) Iterate $N$ times, for $i=0, 1, 2, 3,\ldots, N$:
3) Compute the gradient of the loss function, $\nabla _{\mathbf {x}}\ell (\mathbf {x}_{adv},\mathbf {y})$.
4) Update the momentum, $\mu = \mu + \frac {\eta }{\epsilon } \times \nabla _{\mathbf {x}}\ell (\mathbf {x}_{adv},\mathbf {y})$.
5) Add random noise to the gradient, $\hat {\nabla }_{\mathbf {x}}\ell (\mathbf {x}_{adv},\mathbf {y}) = \nabla _{\mathbf {x}}\ell (\mathbf {x}_{adv},\mathbf {y}) + \mathcal {U}(\epsilon)$.
6) Add the scaled sign of the gradient to the adversarial example, $\mathbf {x}_{adv} = \mathbf {x}_{adv} + \alpha \times sign(\hat {\nabla }_{\mathbf {x}}\ell)$.
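The sketch below follows the standard momentum formulation (accumulating normalized gradients in a momentum term and stepping along its sign); the decay factor, step size, and iteration count are assumptions, and the random-noise term listed above is omitted for simplicity.

import tensorflow as tf

def mim_attack(model, x, y, epsilon=0.1, alpha=0.01, decay=1.0, num_iter=10):
    # Momentum Iterative Method: accumulate normalized gradients in a momentum
    # term and step along its sign, staying within the epsilon ball.
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    x_adv = tf.identity(x)
    momentum = tf.zeros_like(x)
    mse = tf.keras.losses.MeanSquaredError()
    for _ in range(num_iter):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = mse(y, model(x_adv))
        grad = tape.gradient(loss, x_adv)
        grad = grad / (tf.reduce_mean(tf.abs(grad)) + 1e-12)   # normalize the gradient
        momentum = decay * momentum + grad                     # update the momentum term
        x_adv = x_adv + alpha * tf.sign(momentum)
        x_adv = tf.clip_by_value(x_adv, x - epsilon, x + epsilon)
    return x_adv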
5) C&W
The C&W attack was proposed as a targeted evasion attack by Carlini and Wagner [43]. It can be viewed as a zero-sum game between the attacker and the model: the total amount of value in the game is fixed, so whatever the attacker gains in prediction error, the model loses. The C&W method is an iterative attack that constructs adversarial examples by approximately solving a minimization problem of the form \begin{equation*} \min _{x \in \mathcal {X}} \mathbb {E}_{y \in \mathcal {Y}} \left [f(x) - y\right]^{2}\end{equation*} The most important difference between C&W and other adversarial ML attacks is that C&W does not require an attack budget $\epsilon$ to be specified; instead, it minimizes the size of the perturbation as part of its objective.
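The heavily simplified sketch below illustrates a C&W-style unconstrained optimization adapted to a regression model: the perturbation norm is minimized while the prediction error is maximized, with no explicit $\epsilon$ budget. The trade-off constant c, learning rate, and number of steps are assumptions, and the original attack on classifiers uses a margin-based objective and a change of variables that are omitted here.

import tensorflow as tf

def cw_style_attack(model, x, y, c=1.0, lr=0.01, num_steps=100):
    # C&W-style sketch for regression: minimize ||delta||^2 - c * MSE(model(x + delta), y).
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    delta = tf.Variable(tf.zeros_like(x))
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    mse = tf.keras.losses.MeanSquaredError()
    for _ in range(num_steps):
        with tf.GradientTape() as tape:
            objective = tf.reduce_sum(tf.square(delta)) - c * mse(y, model(x + delta))
        grad = tape.gradient(objective, delta)
        opt.apply_gradients([(grad, delta)])
    return x + delta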
D. Defensive Distillation
Knowledge distillation was introduced by Hinton et al. [44] to compress the knowledge of a large, densely connected neural network (the teacher) into a smaller, sparsely connected neural network (the student); it was shown that the student can reach a performance similar to that of the teacher [44]. In the initial work, knowledge distillation was used to solve a classification problem, and the approach is also called the teacher-student framework. Papernot et al. [45] proposed this technique as an adversarial ML defense and demonstrated that it can make models more robust against adversarial examples; the main contribution of that work was to bring knowledge distillation into adversarial ML defense. Defensive distillation is an ML framework that can enhance the robustness of a model for classification problems. The first step is to train the teacher model with a high temperature ($T$) in the softmax function, which converts the logits $z$ into softened probabilities:\begin{equation*} p_{softmax}(z, T) = \frac {e^{z/T}}{\sum _{i=1}^{n}e^{z_{(i)}/T}}\tag{6}\end{equation*}
The student model is then trained at the same temperature $T$ by minimizing the cross-entropy between the teacher's softened outputs and its own softened predictions:\begin{align*} \mathcal {L}_{student}(T)=&-\frac {1}{N} \sum _{i=1}^{N} \sum _{j=1}^{n} \mathbf {y}_{ij} \cdot \log p_{softmax}(z_{ij}, T) \\=&-\frac {1}{N} \sum _{i=1}^{N} \sum _{j=1}^{n} \mathbf {y}_{ij} \cdot \log \frac {e^{z_{ij}/T}}{\sum _{k=1}^{n}e^{z_{ik}/T}}\tag{7}\end{align*} where $N$ is the number of training samples, $n$ is the number of classes, $\mathbf{y}_{ij}$ is the (soft) label of sample $i$ for class $j$, and $z_{ij}$ is the corresponding logit of the student model.
The teacher model is trained with the analogous temperature-scaled cross-entropy loss:\begin{equation*} \mathcal {L}_{teacher}(T) = -\frac {1}{N} \sum _{i=1}^{N} \sum _{j=1}^{n} \mathbf {y}_{ij} \cdot \log \frac {e^{z_{ij}/T}}{\sum _{k=1}^{n}e^{z_{ik}/T}}\tag{8}\end{equation*}
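The softened softmax in (6) can be written directly in a few lines; the NumPy sketch below uses an arbitrary logit vector purely for illustration.

import numpy as np

def softmax_with_temperature(z, T):
    # p_i = exp(z_i / T) / sum_j exp(z_j / T)
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])               # hypothetical logits
print(softmax_with_temperature(logits, T=1))     # sharp distribution
print(softmax_with_temperature(logits, T=20))    # softened distribution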
Deep learning approaches have been shown to perform exceptionally well on a wide range of computer vision tasks (e.g., image classification, object and action detection, scene segmentation, image generation, etc.). However, deep neural networks (DNNs) require large amounts of training data, which are not always available for new tasks or domains. Several knowledge distillation methods have been proposed to address this issue by training a smaller student network to mimic the predictions of a larger and more accurate teacher network.
Distillation has also been applied in intelligent systems, such as knowledge-based and rule-based systems, to reduce the system's size and improve its performance by improving the quality of its knowledge. The differences between the teacher and student models can be considered a form of regularization, which is crucial to prevent overfitting. Algorithm 1 shows the pseudocode of distillation; a compact Keras sketch is given after the algorithm.
Algorithm 1 Pseudocode of Distillation
Input: Dataset $\mathcal{D} = \{(\mathbf{x}_i, \mathbf{y}_i)\}$, trained teacher model, temperature $T$, number of epochs $E$, batch size $B$
Output: Trained student model
Initialize the weights of the student model
for each epoch $e = 1, \ldots, E$ do
  Randomly shuffle the dataset $\mathcal{D}$
  for each mini-batch of $B$ samples do
    Extract the next mini-batch from $\mathcal{D}$
    Forward propagate the samples through the teacher (at temperature $T$) to obtain soft labels
    Forward propagate the samples through the student model
    Compute the loss between the student's softened predictions and the soft labels
    Backpropagate the loss through the student model
    Update the weights of the student model
  end for
end for
return Trained student model
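A compact Keras sketch of the training loop in Algorithm 1 is given below. It assumes a pre-trained teacher and a compatible student model that both output logits; the temperature, optimizer, epoch count, and batch size are illustrative choices, not the exact settings used in our experiments.

import tensorflow as tf

def distill(teacher, student, x_train, temperature=20.0, epochs=10, batch_size=256):
    # Train the student on the teacher's temperature-softened outputs.
    opt = tf.keras.optimizers.Adam()
    ds = tf.data.Dataset.from_tensor_slices(x_train).shuffle(1024).batch(batch_size)
    for _ in range(epochs):
        for x_batch in ds:
            soft_labels = tf.nn.softmax(teacher(x_batch, training=False) / temperature)
            with tf.GradientTape() as tape:
                student_probs = tf.nn.softmax(student(x_batch, training=True) / temperature)
                loss = tf.reduce_mean(
                    tf.keras.losses.categorical_crossentropy(soft_labels, student_probs))
            grads = tape.gradient(loss, student.trainable_variables)
            opt.apply_gradients(zip(grads, student.trainable_variables))
    return student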
In a typical wireless communication system, channel estimation is performed by the base station with the help of pilot signals sent by the user equipment (UE) during the uplink; the base station, in turn, sends pilot signals toward the UE, and the UE reports the estimated channel information for the downlink transmission. Network operators and service providers are responsible for running their operations properly and meeting their obligations to customers and the public regarding privacy and data confidentiality. However, network operations can be vulnerable to adversarial machine learning attacks, especially in 5G and beyond, due to the use of machine learning-based applications. Figure 2 shows all stages of training the channel estimation prediction model (i.e., the student model) protected against adversarial ML attacks and its use in base stations.
Dataset Description and Scenario
MATLAB 5G Toolbox provides a wide range of reference examples for next-generation network communication systems, such as 5G [46]. It also allows users to customize and generate several types of waveforms, antennas, and channel models to obtain datasets for DL-based models. In this study, the dataset used to train the DL-based channel estimation models is generated through a reference example in MATLAB 5G Toolbox, i.e., "Deep Learning Data Synthesis for 5G Channel Estimation". In the example, a convolutional neural network (CNN) is used for channel estimation. A single-input single-output (SISO) antenna configuration is used, utilizing the physical downlink shared channel (PDSCH) and demodulation reference signal (DM-RS) to create the channel estimation model.
The reference example in the toolbox generates 256 training datasets, i.e., it transmits/receives the signal 256 times, for the DL-based channel estimation model. Each dataset consists of 8568 data points, i.e., 612 subcarriers, 14 OFDM symbols, and 1 antenna. Each data point of the training dataset is converted from a complex (real and imaginary) 612-by-14 matrix into a real-valued 612-by-14-by-2 matrix so that the real and imaginary parts are provided separately as inputs to the neural network during training. This is because the resource grids consist of complex data points with real and imaginary parts in the channel estimation scenario, whereas the CNN model processes the resource grids as 2-D images of real numbers. In this example, the training dataset is converted into 4-D arrays of size 612-by-14-by-1-by-2N, where N is the number of training examples, i.e., 256.
Complex numbers are used throughout wireless communication technologies; digital wireless communication uses the complex number system to modulate and demodulate wireless signals. The most significant distinction between the real and complex number systems is that the complex number system contains more than one dimension. Adversarial ML attacks, on the other hand, operate on real numbers to cross the decision boundaries of the victim DL models, and the final malicious inputs are in the real number domain. To address this, the complex numbers are split into their real and imaginary parts. Table 1 shows an example of the dataset.
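The short NumPy sketch below shows this real/imaginary split for one resource grid; the grid follows the 612 x 14 dimensions described above, with random complex values standing in for an actual channel realization.

import numpy as np

num_subcarriers, num_symbols = 612, 14

# Hypothetical complex resource grid (one training example).
grid = (np.random.randn(num_subcarriers, num_symbols)
        + 1j * np.random.randn(num_subcarriers, num_symbols))

# Split into real and imaginary planes so the CNN sees a real-valued 612 x 14 x 2 input.
grid_real = np.stack([grid.real, grid.imag], axis=-1)
print(grid_real.shape)  # (612, 14, 2)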
For each set of the training dataset, a new channel characteristic is generated based on various channel parameters, such as the delay profile (TDL-A, TDL-B, TDL-C, TDL-D, TDL-E), delay spread (1-300 ns), Doppler shift (5-400 Hz), and signal-to-noise ratio (SNR) between 0 and 10 dB. Each transmitted waveform with the DM-RS symbols is stored in the training dataset, and the perfect channel values are stored in the training labels. The CNN-based channel estimation model is trained with the generated dataset. The MATLAB 5G Toolbox also allows several communication channel parameters to be tuned, such as the frequency, subcarrier spacing, number of subcarriers, cyclic prefix type, antennas, channel paths, bandwidth, code rate, and modulation. The channel estimation scenario parameters and their values are given in Table 2.
The dataset is split into training and validation sets to avoid overfitting the training data. The training set is used to train and fit the model, while the validation set is used to monitor the performance of the trained neural network at certain intervals, i.e., 5 times per epoch. Training is expected to stop when the validation loss stops decreasing, i.e., when the model stops improving. In this study, most of the dataset is used for training, i.e., 80% for training and 20% for testing.
Simulation Model, Settings and Performance Metric
A. Simulation Model
Figure 3 shows the CNN-based DL model used in this paper for the channel estimation. The input to the model is the pilot signals with different subcarriers and OFDM symbols. The input is first passed through a convolutional layer, followed by a max-pooling layer. The output of the max-pooling layer is then passed through a fully connected layer, followed by a softmax layer. The final output of the model is the channel estimation.
We use the channel estimation dataset described in Section III to train the model. We use five different attacks (i.e., FGSM, BIM, MIM, PGD, and C&W) to evaluate the proposed mitigation methods. The deep learning-based channel estimation model is trained in the TensorFlow environment. The proposed mitigation methods are implemented in the Keras environment. The MSE performance metric is used to evaluate the accuracy of the channel estimation model.
B. Simulation Settings
The teacher and student models are DNNs with 3 convolutional layers. They are trained using stochastic gradient descent with a momentum of 0.9 and a learning rate of 0.001 for 100 epochs. The batch size is set to 256. Table 3 shows the DL model parameters.
Figure 3 shows the architecture of the teacher and student models.
The models are supervised regression models trained to predict the channel parameters defined at the receiver. The input and output size is 612 × 14, matching the subcarriers and OFDM symbols of the resource grid described in Section III.
Figure 4 shows the training history of all three models.
C. Performance Metric
The performance metric, MSE (mean squared error), is used to evaluate and compare the CNN-based models, and the MSE scores are used for further analyses of the models. MSE measures the average squared difference between the actual and predicted values; it equals zero when a model makes no error, and it grows as the model error increases. The MSE is given by\begin{equation*} MSE = \frac {1}{n}\sum _{t=1}^{n}{(Y_{t} - {\hat {Y}}_{t})}^{2}\tag{9}\end{equation*}
Evaluation and Performance Results
This section provides the experimental results used to evaluate the proposed defensive distillation-based mitigation method for DL-based channel estimation models in next-generation networks. We use the attack success ratio (ASR) as the performance metric. ASR measures the relative increase in prediction error caused by the attack over the test samples; a higher ASR indicates a more effective attack. The following equation is used to calculate ASR:\begin{equation*} \text {ASR} = \frac {1}{m}\sum _{i=1}^{m}{\frac {MSE(\mathbf {x}^{adv}_{(i)},\mathbf {y}_{(i)})-MSE(\mathbf {x}_{(i)},\mathbf {y}_{(i)})}{MSE(\mathbf {x}^{adv}_{(i)},\mathbf {y}_{(i)})}}\tag{10}\end{equation*} where $m$ is the number of test samples.
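The ASR in (10) can be computed directly from per-sample MSE values; the NumPy sketch below uses hypothetical arrays of clean and adversarial MSEs for illustration.

import numpy as np

def attack_success_ratio(mse_clean, mse_adv):
    # ASR = mean over samples of (MSE_adv - MSE_clean) / MSE_adv.
    mse_clean = np.asarray(mse_clean, dtype=float)
    mse_adv = np.asarray(mse_adv, dtype=float)
    return np.mean((mse_adv - mse_clean) / mse_adv)

# Hypothetical per-sample MSE values for clean and adversarial inputs.
mse_clean = np.array([0.010, 0.020, 0.015])
mse_adv = np.array([0.100, 0.080, 0.120])
print(attack_success_ratio(mse_clean, mse_adv))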
Table 5 shows the initial prediction performance results of all models with the test dataset.
The first experiment is to perform attacks on the undefended model, as shown in Table 6.
The results of the first experiment show that the initial (undefended) DL model is vulnerable to adversarial ML attacks. As expected, the ASR value is positively correlated with the attack power $\epsilon$: the larger the perturbation budget, the higher the attack success ratio.
Experimental results for the proposed defensive distillation-based mitigation method are shown in Table 7.
The experimental results show that the proposed method improves the accuracy of the channel estimation model under attack and provides better results against all the attacks considered (i.e., FGSM, BIM, MIM, PGD, and C&W).
Figure 5 shows the MSE results with 6 different $\epsilon$ values for each attack. These experimental results for the proposed defensive distillation-based mitigation method again show that it improves the accuracy of the channel estimation model under attack.
Figure 6 shows the MSE change with different $\epsilon$ values.
Discussion
This study provides a comprehensive vulnerability analysis of the DL-based channel estimation model. The model's vulnerabilities are studied under various adversarial attacks, including FGSM, BIM, PGD, MIM, and C&W, together with the mitigation method, i.e., defensive distillation. The results show that CNN-based channel estimation models are vulnerable to these adversarial attacks, and the attack success ratio is quite high, i.e., around 0.9, under higher attack powers (larger $\epsilon$).
Observation 1:
The DL-based channel estimation models are vulnerable to adversarial attacks, especially BIM, MIM, and PGD.
Observation 2:
BIM, MIM, and PGD attacks achieve the highest attack success rates.
Observation 3:
The DL-based channel estimation models are more robust against C&W attacks.
Observation 4:
A strong negative correlation exists between the attack power $\epsilon$ and the performance of the channel estimation models.
Observation 5:
The proposed mitigation method, i.e., defensive distillation, offers a better performance against adversarial attacks.
Conclusion and Future Work
Mobile wireless communication networks are developing rapidly, driven by high demand and advances in communication and computing technologies. The last few years have seen remarkable growth in the wireless industry, especially for NextG networks. This paper provides a comprehensive vulnerability analysis of deep learning (DL)-based channel estimation models against adversarial attacks (i.e., FGSM, BIM, PGD, MIM, and C&W) and a defensive distillation-based mitigation method in NextG networks. The results confirm that the original DL-based channel estimation model is significantly vulnerable to adversarial attacks, especially BIM, MIM, and PGD, and that the attack success rate increases under heavier adversarial attacks, i.e., with a larger attack power $\epsilon$, while the proposed defensive distillation-based mitigation method makes the model considerably more robust against these attacks.