Introduction
As 5G network infrastructures are being deployed, with a more pervasive growth expected in the next few years [1], both academia and industry are now focusing on 6G/NextG to fulfill the requirements of the applications of the next decade. Indeed, in many scenarios the limitations of 5G networks are evident in terms of data rate, latency, global coverage, etc. [2]. Applications such as extended reality, holographic communications, and digital twins will leverage the deployment of 6G network infrastructures to fully achieve their potential [3].
Among the many benefits, 6G networks will provide extreme capacity, reliability, and efficiency. To achieve these challenging performance targets, 6G networks are expected to deploy intelligent operations in both network orchestration and management [4]. Hence, along with network simplification exploiting Radio Access Network (RAN)-Core Network (CN) convergence, a key technology will be Artificial Intelligence (AI), enabling the transition from connected things to collective network intelligence [5], [6]. The advent of AI-driven functionalities in 6G will enable the deployment of proactive networks. These networks can perform operations autonomously, such as self-management to maintain the desired network performance level, or self-protection to secure the network and deal with threats. Hence, the 6G security vision is tightly integrated with AI, leading to the paradigm of security automation [4]. Security designs exploiting AI systems, rather than current cryptographic methods alone, will become pivotal to autonomously detect and mitigate threats [1].
Threat mitigation systems, i.e., systems that proactively recognize and address potential dangers, thereby safeguarding against unforeseen risks and vulnerabilities, will be the key element for enabling future networks in critical scenarios, such as military and banking applications. Additionally, the massive device connections to 6G networks will pose new challenges to Denial of Service (DoS) attack detection, rendering traditional DoS mitigation methods outdated [7], [8]. Consequently, statistical and AI-based methods can cope with different types of malicious traffic [9], identifying, mitigating, and preventing these attacks.
Therefore, many works in recent years have focused on the possibility of building AI-based systems for defending wireless networks [10]. However, future networks will be characterized by heterogeneous devices and traffic, demanding more advanced classifiers. In the rapidly evolving landscape of network security, the integration of computer vision techniques for cybersecurity applications represents an opportunity, with the promise of enabling sophisticated pattern recognition strategies. Indeed, the similarities between DoS detection and computer vision techniques lie in their shared purpose of complex pattern recognition. In computer vision, algorithms process visual information to recognize intricate patterns within images and videos. This process involves many layers of abstraction, where lower layers detect basic features like edges, while higher layers combine these features to identify complex objects or scenes. Similarly, in the context of DoS attack detection, network traffic analysis involves identifying anomalous patterns. This recognition of patterns aims at distinguishing normal network behavior from malicious activities. As in computer vision, effective DoS attack detection often requires the extraction of meaningful features from the network traffic, followed by classification or anomaly detection techniques to discern malicious behavior [11]. Hence, algorithms such as image retrieval and object shape recognition adapted from computer vision can offer an effective solution to the threat identification challenge [12].
By converting network traffic data into matrix representations, computer vision techniques can be leveraged to extract meaningful patterns and features. Each network flow can be mapped into a pixel grid, where various attributes, such as source and destination addresses and ports, are encoded. This transformation allows viewing network traffic as visual patterns, enabling the application of Convolutional Neural Networks (CNNs) and other image-based algorithms for analysis. As discussed in [13], [14], [15], leveraging the transformation of network traffic into images not only facilitates efficient and real-time data processing, but also enables the use of pre-existing image analysis tools, opening up new possibilities for enhancing network security.
Hence, in this work we first describe how packets, exploiting their temporal relationships, can be transformed into images ready to be used as inputs to computer vision algorithms. Then, we study the performance of this approach on an intrusion detection problem over a 5G dataset, exploiting both well-known CNN architectures and a purpose-built CNN architecture, hereafter referred to as customized-CNN. Differently from most implementations to date, the transformation of network traffic into images is done directly on raw packets, which can be collected at the base station, enabling a truly real-time protection system. This feature is essential for a system amenable to future networks, complying with 6G latency requirements and alleviating the damage of DoS attacks, since a threat can be identified quickly and as close as possible to where it is generated. Moreover, immediate detection holds great importance in this scenario due to the projected expenses associated with service interruptions [16]. Consequently, an on-site solution at the base station level that can effectively detect threats in real time becomes of essential importance for the future of 6G/NextG wireless networks.
Related Works
Network security has been one of the prime concerns in 5G networks to provide increased user privacy, new trust and service models and enable the support for Internet of Things (IoT) and mission-critical applications [17], [18]. Network protection must be strengthened and enhanced for the safe deployment of different 6G verticals [19]. To overcome some of the additional security challenges imposed by novel network architectures, researchers have focused on novel approaches suitable for 6G networks. Deep Learning (DL) systems have been showing promising results in threat mitigation [20] thanks to their capability of extracting high-level features.
For example, in [21] an Intrusion Detection System (IDS) is developed based on CNN, capable of performing classification on statistics extracted from complete traffic flows of the CIC-IDS 2018 dataset [22]. The proposed solution is compared with a Recurrent Neural Network (RNN) model, showing the advantages of the feed-forward model over its recurrent counterpart. Although the architecture seems promising, an important limitation hampers its deployment in future network infrastructures: the training of the AI model is performed on statistics extracted from traffic flows; this approach is not suited to work on real-time traffic due to the need to wait for complete traffic flows at the base station.
Another work that exploits Deep Neural Networks (DNNs) for an IDS is proposed in [23]. The authors carried out a comparative study of IoT IDSs with three DL models: DNN, Long Short-Term Memory (LSTM), and CNN. It is shown that DL models outperform the other methods applied in the IoT IDS environment. The study only focuses on the CIC-IDS 2017 dataset [22], which cannot be considered a good benchmark for a 5G/6G scenario because the dataset has not been collected in a real 5G network and thus the packet characteristics, e.g., packet inter-arrival time, can be very different from those of a mobile network. Furthermore, the authors use the csv format of the dataset, i.e., statistics extracted from complete traffic flows, again hampering the possibility to deploy such systems in a real-time environment.
Tailored to specific 5G datasets, both works in [24] and [25] deal with traffic classification. The former works on features extracted from complete traffic flows, hampering its exploitation in 5G/6G scenarios. Concerning the latter, a PCAP-to-Embeddings technique is proposed, where Long Short-Term Memory Autoencoders are used for embedding generation, followed by a Fully-Connected network for classification purposes.
At the border between computer vision techniques and DoS traffic detection, the authors in [26] propose to exploit the ResNet architecture to detect malicious packets. Results are obtained on the CICDDoS2019 dataset [27], which, although recent, does not resemble 5G/6G traffic characteristics, such as packet inter-arrival times. Furthermore, the authors consider only ResNet as a benchmark, not exploiting other computer vision architectures.
Another interesting work in the context of computer vision techniques applied to network traffic is [11], in which the authors discuss a multivariate correlation analysis technique to accurately represent the network traffic records and convert them into corresponding images. The detection system is developed based on the Earth Mover's Distance (EMD), a widely used dissimilarity measure. EMD considers cross-bin matching, resulting in a more precise evaluation of the dissimilarity between distributions compared to other dissimilarity measures like the Minkowski-form distance.
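As a concrete, hedged illustration (not taken from [11]), the one-dimensional EMD between two feature distributions can be computed with SciPy's wasserstein_distance; the packet-size distributions below are purely synthetic and only exemplify the dissimilarity measure, not the multivariate procedure of that work.

```python
# Hedged sketch: 1D Earth Mover's Distance between two feature
# distributions (e.g., packet sizes of benign vs. suspicious traffic).
# Values are synthetic and purely illustrative.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
benign_sizes = rng.normal(loc=500, scale=80, size=1000)     # hypothetical benign packet sizes
malicious_sizes = rng.normal(loc=120, scale=20, size=1000)  # hypothetical flood packet sizes

emd = wasserstein_distance(benign_sizes, malicious_sizes)
print(f"EMD between the two packet-size distributions: {emd:.2f}")
```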
In contrast to all the aforementioned works, the research proposed in this paper differs in many aspects. First of all, we investigate multiple computer vision architectures, allowing us to explore a broader spectrum of possibilities in our investigation. Through the exploitation of preprocessing techniques, we discuss how a real-time transformation of network packets into images is practically possible. Furthermore, the utilization of a very recent dataset [24] collected within a 5G environment and never exploited with computer vision techniques allows us to set a first benchmark for future studies.
Proposed Architecture
In this section, we first describe how network packets can be transformed into images, with a focus on the used features and the corresponding preprocessing techniques. Then, we give an insight into how the proposed method can be implemented in a next-generation Node B (gNB), showing its amenability to future 6G/NextG wireless infrastructures.
A. From Network Traffic to Images
The packet represents the basic unit of data transferred over a computer network. Each packet contains a part of the complete message and embeds information that helps identify the traffic flow it belongs to. The latter can be identified by a 5-tuple composed of source and destination IP addresses, source and destination ports, and the protocol used.
In this work, relying on the concept of network traffic flow, an encoding scheme to translate packet attributes into a structured format, i.e., matrices, is proposed. By structuring the input as packet matrices, we create a spatial data representation. This representation enables the Neural Network (NN) to learn the traits of both DoS attacks and benign traffic by employing convolutional filters that slide across the input, identifying crucial patterns. Network traffic classification leveraging CNNs allows us to exploit one of their main advantages: the ability to identify DoS patterns irrespective of their temporal occurrence in the data. This intrinsic quality, i.e., producing consistent outputs regardless of the location of patterns in the input, is one of the paramount features of CNN architectures [31].
Specifically, the approach consists in (i) identifying the traffic flow each incoming packet belongs to through its 5-tuple, (ii) extracting and preprocessing a set of features from each packet, and (iii) stacking the feature vectors of up to N packets of the same flow into a matrix, applying 0-padding when fewer than N packets are collected within a time window of T seconds.
A summary of the proposed method, capable of transforming packet flows into images, is depicted in Fig. 1.
From network packets to images: from each network flow, several matrices can be obtained as N packets are collected. In addition, 0-padding is also used when fewer than N packets arrive within the time window T.
Concerning the features, those that are deterministic or similar, and hence can hamper the generalization of the NN models, have been excluded, such as IP addresses and TCP ports. A list of the exploited features with the corresponding preprocessing techniques is reported in Table 1.
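The listing below is a minimal, self-contained sketch of the packet-to-matrix transformation described above. The feature set (packet length, inter-arrival time, protocol identifier), the field names, and the values of N and T are illustrative assumptions; the exact features of Table 1 and the implementation released in [47] may differ.

```python
# Hedged sketch of the packet-to-matrix transformation: packets are grouped
# by 5-tuple, and for each flow a fixed-size N x F matrix is built from the
# packets collected within the time window T, with 0-padding if fewer than
# N packets are available. Feature choices are illustrative assumptions.
from collections import defaultdict, namedtuple
import numpy as np

Packet = namedtuple("Packet", "src dst sport dport proto length timestamp")

N = 16          # packets per matrix (assumed value)
T = 1.0         # collection time window in seconds (assumed value)
FEATURES = 3    # per-packet features: length, inter-arrival time, protocol id

def flow_key(pkt):
    """5-tuple identifying the traffic flow of a packet."""
    return (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)

def packets_to_matrices(packets):
    """Group packets by flow and build 0-padded N x FEATURES matrices."""
    flows = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        flows[flow_key(pkt)].append(pkt)

    matrices = []
    for pkts in flows.values():
        start = pkts[0].timestamp
        window = [p for p in pkts if p.timestamp - start <= T][:N]
        mat = np.zeros((N, FEATURES), dtype=np.float32)   # 0-padding by construction
        prev_t = window[0].timestamp
        for i, p in enumerate(window):
            mat[i] = (p.length, p.timestamp - prev_t, p.proto)
            prev_t = p.timestamp
        matrices.append(mat)
    return matrices

# Tiny synthetic usage example (two flows, made-up values).
pkts = [Packet("10.0.0.1", "10.0.0.2", 1234, 80, 6, 500, 0.00),
        Packet("10.0.0.1", "10.0.0.2", 1234, 80, 6, 520, 0.01),
        Packet("10.0.0.3", "10.0.0.2", 4321, 53, 17, 80, 0.02)]
print([m.shape for m in packets_to_matrices(pkts)])   # [(16, 3), (16, 3)]
```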
B. Integration in Base Stations
In this section, we detail how the proposed architecture can be implemented in a future 6G base station.
In 5G networks, RAN and CN functions are strictly separated, due to the diverse protocols, interfaces, and management mechanisms. Consequently, achieving a unified, simplified network architecture integrating these components into a converged network proved challenging for 5G architectures. However, with the advent of evolving technologies and the transition to 6G networks, there is an unprecedented opportunity to rethink network architectures. The shift towards a converged RAN-CN architecture will enable the creation of a simpler, more efficient network infrastructure [6].
Hence, in future 6G networks, a new approach will be exploited providing more flexibility in network deployment, where the RAN and the CN functions can be converged in the same platform and optimized together according to the use-case requirements [6], [33], [34]. With a less strict separation between RAN and CN, each 6G base station can be equipped with functionalities coming from both CN and RAN, ultimately deploying a local CN on top of each node.
Among the novel Network Functions (NFs), the Network Data Analytics Function (NWDAF) [35] will assume a more prominent role within 6G networks, serving as a foundation for distributed network intelligence. Hence, each future base station can be equipped to host the NWDAF, offering on-demand data analytics to other NFs [36].
The NWDAF can be exploited for intelligent threat mitigation involving user data. It has the potential to gather User Plane Function (UPF) data emanating from the User Equipments (UEs) and feed this information into a DL system for the identification of malicious traffic. For instance, a threat identification and mitigation system can be implemented at the NWDAF by identifying and automatically dropping packets marked as malicious. This architecture allows the direct identification of potential threats at the base station level, alleviating the need to disseminate them throughout the network. This approach complies with the vision of placing security mechanisms as close as possible to the potential sources of threats. Furthermore, real-time detection is pivotal in this context, given the estimated cost of service disruption [16]. Thus, a base station-level solution capable of real-time threat mitigation is of significant importance for future NextG wireless networks.
The architecture of the proposed system is illustrated in Fig. 2. For conciseness, we report only the principal NFs used for this solution: the UPF, responsible for data forwarding, routing, and Quality of Service (QoS) enforcement; the Session Management Function (SMF), involved in the establishment and management of the UPF and of the UE session; and the NWDAF.
Architecture of the proposed system, reporting the main NFs used. The NextG base station is deployed along with a local CN. The proposed technique utilizes an NN implemented inside the NWDAF to process packets acquired from the UPF. Malicious packets can be identified and dropped directly at the local NWDAF.
In detail, the NWDAF will perform the following tasks:
Data collection: the NWDAF collects all network traffic flows coming from the UEs connected to the base station.
Data preprocessing: for each packet, the NWDAF extracts the features and normalizes them, as described in Table 1. If N packets are not collected within T seconds, it pads the matrix with 0s.
Classification: once the matrix is ready, the NWDAF is responsible for passing the sample to the NN, deployed along with the local CN. The NN architecture can be either user-defined or rely on well-known computer vision models, as discussed in the next section. A minimal sketch of this decision step is given after this list.
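The short sketch below illustrates only the last step, i.e., classifying each flow matrix and marking malicious flows for dropping at the NWDAF. The model is any callable returning per-class scores; the function names, the toy "model", and the benign-class index are assumptions, not part of any standardized NWDAF interface.

```python
# Hedged sketch of the NWDAF decision step: classify each flow matrix and
# mark malicious flows for dropping. Names and class indices are assumptions.
import numpy as np

BENIGN_CLASS = 0  # assumed index of the benign class

def mitigate(flow_matrices, model):
    """Return the set of flow ids whose matrix is classified as malicious."""
    to_drop = set()
    for flow_id, matrix in flow_matrices.items():
        scores = model(matrix[np.newaxis, ..., np.newaxis])   # add batch/channel dims
        if int(np.argmax(scores)) != BENIGN_CLASS:
            to_drop.add(flow_id)
    return to_drop

# Toy usage: a dummy "model" that flags flows whose mean feature value is high.
dummy_model = lambda x: np.array([[1.0, 0.0]]) if x.mean() < 0.5 else np.array([[0.0, 1.0]])
flows = {"flow-a": np.zeros((16, 3), dtype=np.float32),
         "flow-b": np.ones((16, 3), dtype=np.float32)}
print(mitigate(flows, dummy_model))   # expected: {'flow-b'}
```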
Methodology
In this section, we first describe the dataset used for the experiments, highlighting its amenability to 6G/NextG wireless networks. Then, we briefly review the state-of-the-art computer vision architectures that have been exploited. Finally, the experiments carried out are introduced and results are presented.
A. Network Intrusion Detection
The accuracy and the efficiency of a Machine Learning (ML)-based cybersecurity system heavily depend on the quality of the dataset and on how close the behavior of the data is to the behavior in a real network scenario. One of the problems in AI-based security research is the lack of a comprehensive dataset that resembles complex 5G/6G network behaviors.
The majority of the datasets available online are outdated for modern networks, as they were compiled before some critical technological evolutions, e.g., UNSW-NB 15 [37], CTU-13 [38]. Other recent datasets available on the Web, such as CIC-DDoS2019 [27], present limitations in terms of many redundant records and high class imbalance. Additionally, as mentioned in Section II, the behavior of 5G/6G networks is far from that of the testbeds or simulation platforms used to create these datasets.
To overcome this problem, the authors in [24] recently proposed 5G-NIDD, a network intrusion detection dataset generated from a real 5G test network. The dataset is collected using the 5G Test Network (5GTN) in Oulu, Finland. 5G-NIDD presents a combination of attack traffic and benign traffic under different attack scenarios. Real mobile devices attached to the 5GTN were used to generate the traffic.
Data is extracted from two base stations, each connected to an attacker node and several benign 5G UEs. The attack scenarios include DoS attacks and port scans. Under DoS attacks, the dataset contains ICMP Flood, UDP Flood, SYN Flood, HTTP Flood, and Slowrate DoS. Under port scans, the dataset contains SYN Scan, TCP Connect Scan, and UDP Scan. The dataset is publicly available in both pcapng and csv formats. The pcapng format contains full packet payloads, while the csv files are a collection of statistics extracted for each traffic flow.
Hence, in this work, we exploit 5G-NIDD to test the proposed architectures. A list of the attacks included in the dataset, with the corresponding descriptions, is reported in Table 2. In the experiments, the ICMP flood attack type has not been considered since the number of samples for this class, once the dataset had been preprocessed into matrices, was very low. However, 9 classes are still present, since the HTTP flood was performed using two different tools, Slowloris and Torshammer.
B. Neural Network Architectures
In this section, we report the computer vision models that have been tested on matrices of traffic packets. In addition to state-of-the-art computer vision models, we have also designed and tested a customized-CNN, specifically aimed at recognizing threats.
One of the major innovative architectures used in computer vision is the Residual Network (ResNet) [39]. In order to mitigate the problem of the vanishing/exploding gradient, this architecture introduces the concept of Residual Blocks. As depicted in Fig. 3, instead of simply learning the underlying mapping H(x), each block learns the residual F(x) = H(x) - x, so that the original mapping is recovered by adding the block input back to its output.
The key concept of residual blocks relies on skip connections, as shown in Fig. 3, allowing a smoother gradient flow and ensuring that important features are carried until the final layers without adding computational load to the network. In our experiments, we rely on ResNet50V2, composed of 48 convolutional layers, one max pooling layer, and one average pooling layer.
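As a hedged illustration of the skip connection, a minimal residual block can be written in Keras as follows; the filter counts, kernel sizes, and input shape are illustrative assumptions and do not reproduce the exact ResNet50V2 blocks used in the experiments.

```python
# Minimal sketch of a residual block with a skip connection (Keras).
# Filter counts, kernel sizes, and input shape are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                           # identity skip connection
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    if shortcut.shape[-1] != filters:                      # match channel dimension
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([shortcut, y])                        # F(x) + x
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(16, 3, 1))                  # e.g., an N x F packet matrix
outputs = residual_block(inputs)
tf.keras.Model(inputs, outputs).summary()
```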
Starting from residual connections, MobileNetV2 [40] exploits an inverted residual structure, where the residual connections are between the bottleneck layers. This model, well suited to mobile devices, also exploits lightweight depthwise convolutions to filter features in the intermediate expansion layers. MobileNetV2 presents two types of blocks: a residual block with stride 1 and another block for downsizing with stride 2. These blocks are depicted in Fig. 4.
Two types of blocks for MobileNetV2: (a) residual block with stride=1 and (b) downsizing block with stride=2.
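A hedged sketch of the stride-1 inverted residual block (channel expansion, depthwise convolution, linear bottleneck projection, skip connection) is given below; the expansion factor, channel counts, and the use of plain ReLU instead of ReLU6 are illustrative simplifications.

```python
# Minimal sketch of a MobileNetV2-style inverted residual block, stride 1
# (expand -> depthwise conv -> linear projection -> skip). Expansion factor
# and channel counts are assumptions; ReLU is used here in place of ReLU6.
import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual(x, out_channels=16, expansion=6, stride=1):
    in_channels = x.shape[-1]
    y = layers.Conv2D(in_channels * expansion, 1, activation="relu")(x)   # expand
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same",
                               activation="relu")(y)                      # depthwise filtering
    y = layers.Conv2D(out_channels, 1)(y)                                 # linear bottleneck
    if stride == 1 and in_channels == out_channels:
        y = layers.Add()([x, y])                                          # residual skip
    return y

inputs = tf.keras.Input(shape=(16, 3, 16))
tf.keras.Model(inputs, inverted_residual(inputs)).summary()
```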
Aiming at increasing computational efficiency, the authors in [41] propose EfficientNet, a more systematic method for enhancing accuracy and efficiency by scaling the depth, width, and resolution of CNN models. This is in contrast with conventional scaling methods, which are based on arbitrary choices and demand manual tuning and significant effort. Specifically, the technique is based on a compound scaling method that relies on a constant ratio to perform a balanced scaling of width, depth, and resolution.
As opposed to the computational efficiency of MobileNet and EfficientNet, DenseNet [42] connects each layer to every other layer in a feed-forward fashion, resulting in an architecture that is highly demanding in terms of computational resources.
An important milestone in CNN architectures was the Inception Net [43]. The main idea behind this architecture is the Inception layer, a combination of parallel layers whose output filters are concatenated into a single output vector forming the input of the next layer.
Taking the principles of Inception to the extreme, the Xception architecture is introduced [44]. In Inception, cross-channel and spatial correlations are mapped largely independently; Xception pushes this idea further by replacing the Inception modules with depthwise separable convolutions, i.e., a depthwise convolution followed by a pointwise convolution.
In addition to these NN models, a customized-CNN, whose structure is reported in Fig. 5, has been specifically developed aiming at accurate network packet classification.
The CNN is made of 3 convolutional layers, with 8, 64, and 128 filters, respectively. 3 Fully-Connected (FC) layers have been added, with 512, 128, and 9 units, respectively. As activation function, ReLU has been adopted for all layers, except for the last one, which employs the SoftMax to perform classification. A summary of the state-of-the-art computer vision architectures that have been studied in the experiments, along with the customized-CNN, is reported in Table 3. We can observe a large increase in parameters of the customized-CNN, mainly due to the exploitation of FC layers at the end of the convolutional section of the architecture.
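A hedged Keras sketch of the customized-CNN is given below, following the layer sizes stated above (3 convolutional layers with 8, 64, and 128 filters, and 3 FC layers with 512, 128, and 9 units); kernel sizes, the absence of pooling, the optimizer, and the input shape are assumptions not specified in the text.

```python
# Hedged sketch of the customized-CNN: 3 convolutional layers (8, 64, 128
# filters) followed by 3 fully-connected layers (512, 128, 9 units), ReLU
# everywhere and SoftMax on the output. Kernel sizes, pooling, optimizer,
# and the N x F x 1 input shape are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

N_PACKETS, N_FEATURES, N_CLASSES = 16, 3, 9   # assumed input size; 9 classes as in the paper

model = models.Sequential([
    layers.Input(shape=(N_PACKETS, N_FEATURES, 1)),
    layers.Conv2D(8, 3, padding="same", activation="relu"),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # the FC layers account for most of the parameters
```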
The deployment of these models at the base station can increase the power and computational resource consumption. However, advancements in hardware acceleration (such as specialized chips or GPUs) [45] and optimization techniques [46] have significantly improved their efficiency. These advancements can hence enable quicker inference times and reduced energy consumption.
C. Experimental Setup
In this section, the experimental setup is highlighted, describing how classification experiments have been carried out.
Once the raw packets have been obtained, a script to transform pcapng files into suitable input matrices is developed, as highlighted in Section III-A. The source code is publicly available at [47]. The script allows defining the maximum number of packets for each matrix, N, and the time window T within which they must be collected.
Concerning the time window T, it represents the maximum time to wait for N packets of the same flow: when it expires, the matrix is padded and forwarded for classification.
Once the input matrices have been created, a normalization and padding phase has been performed. Data normalization is performed to scale values to a predefined uniform range. This prevents larger values from overwhelming smaller ones during the training process. A Min-Max scaling has been adopted: the min-max values for each feature have been searched through the entire dataset, and a rescaling has been carried out, resulting in all values belonging to the range [0, 1]. Then, for input matrices with less than N packets, 0-padding has been applied to reach the fixed input size.
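A minimal NumPy sketch of these two preprocessing steps is shown below; the matrix shapes and values are illustrative, and scaling is applied before padding so that the padded entries remain 0 within the [0, 1] range.

```python
# Hedged sketch of the preprocessing: per-feature Min-Max scaling computed
# over the whole dataset, followed by 0-padding of shorter matrices to N rows.
# Shapes and values are illustrative.
import numpy as np

N = 16  # assumed number of packets per matrix

def min_max_scale(matrices):
    """Scale each feature to [0, 1] using dataset-wide minima and maxima."""
    stacked = np.concatenate(matrices, axis=0)
    f_min, f_max = stacked.min(axis=0), stacked.max(axis=0)
    span = np.where(f_max > f_min, f_max - f_min, 1.0)   # avoid division by zero
    return [(m - f_min) / span for m in matrices]

def pad_to_n(matrix, n=N):
    """0-pad a (k x F) matrix to (n x F) when it holds fewer than n packets."""
    pad_rows = max(0, n - matrix.shape[0])
    return np.pad(matrix, ((0, pad_rows), (0, 0)))

raw = [np.random.rand(10, 3) * 1500, np.random.rand(16, 3) * 1500]  # toy matrices
padded = [pad_to_n(m) for m in min_max_scale(raw)]
print([m.shape for m in padded])   # [(16, 3), (16, 3)]
```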
Finally, a train-test split of the dataset has been performed.
Since the training set is not balanced, a class weighting technique has been adopted. Leveraging this technique, it is possible to assign higher weights to minority classes, allowing the model to pay more attention to their patterns and reducing the bias towards majority classes. The weight for each class $j$ is given by:\begin{equation*} w_{j} = \frac {\#\text {samples}}{\#\text {classes}\times \#\text {samples}_{j}}\end{equation*}
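The weighting above corresponds to scikit-learn's "balanced" class-weight heuristic; a short sketch with synthetic labels is shown below, and the resulting dictionary could, for instance, be passed to a Keras fit() call.

```python
# Hedged sketch: per-class weights as in the equation above, equivalent to
# scikit-learn's "balanced" heuristic. Labels are synthetic.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0] * 900 + [1] * 80 + [2] * 20)   # toy, unbalanced labels
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes.tolist(), weights))
print(class_weight)   # minority classes receive larger weights
# e.g., passed to Keras: model.fit(X, y, class_weight=class_weight, ...)
```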
To evaluate the results of the experiments, we use common evaluation metrics, such as the confusion matrix and the F1-score. Since the test set is kept unbalanced to resemble real-world data as much as possible, the accuracy metric can lead to skewed results and thus it is not considered. Instead, the F1-score, defined as the harmonic mean of precision $P$ and recall $R$, provides a more reliable measure on unbalanced data:\begin{align*} \textit {F1-score}=&\frac {2}{\frac {1}{P}+\frac {1}{R}}\\ P=&\frac {TP}{TP+FP}; \quad R = \frac {TP}{TP+FN}\end{align*}
Results and Discussion
In this section, the obtained results are reported and discussed. Results for the different considered values of N are summarized in Table 4.
Since the proposed system must be compliant with real-time requirements of future wireless networks, in the experiments we also evaluated the prediction time of the tested architectures for the entire test set exploiting an 11th Gen Intel Core i7-11700K @ 3.60GHz. Results are depicted in Fig. 6.
As reported in Table 4, the best performing model is the customized-CNN for all the considered values of N.
Concerning the other tested state-of-the-art computer vision architectures, only the ResNet and Inception models reach an F1-score above 90% for the considered values of N.
A look at the confusion matrices of the best performing model, i.e., the customized-CNN, can give a better insight into the obtained results. The confusion matrices obtained for the different considered values of N are discussed in the following.
We can observe that for
If
However, increasing
Conclusion
6G/NextG networks will require intelligent threat mitigation systems able to cope with different types of malicious traffic [9] and to adapt to newly discovered threats. Hence, in this paper, we have explored the innovative approach of transforming network traffic packets into image representations and leveraging state-of-the-art computer vision architectures for classification.
In the experiments, an intrusion detection problem has been investigated with the goal of classifying normal and malicious behaviors. We have first discussed how raw packets can be converted, in a real-time manner, into input matrices ready to be fed into state-of-the-art computer vision architectures and the customized-CNN.
Results have shown that complex models do not perform better than the developed CNN architecture. Indeed, for all the considered values of N, the customized-CNN achieved the best F1-score.
The proposed system is just a first step in the application of computer vision techniques to network traffic analysis. Indeed, leveraging the exploitation of convolution-based models, more complex patterns can be discovered in packet matrices. For instance, a proactive approach, capable of identifying new types of attacks, can be implemented. Additionally, a distributed learning approach, relying on federated/split learning techniques, will be considered in future works to enhance data privacy and model performance. Finally, hardware acceleration techniques could be studied to deploy these models on dedicated platforms, e.g., FPGAs, offloading the computational workload from the gNB without compromising its performance.