Introduction
The ever-increasing interconnectedness of today’s digital landscape makes network security paramount. Sensitive information travels across these networks, making them attractive to hackers who exploit weaknesses. A critical component of modern cybersecurity strategies is the ability to identify and respond to anomalies in network traffic. This is where Intrusion Detection Systems (IDS) come in. IDS play a vital role in network security by detecting potential intruders. They operate across a broad spectrum, analyzing network or host system activity for malicious patterns. These patterns are typically identified using two main approaches: Signature-based IDS (SIDS) and Anomaly-based IDS (AIDS) [1]. SIDS, also known as rule-based IDS, identifies threats by comparing network activity and system events against predefined signatures of known attacks. These signatures capture specific malicious activities or behaviors discovered previously [2]. AIDS, often called behavior-based IDS, establishes a baseline for normal network or system behavior and then flags deviations from this baseline as potential threats.
Network Intrusion Detection Systems (NIDS) are specialized IDS designed to monitor network traffic. Their primary objective is to identify either known attack signatures or unusual patterns in traffic behavior [39]. Anomaly-based NIDS are particularly crucial for safeguarding computer networks against evolving cyber threats [33]. These systems excel at detecting deviations from typical network behavior, facilitating the detection of unknown attacks. Fig. 1 provides an overview of the NIDS concept.
Traditional methods struggle to efficiently detect intrusions in large networks and high-speed internet connections, often becoming too focused on normal patterns. With the growing risks and complexities in network security, research has shifted towards machine learning [43]. Methods like decision trees, random forests, support vector machines, artificial neural networks, and immunity theory have greatly improved the performance of Network Intrusion Detection Systems (NIDS) [3], [11], [13]. These methods have evolved over time, with recent research analyzing and comparing various approaches, including statistical, machine learning, and deep learning techniques [4]. While each method has its strengths and weaknesses, selecting and integrating them based on specific needs is crucial. Challenges remain, such as adapting to evolving cyber threats and comparing different techniques across domains [32]. Conventional machine learning is increasingly incorporating deep learning techniques [7], which have become vital for addressing major challenges in intrusion detection in modern network architectures.
Recent advancements in Network Intrusion Detection System (NIDS) research, especially within the domain of deep learning, have introduced a spectrum of methodologies designed to grasp temporal patterns effectively. This may be attributed to the proliferation of the Internet of Things (IoT) and advancements in highly complex digital ecosystems, such as smart cities [44]. These encompass diverse neural network architectures, including Recurrent Neural Networks (RNN) [1], Convolutional Neural Networks (CNN) [7], Long Short-Term Memory (LSTM) networks [22], and Gated Recurrent Units (GRU) [5], along with hybrid approaches [16], [19], [20]. Notably, innovative methods have been introduced, strategically combining autoencoders and adversarial techniques to address concerns related to class imbalance [21], [24], [27]. These approaches mark a significant step forward in enhancing the robustness of intrusion detection systems against imbalanced class distributions.
While there have been notable advancements in Network Intrusion Detection System (NIDS) research, particularly in deep learning, understanding temporal patterns remains a challenge [16]. A breakthrough in addressing this challenge is the Sequence to Sequence (Seq2Seq) model [29]. This model combines two separate models, encoders and decoders, into a unified framework, allowing for comprehensive utilization of temporal features. Seq2Seq models excel in generating and recognizing patterns [23], [25], offering promising solutions for NIDS by tackling persistent challenges encountered in earlier works.
To address class imbalance, researchers have explored hybrid approaches integrating autoencoders and adversarial techniques [22], [27], [29]. These methods strive to balance class representation in datasets, preventing bias towards the majority class in network traffic. They thereby strengthen intrusion detection systems and improve their accuracy on imbalanced data [17].
Traditionally, the challenges of addressing class imbalances and incorporating temporal learning in Network Intrusion Detection Systems (NIDS) have been treated as separate issues in existing research [14], [18]. In this work, we aim to bridge this research gap by integrating sequence-to-sequence models and generative models, while incorporating advanced statistical memory strategies. The proposed approach represents a significant advancement by unifying these concerns into a single, comprehensive framework, with the goal of creating a more effective and robust NIDS.
The proposed methodology addresses research gaps by leveraging a Generative Neural Network that embeds both GRU and LSTM, along with the strategic integration of a Temporal Correlation Index (TCI) as a metric for anomaly detection threshold. Known as TMG-GRU-VAE, this method incorporates either gated recurrent units (GRU) or LSTM into variational autoencoders, effectively capturing normal temporal patterns in network traffic sequences. Our study demonstrates the superior performance of this methodology by comparing results with models both using and not using TCI.
The Temporal Correlation Index (TCI) is vital to our methodology for several reasons. Firstly, it helps us consider timing and event sequences in network traffic, improving our model’s ability to distinguish normal from abnormal behaviors. Secondly, TCI lets us set dynamic anomaly detection thresholds, which are crucial for capturing changes in network traffic. Finally, TCI’s inclusion allows for a comprehensive evaluation, showing its specific contribution to overall anomaly detection effectiveness. Together, these properties enable our methodology to handle the temporal complexities of network data behavior.
This integrative methodology strives to make a meaningful contribution to the dynamic field of Network Intrusion Detection Systems (NIDS), offering a more comprehensive and efficient solution. Additionally, our work sheds light on the existing research needs for network intrusion detection systems and addresses their key limitations. Through our research, we aim to pave the way for advancements in anomaly detection and contribute to the ongoing evolution of NIDS.
Prior to comparing our proposed work, we conducted a comprehensive analysis of hybrid models using various RNN setups to assess their effectiveness in Network Intrusion Detection Systems (NIDS) on the CIC-IDS-2017 dataset. Our findings notably highlight their superior performance, particularly in recall and F1 score. On average, GRU or LSTM-based VAE outperforms other models, with GRU-VAE standing out by achieving the highest precision at 77-83.2%. Furthermore, it attains the best F1 score, reaching an impressive 88-89% for successful attack detection in the dataset.
The efficacy of our proposed methodology (TMG-GRU-VAE) has been empirically validated using publicly accessible CIC-IDS-2017 and CIC-IDS-2018 datasets, resulting in promising outcomes. The analysis reveals a substantial reduction in False Positives (FP) across all models, with improvements ranging from 7.2% to 12.9% for the CIC-IDS-2017 dataset and from 7.1% to 14.1% for the CIC-IDS-2018 dataset. This underscores the methodology’s capacity to identify anomalies within the datasets, emphasizing its significant impact on decreasing false positive rates. Notably, this experiment involves comparing results with and without the threshold for the Temporal Correlation Index (TCI) metric.
The following is a list of our key contributions:
Our research proposes a new method called TMG-GRU-VAE for Network Intrusion Detection Systems (NIDS). This method uses a special type of neural network (VAE) with built-in memory (GRU) to capture the temporal features of network traffic.
We also propose a new metric called TCI to identify unusual changes in the temporal behaviour of network traffic. We combine this metric with the proposed method in the second phase of anomaly detection, where it is used with thresholds. This metric helps to identify even subtle changes in network traffic patterns.
We have evaluated our approach on the publicly accessible CIC-IDS-2017 and CIC-IDS-2018 network traffic datasets and observed a significant reduction in false alarms, improving overall accuracy. This shows that our method can be effective in dynamic real-world scenarios.
This research contributes towards comprehensive exploration of anomaly detection in NIDS, offering valuable insights for strengthening security measures in the dynamic landscape of network threats.
The remainder of this article is organized as follows. Section II surveys several deep learning approaches for NIDS. Section III presents a thorough overview of techniques built on architectures. Section IV describes our proposed work and research methodologies, and Section V analyzes the experiment’s findings in depth. Section VI concludes the article.
Related Work
Traditional intrusion detection methods serve as foundational tools for identifying harmful or unauthorized activities in computer systems, networks, and software applications. These approaches have been integral to the cybersecurity landscape for many years, offering valuable insights into potential attacks [6]. However, their limitations become apparent when facing advanced or evolving attack strategies [30], and the presence of false positives can diminish their effectiveness.
Table 1 demonstrates the evolution of intrusion detection methods designed to combat security risks. Rule-Based Systems rely on pre-established rules to spot patterns linked to malicious attacks. Statistical Methods use techniques like clustering or outlier detection to pinpoint unusual network activity or system behavior. Machine Learning methods, such as decision trees and neural networks, analyze past data to distinguish normal from malicious behavior. Expert Systems blend rules and patterns to make informed intrusion decisions, combining human expertise with computational power. Behavioral Analysis monitors user, process, or system behavior over time to detect intrusions by spotting deviations from expected patterns. Each approach has unique strengths in detecting and responding to security threats, bolstering overall intrusion detection system effectiveness.
To improve accuracy and responsiveness to evolving cyber threats, modern intrusion detection systems often combine traditional methods with advanced techniques like machine learning, deep learning, and artificial intelligence. Deep learning, particularly, has emerged as a game-changer in various fields, including network security. In the following section, we delve into a comprehensive review, exploring the use of deep learning architectures in intrusion detection over the past decade. This section aims to illuminate the evolution of intrusion detection methods and the role of deep learning in overcoming the limitations of traditional approaches.
A. Deep Learning-Centered NIDS Strategies
Network Intrusion Detection Systems (NIDS) have seen an increase in the adoption of deep learning-based methodologies. Deep learning algorithms are used to detect potential intrusions or malicious actions by analyzing network traffic, system behavior, and other data sources [7]. Deep learning-based NIDS methods have gained popularity due to their capacity to automatically learn and adapt to complicated patterns and anomalies in data, allowing them to identify both known and novel types of attacks. They handle complex data, speed up the extraction of significant characteristics, and perform better than traditional algorithms [3].
Fig. 2 illustrates the three primary architectures of deep learning models used for Network Intrusion Detection Systems (NIDS). Table 2 categorizes these deep learning models into three types (generative, discriminative, and hybrid architectures), as detailed in the NIDS literature [35]. The unsupervised operation of the generative architecture encourages the generation of novel data instances. In contrast, the discriminative architecture employs a supervised approach that focuses on class differentiation. Hybrid architectures combine the benefits of both strategies, presenting an adaptable solution that leverages the complementary strengths of generative and discriminative models.
1) NIDS Based on Discriminative Models
Sequential models, such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, are pivotal components of network intrusion detection systems (NIDS). These models operate as discriminative classifiers, directly learning the decision boundary between different classes. By capturing the intricate relationship between input features and class labels, they excel at identifying suspicious patterns and abnormalities in network traffic data, crucial for detecting potential intrusions.
RNNs and LSTMs are particularly effective in analyzing the sequential dependencies present in network traffic data over time. Their ability to adapt to evolving attack patterns makes them invaluable assets for NIDS, capable of detecting both known and unforeseen threats [31], [33]. Numerous studies have highlighted the effectiveness of recurrent neural networks, especially when augmented with long short-term memory units and gated recurrent units, in analyzing time series data such as network traffic [1], [2].
Furthermore, the importance of feature selection in identifying malicious traffic in IoT networks cannot be overstated [37]. This underscores the broader significance of research in NIDS [41], [42], emphasizing the need for comprehensive approaches to cybersecurity challenges.
a: Gated Recurrent Unit (GRU)
The Gated Recurrent Unit (GRU) is a type of Recurrent Neural Network (RNN) architecture known for its ability to capture temporal dependencies within sequential data, like time-series or event sequences. In GRU-based models, hidden states evolve during the processing of sequential data, acting as memory cells that store information from previous time steps. GRUs use gating mechanisms to control the flow of information within hidden states, adjusting their states in each iteration based on the current input and the previous hidden state. This sequential processing allows GRUs to analyze the temporal sequence of data points effectively, making them essential for tasks requiring temporal analysis, such as time-series analysis or natural language processing. Fig. 3 illustrates the components of a GRU, including the reset gate and update gate, along with one hidden state. These components use sigmoid and tanh activation functions for computation.
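As a rough illustration of the gating described above, the following sketch implements a single GRU time step in plain Python. It follows the standard GRU formulation with one common sign convention for the update gate; the function names (`gru_step`, `init_params`), the parameter dictionary layout, and the toy dimensions are assumptions made for illustration and do not describe the models evaluated in this article.

```python
import math
import random


def sigmoid(v):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def add(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def gru_step(x, h_prev, p):
    """One GRU time step (standard formulation; parameter layout is hypothetical)."""
    # Update gate z: how much of the candidate state replaces the old state.
    z = sigmoid(add(matvec(p["Wz"], x), matvec(p["Uz"], h_prev), p["bz"]))
    # Reset gate r: how much of the previous state feeds the candidate state.
    r = sigmoid(add(matvec(p["Wr"], x), matvec(p["Ur"], h_prev), p["br"]))
    r_h = [ri * hi for ri, hi in zip(r, h_prev)]
    # Candidate hidden state, squashed to (-1, 1) by tanh.
    h_cand = [math.tanh(a)
              for a in add(matvec(p["Wh"], x), matvec(p["Uh"], r_h), p["bh"])]
    # New state: convex combination of the previous state and the candidate.
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def init_params(n_in, n_hid, seed=0):
    """Small random weights and zero biases for the z, r, and h blocks."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in "zrh":
        p["W" + g] = mat(n_hid, n_in)
        p["U" + g] = mat(n_hid, n_hid)
        p["b" + g] = [0.0] * n_hid
    return p


# Toy sequence of two time steps with three (hypothetical) traffic features each.
params = init_params(n_in=3, n_hid=4)
h = [0.0] * 4
for x_t in [[0.1, 0.2, 0.3], [0.0, 0.5, 0.1]]:
    h = gru_step(x_t, h, params)
```

The hidden state `h` is carried from one step to the next, which is what lets the unit accumulate information about earlier traffic records.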
2) NIDS Based on Generative Models
In contrast, generative models are focused on understanding the underlying data distribution to generate new samples that resemble the training data. Network intrusion detection systems (NIDS) use generative models, which are unsupervised approaches. These models are trained on raw data that lacks any labels and are created to learn the fundamental patterns and structures of regular network traffic. By utilizing these models, it becomes feasible to identify any suspicious or anomalous activity in the network.
a: Variational Autoencoder (VAE)
Variational autoencoders differ from other types of autoencoders primarily in their use of a parametrized probability distribution. The encoder generates a hidden representation, from which the decoder reconstructs the input data [26]. The hidden representation is parametrized by a mean and a standard deviation computed from the input. This basic structure of a variational autoencoder, which is a generative model, is depicted in Fig. 4.
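To make the role of the mean and standard deviation concrete, the sketch below shows the usual reparameterized sampling of a latent vector and the closed-form KL term against a standard normal prior. It assumes a diagonal Gaussian posterior parametrized by a mean and a log standard deviation; the function names and the example values are hypothetical and not taken from the cited works.

```python
import math
import random


def sample_latent(mu, log_sigma, rng):
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + 2.0 * ls - m * m - math.exp(2.0 * ls)
                      for m, ls in zip(mu, log_sigma))


rng = random.Random(42)
mu = [0.5, -1.0]          # hypothetical encoder output: mean
log_sigma = [0.0, -0.5]   # hypothetical encoder output: log standard deviation
z = sample_latent(mu, log_sigma, rng)
kl = kl_to_standard_normal(mu, log_sigma)
```

Sampling through `eps` keeps the draw differentiable with respect to `mu` and `log_sigma`, which is why this form is commonly used when training VAEs with gradient-based optimizers.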
One significant drawback of existing systems was the oversight of class imbalance issues. Recent strides in deep learning, employing generative models like VAE [14] and GAN [12], have addressed this shortcoming and gained popularity. The approach of Shone and Purohit [24], which employs a stacked nonsymmetric deep autoencoder with shallow learning, achieved an accuracy of 85% on both the KDD-cup99 and NSL-KDD datasets. Unsupervised deep learning methods, including variational autoencoders and GANs, have effectively handled class imbalance. While recent work has analyzed class imbalance using autoencoders or generative models [7], [12], [14] and explored temporal pattern learning with sequential models [6], [21], there remains potential for improvement by adapting a recurrent neural network to learn sequence-based attack prediction. Table 3 systematically categorizes NIDS methods that employ a hybridized deep learning architecture.
3) NIDS Based on Hybrid Models
To enhance intrusion detection architecture and effectiveness, network intrusion detection systems (NIDS) utilize hybrid deep learning architectures by combining various deep learning models. These designs typically integrate generative and discriminative models like those mentioned in [7], [12], [14], and [21]. This combination effectively captures spatial and temporal components while addressing class imbalance in network data. For instance, Sequence-to-sequence models process vector sequence inputs represented as
Furthermore, Loganathan [8] utilized Sequence to Sequence models to enhance detection rates and predictions, a concept gaining attention in recent studies. Sequence to Sequence (Seq2Seq) models consist of two RNNs—one for encoding and one for decoding—using RNN cells to extract features from input and predict outcomes. The architecture of the Seq2Seq model is illustrated in Fig. 5. Finally, Fig. 6 provides a systematic classification of three major types of NIDS-based techniques.
Effective anomaly detection relies on understanding and leveraging temporal dependencies in the data. However, many anomaly detection methods overlook crucial temporal metrics, leading to challenges in anomaly identification. Network traffic behavior can change over time due to recurring patterns or predictable intervals, making it essential to detect deviations from expected temporal patterns. By analyzing temporal correlations, anomalies (data points or sequences that diverge from established temporal behavior) can be identified more effectively [47]. Recent advancements highlight the importance of integrating advanced deep learning techniques like recurrent neural networks (RNNs) or long short-term memory (LSTM) networks. These models excel at capturing sequential dependencies, showing promise in enhancing anomaly detection based on temporal patterns. Therefore, a comprehensive approach involves not only understanding temporal dependencies but also leveraging state-of-the-art deep learning methods to strengthen anomaly detection capabilities. Fig. 6 illustrates a range of deep learning techniques applied in NIDS, including:
Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM)/Gated Recurrent Units (GRU) [1], [2], [4], [5], [7].
Autoencoders (AE), Variational Autoencoders (VAE), Generative Adversarial Networks (GAN) [14], [27], [29].
Conditional Generative Adversarial Networks (C-GAN), Conditional Variational Autoencoders (C-VAE), CNN combined with GRU [20], [22], [28].
Sequence-to-Sequence (Seq2Seq), Adversarial Autoencoders (A-AE), Adversarial Variational Autoencoders (A-VAE), SAVAER [8], [14], [21].
4) Anomaly Based Detection for NIDS
Anomaly-Based Network Intrusion Detection Systems (NIDS) play a vital role in protecting computer networks from evolving cyber threats. They detect deviations from normal network behavior, enabling the identification of unknown attacks, including zero-day attacks. Upon detecting anomalies, NIDS promptly alert network administrators, signaling potential security threats or vulnerabilities, empowering them to take swift action to minimize risks and fortify network assets against potential breaches or attacks [41], [42].
The objective of this section is to present a comprehensive overview of the latest techniques, methodologies, and challenges in the field of anomaly-based NIDS [19]. Various approaches, including statistical, machine learning, and deep learning-based techniques, are analyzed and compared, highlighting their respective strengths and limitations [21]. Furthermore, common datasets, evaluation metrics, and performance measures employed in NIDS research are examined. This survey serves as a valuable resource for researchers, practitioners, and security professionals interested in anomaly-based NIDS, providing insights into current advancements and future directions in this critical domain of network security [43]. Anomaly-based Network Intrusion Detection Systems (NIDS) employ three main techniques [9]:
Statistical-based methods detect anomalies through outlier detection, identifying data points significantly deviating from the expected distribution, and behavior modeling, which analyzes statistical patterns to detect anomalies.
Machine learning-based approaches utilize clustering to group similar data points and identify anomalies as outliers within these clusters. Classification models are trained to classify instances as normal or anomalous based on labeled training data. Ensemble methods combine multiple anomaly detection models to enhance performance.
Deep learning techniques, such as recurrent neural networks (RNNs), capture sequential dependencies in time series data to identify anomalies. Convolutional neural networks (CNNs) extract relevant features from input data and detect anomalies based on patterns within the data. Autoencoders, a type of neural network, perform unsupervised anomaly detection by reconstructing input data and identifying instances with high reconstruction errors as anomalies [30].
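The reconstruction-error criterion mentioned in the last item can be sketched as follows. The example assumes that reconstructions are already available from some trained autoencoder, and that the threshold is set to the mean plus `k` standard deviations of the errors observed on normal traffic; the choice of `k`, the function names, and all numeric values are hypothetical.

```python
import statistics


def mse(x, x_hat):
    """Mean squared reconstruction error for one sample."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(train_errors, k=3.0):
    """Threshold = mean + k * std of errors on normal traffic (k is an assumed choice)."""
    return statistics.mean(train_errors) + k * statistics.pstdev(train_errors)


def flag_anomalies(errors, threshold):
    """Mark every sample whose error exceeds the threshold."""
    return [e > threshold for e in errors]


# Hypothetical errors measured on normal training traffic.
train_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]
threshold = fit_threshold(train_errors)

# Hypothetical (input, reconstruction) pairs from a trained autoencoder.
test_pairs = [
    ([0.2, 0.4, 0.1], [0.21, 0.39, 0.12]),  # reconstructed well
    ([0.9, 0.1, 0.8], [0.30, 0.45, 0.20]),  # reconstructed poorly
]
test_errors = [mse(x, x_hat) for x, x_hat in test_pairs]
flags = flag_anomalies(test_errors, threshold)
```

Because the model is trained only on normal traffic, samples it cannot reconstruct well are the ones treated as suspicious.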
The authors of [45] address the critical issue of real-time anomaly detection in the Industrial Internet of Things (IIoT) with notable effectiveness. Given the dynamic, large-scale, diverse, and time-stamped nature of IIoT data, their proposed hybrid end-to-end deep anomaly detection (DAD) system, which employs convolutional neural networks (CNNs) and a two-stage long short-term memory (LSTM)-based autoencoder (AE), is highly relevant. This architecture integrates CNNs and LSTM AEs to enable real-time anomaly detection, which is crucial for optimizing industrial processes. The model’s validation on multiple datasets and its compatibility with edge devices underscore its practical utility and potential for widespread deployment. Although further optimization and scalability improvements are necessary, the framework provides a robust foundation for future IIoT advancements.
Reference [48] presents a notable advancement in anomaly detection by integrating time-aware machine learning models with sophisticated feature extraction techniques. Their approach achieves an impressive average F1 score of 91% and a peak F1 score of 99% across a 12-year evaluation period, demonstrating the effectiveness of their time-aware models. This research builds upon the real-time anomaly detection framework established by [45], addressing both temporal and feature-based anomalies. By highlighting the critical role of time dynamics in detection systems, [48] emphasizes the evolving importance of incorporating temporal aspects into anomaly detection methodologies.
In a similar vein, [46] advances hybrid network classifiers by integrating bidirectional gated recurrent units with enhanced residual network blocks and employing an autoencoder for dimensionality reduction. The high accuracy rates achieved on the NSL-KDD and UNSW-NB15 datasets demonstrate the technique’s efficacy. Nonetheless, further research into scalability, resource efficiency, real-time capabilities, and broader comparative analyses is required to confirm its robustness. This work makes a significant contribution to enhancing network security in the big data era.
Proposed Methodology
Our proposed method enhances network intrusion detection by addressing two key challenges: adapting to changes over time and managing data patterns. Unlike traditional methods that handle these issues separately, our approach integrates Gated Recurrent Units (GRUs) with the Temporal Correlation Index (TCI) within a Variational Autoencoder (VAE) framework. GRUs are adept at recognizing patterns in sequences of network traffic, helping to detect anomalies based on shifts in behavior. The VAE supports this by learning overall patterns in the data, making it easier to identify unusual activities that could indicate a threat.
To further enhance our model, we incorporate the Temporal Correlation Index (TCI). TCI allows the system to track and remember changes in network behavior over time, enabling more accurate comparisons between past and present data. This addition improves the model’s ability to detect anomalies with greater reliability. By combining GRUs, VAEs, and TCI, our integrated approach offers a more adaptive and accurate solution for network intrusion detection. This method effectively addresses both the evolving nature of network traffic and the need for robust data pattern management, distinguishing it from traditional approaches that often treat these challenges in isolation.
The aim of this research is to improve the detection of network attacks by reducing false alarms and enhancing precision. One common challenge with network traffic datasets is the uneven distribution of classes, which can impact classification accuracy. To address this, we employ an unbiased and unsupervised deep learning method known as Variational Autoencoder (VAE). Our methodology incorporates VAE for sequential learning, enabling the generation of temporal patterns in sequence data with the aid of GRUs.
Our approach leverages two key components: Gated Recurrent Units (GRUs) or Long Short-Term Memory (LSTM) units within the VAE. These components function as specialized neurons, generating probability values based on feature engineering and high-level feature mapping. This allows us to build a deep generative sequence model capable of effectively capturing the temporal behavior of network traffic.
A. Gated Recurrent Unit (GRU) Based Encoder
The Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) that excels in capturing temporal dependencies within sequential data, as discussed earlier. By integrating GRU-based models with temporal sequence thresholds, we can quantify deviations from expected patterns, thereby enhancing anomaly detection. This integrated approach reduces the need for manual feature engineering, ultimately improving the accuracy and efficiency of anomaly detection.
Now, let’s delve into the architecture of the GRU-based VAE, as illustrated in Fig. 7. The figure shows the encoder component, which processes input data through a series of Gated Recurrent Units (GRUs) to produce a latent representation. The encoder outputs two vectors: the mean (
B. Proposed Method Stage I
Our framework combines generative features with the complexities of sequential attack patterns, effectively navigating the nuances of network traffic to combat potential threats. During training, we utilize a curated dataset comprising regular traffic samples, extracted from a larger pool of unlabeled data. The objective is to generate probabilistic reconstructions of time series data that mirror the patterns observed in the training set, predicting the value of each network transaction over time.
To quantify the probability of reconstruction, we employ a probabilistic model that predicts the value of each transaction based on its temporal sequence. Let’s denote the input sequence as
This reconstruction process is probabilistic, meaning that for each time step t, the model produces a probability distribution over possible values of
To train the model, we minimize the reconstruction error between the actual sequence x and the reconstructed sequence:
\begin{equation*} L = -\sum _{t=1}^{T} \log P(x_{t} | x_{1:t-1}) \tag {1}\end{equation*}
We optimize this loss function using gradient-based optimization techniques such as stochastic gradient descent (SGD) or Adam. By minimizing the reconstruction error, our model learns to generate accurate probabilistic reconstructions of time series data, effectively capturing the patterns observed in the training dataset. This enables our framework to detect anomalies by identifying deviations between the actual network traffic and the reconstructed patterns.
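The loss in Eq. (1) can be evaluated numerically once a form for the per-step predictive distribution is chosen. The sketch below assumes a Gaussian distribution with a predicted mean and standard deviation at each time step; this choice, the function names, and the toy values are assumptions for illustration, since the distribution family is not specified here.

```python
import math


def gaussian_log_prob(x, mean, std):
    """Log density of x under N(mean, std^2)."""
    return (-0.5 * math.log(2.0 * math.pi * std * std)
            - (x - mean) ** 2 / (2.0 * std * std))


def sequence_nll(xs, means, stds):
    """Eq. (1): L = -sum_t log P(x_t | x_{1:t-1}), with an assumed Gaussian P."""
    return -sum(gaussian_log_prob(x, m, s) for x, m, s in zip(xs, means, stds))


# Toy sequence and two hypothetical sets of per-step predictions.
xs = [0.50, 0.55, 0.60, 0.58]
good_means = [0.50, 0.54, 0.61, 0.58]   # predictions close to the data
bad_means = [0.10, 0.90, 0.20, 0.95]    # predictions far from the data
stds = [0.05] * 4

loss_good = sequence_nll(xs, good_means, stds)
loss_bad = sequence_nll(xs, bad_means, stds)
```

Predictions that track the observed sequence give a lower loss, so minimizing this quantity on normal traffic pushes the model towards reproducing normal temporal patterns.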
The proposed generative model, as described in the algorithm, is designed to generate new data samples using a latent variable z in conjunction with a Gated Recurrent Unit (GRU) architecture. The system architecture of this method, particularly in the initial phase, is illustrated in Fig. 8. Initially, the model is supplied with input data x, along with parameters such as the learning rate LR and a scale factor k. The algorithm begins by constructing the z layer, where a sampling function combines the mean and log standard deviation of z. This process yields a latent variable z that captures hidden patterns in the data. Subsequently, the algorithm generates new data samples x_new by passing the latent variable z through the GRU unit. Prior to this generation process, the input data undergoes preprocessing, including one-hot encoding for categorical variables and min-max normalization for numerical features, to facilitate effective model training.
The dataset is then divided into training and test sets at a ratio of 7:3, allocating 70% of the data for model training and 30% for testing. During model training, the generative model learns the underlying distribution of the input data without the presence of labels. Furthermore, a Gaussian distribution prior is applied to the latent variable z, ensuring that the generated samples adhere to a plausible distribution. The stage 1 algorithm orchestrates a series of steps, including preprocessing, model training, latent variable sampling, and output generation, to achieve the goal of generating new data samples. By iteratively refining the model parameters and learning the underlying data distribution, the generative model can effectively create synthetic data that exhibits similar characteristics to the original dataset.
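The preprocessing and splitting steps described above can be sketched in a few lines. The example uses two made-up columns (`protocols` and `durations`) in place of real dataset features, and splits the rows in their original order; the helper names and the ordering choice are assumptions for illustration.

```python
def one_hot(values):
    """One-hot encode a categorical column; returns vectors and the category order."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return encoded, categories


def min_max(values):
    """Scale a numerical column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


def split(rows, train_fraction=0.7):
    """Split rows in order into training and test parts (70% / 30% by default)."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical columns standing in for real traffic features.
protocols = ["tcp", "udp", "tcp", "icmp", "tcp", "udp", "tcp", "tcp", "udp", "tcp"]
durations = [12.0, 3.0, 45.0, 1.0, 30.0, 7.0, 22.0, 15.0, 5.0, 60.0]

encoded, categories = one_hot(protocols)
scaled = min_max(durations)
rows = [enc + [s] for enc, s in zip(encoded, scaled)]
train, test = split(rows)
```

Each resulting row concatenates the one-hot protocol vector with the scaled duration, giving a purely numerical input suitable for a neural network.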
In the algorithm for model training, the input includes the original time series data, a Variational Autoencoder (VAE) model for reconstruction (denoted as “vae”), and a threshold value for anomaly detection. The training phase initiates by training the proposed model using the CIC-IDS-2017 dataset of sequential data. Once the model is trained, the VAE is employed to reconstruct the input sequences. Subsequently, the Temporal Correlation Index (TCI) is computed for the training data by comparing the original data with the reconstructed data.
C. Proposed Method Stage II
In our approach to network anomaly traffic analysis, we integrate advanced methodologies to enhance both detection accuracy and robustness. We combine deep learning techniques with artificial immune systems (AIS) for feature detection, leveraging the unique strengths of each method. Deep learning models excel at identifying complex patterns and anomalies in high-dimensional data [34], [38], while AIS provide a biologically inspired mechanism for detecting anomalies and selecting features, emulating the immune system’s ability to recognize and classify unusual patterns [7], [14]. This combination offers a more comprehensive analysis of network traffic, significantly improving the system’s capacity to detect subtle and sophisticated anomalies that might be missed by traditional methods.
Algorithm 1 The Proposed Generative Model
procedure GenerateSamples
Output: generate
(Generated samples x using GRU unit)
end procedure
1) Network Anomaly Detection with Temporal Correlation Index (TCI)
A key component of our methodology is the Temporal Correlation Index (TCI), which enhances our system’s capability to detect subtle temporal changes in network traffic. TCI measures the correlation of network events over time, enabling the identification of subtle shifts and trends in traffic patterns [41]. By incorporating TCI alongside deep learning and AIS, our system improves both anomaly detection accuracy and overall performance, addressing both temporal and feature-based anomalies effectively.
The integration of TCI marks a significant advancement in network traffic analysis. This novel metric not only improves the detection of subtle anomalies but also enhances the system’s adaptability to evolving threats, setting a new benchmark in intrusion detection systems.
\begin{equation*} TCI_{x,y} = \frac {\sum _{i=1}^{n} (x_{i} - \mu _{x}) (y_{i} - \mu _{y})}{\sigma _{x} \cdot \sigma _{y} \cdot n} \tag {2}\end{equation*}
Algorithm 2 Algorithm for the Proposed Generative Model
Input:
Output: generate
procedure Preprocessing
Apply one-hot encoding and min-max normalization method to the input data x
Train-Test Split
Split the preprocessed data into a training set and a test set in a ratio of 3:2, respectively
Model Training
Train the model using the training set x without labels
Apply Gaussian Distribution
Apply a Gaussian distribution prior to the latent variable z
Generate Output
Generate the output
end procedure
This equation represents the calculation of the Temporal Correlation Index (TCI) between two variables, x and y. The TCI measures the correlation between the temporal patterns of these variables over a given period. Here’s a breakdown of the components:
$\text{TCI}_{x,y}$: Denotes the Temporal Correlation Index between variables x and y.
$\sigma_{x}$: Represents the standard deviation of variable x, indicating the variability or dispersion of data points around the mean.
$\mu_{x}$: Denotes the mean value of variable x, indicating the central tendency or average value of the data points.
i: Refers to the index corresponding to each data point.
n: Represents the total number of data points or observations in the time series.
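The components listed above can be combined into a short computation. The sketch below assumes that the numerator of the TCI is the sum over i of (x_i - mu_x)(y_i - mu_y) and that sigma denotes the population standard deviation, in which case the TCI coincides with the Pearson correlation coefficient. The function name `tci` and the sample sequences are hypothetical.

```python
import math


def tci(x, y):
    """Temporal Correlation Index between two equal-length sequences.

    Assumed form: sum_i (x_i - mu_x)(y_i - mu_y) / (n * sigma_x * sigma_y),
    with population standard deviations.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    sigma_x = math.sqrt(sum((v - mu_x) ** 2 for v in x) / n)
    sigma_y = math.sqrt(sum((v - mu_y) ** 2 for v in y) / n)
    if sigma_x == 0.0 or sigma_y == 0.0:
        raise ValueError("TCI is undefined for a constant sequence")
    cov_sum = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    return cov_sum / (n * sigma_x * sigma_y)


original = [1.0, 2.0, 3.0, 4.0, 5.0]
close_reconstruction = [1.1, 1.9, 3.2, 3.9, 5.1]
reversed_reconstruction = [5.0, 4.0, 3.0, 2.0, 1.0]

high = tci(original, close_reconstruction)    # close to 1
low = tci(original, reversed_reconstruction)  # close to -1
```

Under this reading, a reconstruction that follows the temporal shape of the input scores near 1, while a reconstruction that moves against it scores near -1.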
Algorithm 3 Model Training
procedure ModelTraining(original_data, vae, threshold)
Input:
–original_data: The original time series data
–vae: Variational Autoencoder model for reconstruction
–threshold: Threshold value for anomaly detection
Output:
–anomalies: List of indices where anomalies are detected
Step 1: Training Phase
Train the proposed model using CIC-IDS-2017 dataset of sequential data.
Reconstruction: After training, vae is used to reconstruct the input sequences.
Step 2: Calculate TCI for Training Data
After training the vae, calculate the TCI for the training data by comparing the original data with the reconstructed data.
This step measures the temporal correlation between the two:
tci_training = calculate_tci(train_data, reconstructed_data)
Step 3: Set a Threshold for Anomaly Detection
Define a threshold value for TCI above which data points are considered normal, and below which data points are considered anomalous.
end procedure
2) Temporal Metric Driven GRU Embedded Generative Neural Network (TMG-GRU-VAE)
Our proposed method integrates a GRU-based Variational Autoencoder (VAE) with the Temporal Correlation Index (TCI) and utilizes a meticulously calibrated threshold for anomaly detection. This novel combination represents a significant advancement, offering a robust solution to the complexities of network traffic data. In our experimental approach, we define an anomaly detection threshold within this integrated framework, finely tuned alongside the TCI to achieve an optimal balance between sensitivity and specificity. This calibration aims to enhance detection accuracy while minimizing false positives. Details of this approach and its implementation are presented in Fig. 9. Additionally, Fig. 10 provides a comprehensive overview of the final architecture of the proposed methodology, illustrating the complete workflow of the system.
Analyzing temporal relationships between events is crucial in network traffic analysis. Normal network behavior tends to follow predictable patterns over time, such as consistent user logins and data transfers during work hours. Methods that capture these temporal dependencies are better equipped to differentiate between normal and potentially malicious activities. TCI enhances our methodology by allowing us to delve into the temporal dynamics of network traffic data. This strengthens our anomaly detection capabilities, making them more adaptable to the constantly changing nature of network environments.
Experimental Design and Evaluation Metrics
In the experimental analysis phase, we put the above discussed concepts into practice by evaluating the effectiveness of our proposed methodology. Through rigorous experimentation and validation, we aim to assess the performance of our model in accurately detecting anomalies within network traffic data. This analysis will provide valuable insights into the real-world applicability and efficacy of our approach, paving the way for advancements in intrusion detection systems.
A. Experimental Setup
To assess the effectiveness of our proposed Temporal Metric-Driven GRU Embedded Generative Neural Network (TMG-GRU-VAE) in anomaly detection, we conducted a rigorous evaluation process. Table 4 presents the details of the computational resources and requirements used for deploying our proposed method, TMG-GRU-VAE.
1) Dataset Selection and Preparation
We employed the widely used CIC-IDS-2017 and CIC-IDS-2018 datasets, known for their comprehensive coverage of network intrusion scenarios and realistic traffic patterns. To ensure data quality and suitability for training, we meticulously preprocessed the datasets. This involved:
Removing inconsistencies
Addressing missing values
Selecting relevant features
2) Model and Hyperparameter Tuning
Our model leverages the TMG-GRU-VAE architecture, which seamlessly integrates Variational Autoencoders (VAEs) with Gated Recurrent Units (GRUs). This allows the model to capture the temporal dependencies inherent in network traffic data. To optimize model performance, we employed a hyperparameter tuning process. This involved systematically adjusting parameters like:
Learning rate
Batch size
Optimizer selection
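A systematic search over these parameters can be sketched as an exhaustive grid search. The candidate values and the `toy_validation_loss` function are hypothetical stand-ins: in a real run, the evaluation function would train the model with the given configuration and return its validation loss. The paper does not state which search strategy or values were used.

```python
from itertools import product


def grid_search(search_space, evaluate):
    """Try every combination; return (best_config, best_score), lower is better."""
    names = list(search_space)
    best_config, best_score = None, float("inf")
    for values in product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical candidate values, not taken from the paper.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [64, 128],
    "optimizer": ["adam", "rmsprop"],
}


def toy_validation_loss(config):
    """Stand-in for training the model and returning its validation loss."""
    penalty = 0.0 if config["optimizer"] == "adam" else 0.1
    return abs(config["learning_rate"] - 1e-3) + config["batch_size"] / 1e4 + penalty


best_config, best_score = grid_search(search_space, toy_validation_loss)
```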
3) Implementation and Model Training
We built the model using the TensorFlow library, leveraging its extensive deep learning functionalities and computational power. During the training phase, we rigorously trained the model over multiple epochs, continuously adjusting parameters to minimize loss and maximize accuracy. Following training, we meticulously validated the developed model using the reserved testing data. Table 4 demonstrates the experimental setup in detail.
B. Description of the Dataset
The CIC-IDS-2017 and CIC-IDS-2018 datasets [40] are among the most recently used benchmarks in the field of network intrusion detection. Their labeled network traffic data allows researchers to evaluate the effectiveness of intrusion detection systems. The datasets provide a realistic representation of the kinds of attacks that real networks face, since they include genuine network traffic that was collected from a major enterprise network. Additionally, they cover a wide range of attacks, allowing researchers to assess how well intrusion detection systems protect against different threats, as illustrated in Table 5.
1) Temporal Features in Network Traffic
The dataset contains various temporal features that can be used to analyze network traffic patterns over time. Fig. 11 illustrates the correlation among all features in the network traffic of the CIC-IDS-2017 dataset. These features include:
Timestamp: The timestamp indicates when the network packet was captured.
Flow duration: The time that a connection or flow between two IP addresses lasts.
Start time: The instant the first flow packet was collected.
End time: The moment the last flow packet was successfully collected.
Active time: The time between the start and termination of the flow.
Idle time: The interval of time between two successive flows.
Source/destination time-to-live (TTL): The TTL value of the IP packets sent from the source/destination IP address.
Protocol: The protocol used by the network traffic (e.g., TCP, UDP).
Time-based traffic features: Features of time-based traffic include the volume of packets or bytes transmitted and received over a given period of time (e.g., 10 seconds, 1 minute).
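As an illustration of the time-based traffic features in the list above, the following sketch aggregates packet counts and byte volumes over fixed windows. It assumes packets are available as (timestamp, size) pairs. The function name and the sample capture are hypothetical and do not reflect the feature extraction tool used to build the datasets.

```python
from collections import defaultdict


def traffic_per_window(packets, window_seconds):
    """Aggregate packets into fixed-length time windows.

    packets: iterable of (timestamp_seconds, size_bytes) pairs.
    Returns {window_start_seconds: (packet_count, total_bytes)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for timestamp, size in packets:
        start = int(timestamp // window_seconds) * window_seconds
        totals[start][0] += 1
        totals[start][1] += size
    return {start: tuple(v) for start, v in sorted(totals.items())}


# Hypothetical capture: (seconds since start of capture, packet size in bytes).
packets = [(0.5, 100), (3.2, 1500), (9.9, 60), (10.0, 40), (25.7, 900)]
windows = traffic_per_window(packets, window_seconds=10)
```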
Table 5 provides comprehensive information on the datasets and the attack classes contained in each.
C. Evaluation Metrics
To illustrate the effectiveness of our approach, experiments were conducted on the CIC-IDS-2017 and CIC-IDS-2018 benchmark datasets, aligning with current research preferences [25], [26]. The structure of our research questions is as follows:
How effective is the novel dynamic temporal modeling approach, which utilizes the Temporal Metric-Driven GRU Embedded Generative Neural Network (TMG-GRU-VAE), in identifying and adapting to temporal changes in network attack patterns?
Did the proposed methodology with Temporal Correlation Index (TCI) thresholds decrease the false positives?
How does the proposed methodology perform when evaluated using the latest network traffic datasets?
The goal of a classification algorithm in network intrusion detection is to achieve the maximum number of accurate detections with the fewest false positives. Malicious behaviors in a system are identified using quantitative measures, and the effectiveness of the model is determined from the average of these counts. Evaluation metrics for Network Intrusion Detection Systems (NIDS) are computed according to the formulas below:
1) Recall
Recall is the number of true positives divided by the sum of true positives and false negatives. It is also referred to as the detection rate.
\begin{equation*} \text {Recall} = \frac {TP}{TP + FN} \tag {3}\end{equation*}
2) Precision
Precision is the number of true positives divided by the total number of positive predictions.
\begin{equation*} \text {Precision} = \frac {TP}{TP + FP} \tag {4}\end{equation*}
The F1 score is the harmonic mean of recall and precision.
\begin{equation*} \text {F1 score} = 2 \cdot \frac {\text {Recall} \times \text {Precision}}{\text {Recall} + \text {Precision}} \tag {5}\end{equation*}
3) Accuracy
Accuracy is the proportion of all predictions that are correct. It is also known as detection accuracy.
\begin{equation*} \text {Accuracy} = \frac {TP + TN}{TP + TN + FP + FN} \tag {6}\end{equation*}
4) Confusion Matrix
The confusion matrix is instrumental in evaluating classification algorithms. It is used to derive several key metrics:
True Positive (TP): The count of instances correctly classified as positive.
False Positive (FP): The count of instances incorrectly classified as positive.
True Negative (TN): The count of instances correctly classified as negative.
False Negative (FN): The count of instances incorrectly classified as negative.
5) False Alarm Rate
The False Alarm Rate (FAR) can be calculated using the following formula:
\begin{equation*} \text {FAR} = \frac {\text {FP}}{\text {FP} + \text {TN}} \tag {7}\end{equation*}
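The metrics in Equations (3) to (7) can be computed directly from the four confusion matrix counts, as in the sketch below. It uses the standard definition of accuracy, (TP + TN) / (TP + TN + FP + FN). The function names and the counts in the example are hypothetical and are not taken from the result tables.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def nids_metrics(tp, fp, tn, fn):
    """Recall, precision, F1 score, accuracy and false alarm rate from counts."""
    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "f1": ratio(2 * recall * precision, recall + precision),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "far": ratio(fp, fp + tn),
    }


# Hypothetical counts for illustration only.
metrics = nids_metrics(tp=80, fp=10, tn=90, fn=20)
```

With these counts, recall is 0.8, accuracy is 0.85 and the false alarm rate is 0.1, which shows how a model can have a reasonable accuracy while still missing one attack in five.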
6) McNemar’s Test
McNemar’s test is a statistical method used to evaluate the performance difference between two algorithms based on their confusion matrix results. It is especially useful for assessing whether one algorithm significantly outperforms another [49].
The test statistic is computed as:
\begin{equation*} z = \frac {|\text {FP}_{1} - \text {FP}_{2}|}{\sqrt {\text {FP}_{1} + \text {FP}_{2}}} \tag {8}\end{equation*}
Here, FP1 and FP2 represent the number of discordant pairs where one algorithm misclassifies instances while the other does not.
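Equation (8) can be evaluated as shown in the sketch below. The discordant counts in the example are hypothetical. The comparison against 1.96 assumes a two-sided test at the 5% significance level, using the normal approximation under which z is approximately standard normal when the two algorithms perform equally well.

```python
import math


def mcnemar_z(fp1, fp2):
    """z = |FP1 - FP2| / sqrt(FP1 + FP2), following Equation (8)."""
    total = fp1 + fp2
    if total == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    return abs(fp1 - fp2) / math.sqrt(total)


# Hypothetical discordant counts: instances misclassified by only one algorithm.
z = mcnemar_z(fp1=30, fp2=10)  # 20 / sqrt(40)
significant = z > 1.96          # two-sided 5% critical value (assumption)
```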
7) Loss Function
Equation (9) represents the loss function used in training a probabilistic model, particularly in the context of sequence generation or prediction tasks.
\begin{equation*} L = -\sum _{t=1}^{T} \log P(x_{t} | x_{1:t-1}) \tag {9}\end{equation*}
L represents the loss function.
$\sum_{t=1}^{T}$ denotes the summation over all time steps t, where T is the total number of time steps in the sequence.
$\log P(x_{t} | x_{1:t-1})$ is the log-likelihood of each data point $x_{t}$ given the previous time steps $x_{1:t-1}$. This term measures how well the model predicts the current data point based on the past observations.
The negative sign converts the log-likelihood into a loss function that is minimized during training.
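The loss in Equation (9) can be evaluated for a single sequence as in the sketch below. It assumes the model outputs a discrete probability for each observed step, so every value lies in (0, 1]. The function name and the example probabilities are hypothetical.

```python
import math


def sequence_nll(step_probabilities):
    """L = -sum_t log P(x_t | x_{1:t-1}) for one sequence."""
    if any(p <= 0.0 or p > 1.0 for p in step_probabilities):
        raise ValueError("each probability must lie in (0, 1]")
    return -sum(math.log(p) for p in step_probabilities)


# Hypothetical per-step probabilities assigned to the observed data points.
confident = sequence_nll([0.9, 0.8, 0.95])
uncertain = sequence_nll([0.5, 0.4, 0.3])
```

A model that assigns high probability to each observed step yields a loss close to zero, while poor predictions at any step increase the loss.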
8) Temporal Correlation Index (TCI)
Train the GRU-based VAE: Use sequential data like network traffic logs to train the VAE. Ensure it encodes input sequences into a latent space and decodes them back accurately. The latent space captures regular data patterns.
Reconstruction: Post-training, employ the VAE to reconstruct input sequences, generating corresponding reconstructed sequences.
TCI Calculation: To calculate the Temporal Correlation Index (TCI) between the original sequence (X) and the reconstructed sequence (Y) from the VAE, you can use Equation (10):
\begin{equation*} TCI_{x,y} = \frac {\sum _{i=1}^{n} (x_{i} - \mu _{x}) (y_{i} - \mu _{y})}{\sigma _{x} \cdot \sigma _{y} \cdot n} \tag {10}\end{equation*}
where $x_{i}$ and $y_{i}$ are the data points at time step i in the original and reconstructed sequences, respectively; $\mu_{x}$ and $\mu_{y}$ are the means (average values) of the original and reconstructed sequences, respectively; $\sigma_{x}$ and $\sigma_{y}$ are the standard deviations of the original and reconstructed sequences, respectively; and n is the total number of data points in the sequences (the length of the sequences).
Anomaly Detection Threshold: After calculating TCI for each sequence, you can set a threshold for TCI values. Sequences with TCI values below this threshold may be considered anomalies. The choice of the threshold depends on the characteristics of your data and the desired level of sensitivity in anomaly detection.
Real-time Anomaly Detection: In a real-time scenario, you can continuously feed incoming sequences to the trained VAE, calculate the TCI for each new reconstructed sequence, and compare it to the threshold. If the TCI falls below the threshold, it can trigger an anomaly alert.
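The thresholding and real-time steps above can be sketched as follows. The sketch operates on TCI scores that have already been computed and assumes the threshold is taken as a low quantile of the training scores, which is one possible calibration choice and not necessarily the one used in our experiments. The function names and score values are hypothetical.

```python
def calibrate_threshold(training_tci, quantile=0.05):
    """Return the training TCI score at the given lower quantile."""
    if not training_tci:
        raise ValueError("need at least one training score")
    ordered = sorted(training_tci)
    position = int(quantile * (len(ordered) - 1))
    return ordered[position]


def alerts(tci_stream, threshold):
    """Yield (position, score) for every incoming score below the threshold."""
    for position, score in enumerate(tci_stream):
        if score < threshold:
            yield position, score


# Hypothetical TCI scores; in practice each one comes from comparing a
# sequence with its VAE reconstruction.
training_tci = [0.97, 0.95, 0.99, 0.93, 0.96, 0.98, 0.94, 0.92, 0.97, 0.95]
threshold = calibrate_threshold(training_tci, quantile=0.05)

incoming_tci = [0.96, 0.41, 0.95, 0.88, 0.99]
flagged = list(alerts(incoming_tci, threshold))
```

Because `alerts` is a generator, it can consume scores one at a time as new sequences arrive, which matches the continuous monitoring scenario described in the last step.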
Results Discussion
The comparison in Tables 6 and 7 assesses how well hybrid models capture features of both normal and abnormal traffic. We tested different RNN configurations to gauge their effectiveness and compared them with the proposed methods. These tables summarize our findings, including overall accuracy, recall, F1 score, and precision on the CIC-IDS-2017 dataset. Our focus on Network Intrusion Detection Systems (NIDS) highlights the superior performance of these models, especially in recall and F1 score. On average, the GRU- and LSTM-based VAEs outperform the other configurations, with GRU-VAE achieving the highest precision at 88.1%. It also achieves the best F1 score, reaching 80-82% for successful attack detection.
As shown in Table 8, we evaluated four models (LSTM-VAE, GRU-VAE, LSTM-VAE with TCI, and GRU-VAE with TCI) using the CIC-IDS-2017 and CIC-IDS-2018 datasets for both Class 0 and Class 1. Overall, the models perform well for Class 0, demonstrating high precision, recall, F1-score, and accuracy. For Class 1, the values are slightly lower but still reasonable, aligning with our objective. Across both datasets, the models consistently perform better on Class 0 than on Class 1. This consistency across different methods and datasets is valuable for evaluating intrusion detection performance. The proposed TCI-based models exhibit increased sensitivity to subtle temporal anomalies, enabling them to identify Class 1 instances that might be overlooked or misclassified by traditional models. Notably, the TCI-based methods show improvement over their counterparts without TCI.
Fig. 12 effectively visualizes and compares the performance of various anomaly detection methods, including LSTM-VAE and GRU-VAE, both with and without TCI, across multiple datasets. These charts facilitate straightforward comparisons by presenting metrics side-by-side. They highlight the improvements brought about by TCI and reveal performance variations across different classes and datasets. The clear presentation of results through grouped bar charts enhances readability and enables quick interpretation of key findings. The comparative analysis of anomaly-based intrusion detection models on the CIC-IDS-2017 and CIC-IDS-2018 datasets reveals notable differences in False Positive Rates (FPR). The LSTM-VAE with TCI demonstrates consistent FPR values across both datasets, indicating robustness in handling the TCI. Conversely, the GRU-VAE model shows increased FPR values, especially in Class 1, suggesting a higher sensitivity to dataset variations. These observations underscore the complex interaction between model architecture, dataset characteristics, and additional features such as TCI and threshold parameters. This analysis provides valuable insights for optimizing intrusion detection models across diverse datasets and conditions, suggesting the need for further investigation and fine-tuning to enhance model adaptability and robustness.
Comparative analysis of proposed method with and without TCI using the CIC-IDS-2017 and CIC-IDS-2018 datasets.
Figs. 13 and 14 illustrate the substantial reduction in False Positives (FP) achieved by the proposed methods across all models. For the CIC-IDS-2017 dataset, the False Positive rates range from 7.2% for GRU-VAE with TCI to 12.9% for LSTM-VAE without TCI. In the CIC-IDS-2018 dataset, these rates range from 7.1% for GRU-VAE with TCI to 14.1% for GRU-VAE without TCI.
The data highlights significant improvements, particularly with the LSTM-VAE with TCI and GRU-VAE with TCI models, compared to their counterparts without TCI. These models show substantial enhancements in reducing false positives, emphasizing the effectiveness of the TCI approach in refining anomaly detection precision. The TCI-based approach consistently outperforms both standard and advanced models, demonstrating its potential for significantly improving precision in anomaly detection systems.
Table 9 presents the confusion matrix results for the CIC-IDS-2017 dataset. The LSTM-VAE model achieves a True Positive (TP) rate of 67.5%, but also has a high False Positive (FP) rate of 12.9% and a False Negative (FN) rate of 12.8%. The True Negative (TN) rate is relatively low at 6.7%, indicating limitations in correctly identifying non-intrusions. In contrast, the LSTM-VAE with TCI model shows notable improvements, with a TP rate of 71.8%, a reduced FP rate of 8.0%, and a reduced FN rate of 8.0%. The TN rate increases to 12.1%, suggesting enhanced performance in both intrusion detection and minimizing false alarms.
The GRU-VAE model demonstrates a TP rate of 68.4%, with a moderate FP rate of 10.3% and an FN rate of 10.2%. Its TN rate of 11.4% is higher than that of LSTM-VAE, indicating better performance in identifying non-intrusions. The GRU-VAE with TCI model achieves the highest performance, with a TP rate of 73.1%, the lowest FP rate of 7.2%, and an FN rate of 7.1%. The TN rate also improves to 12.5%, demonstrating superior effectiveness in both detecting intrusions and reducing false positives.
Table 10 shows the confusion matrix results for the CIC-IDS-2018 dataset. The LSTM-VAE model achieves a TP rate of 66.3%, with an FP rate of 12.6% and a very low FN rate of 0.3%. The TN rate is 20.7%, reflecting good performance in correctly identifying non-intrusions. The LSTM-VAE with TCI model shows improved performance with a TP rate of 68.6%, a reduced FP rate of 8.5%, and a slightly higher FN rate of 0.7%. The TN rate increases to 22.2%, indicating enhanced performance in both intrusion detection and non-intrusion classification.
The GRU-VAE model has a TP rate of 64.9%, with a higher FP rate of 14.1% and an FN rate of 0.5%. The TN rate of 20.5% is similar to that of LSTM-VAE, indicating comparable performance in detecting non-intrusions. The GRU-VAE with TCI model delivers the best performance, with a TP rate of 69.4%, the lowest FP rate of 7.1%, and an FN rate of 0.9%. The TN rate is the highest among all models at 22.6%, demonstrating the best balance between detecting intrusions and minimizing false alarms. Overall, incorporating TCI generally improves model performance by increasing TP rates and reducing FP rates. The GRU-VAE with TCI model consistently exhibits superior performance across both datasets, achieving the highest TP rates and the lowest FP rates.
Statistical tests across both datasets, as presented in Tables 11 and 12, show that the proposed model, GRU-VAE with TCI, consistently achieves the highest Z-scores, demonstrating its superior performance relative to other methods. In contrast, LSTM-VAE generally exhibits the lowest Z-scores, indicating less effectiveness compared to alternative approaches, particularly in the absence of TCI. The inclusion of TCI significantly enhances the performance of both LSTM-VAE and GRU-VAE, resulting in higher Z-scores. This performance boost is especially notable in the CIC-IDS-2018 dataset, suggesting that TCI has a more pronounced impact on models when applied to newer or differently distributed data.
The comparison between datasets reveals that overall model performance is superior on the CIC-IDS-2018 dataset. This may be attributed to the dataset’s distinct characteristics or potentially lower complexity compared to the CIC-IDS-2017 dataset. The observed improvements with TCI are more significant in the CIC-IDS-2018 dataset, indicating that TCI may have a more substantial effect on model performance in varied data distributions or experimental settings.
As cybersecurity faces new challenges with the expansion of the Internet of Things (IoT) and the evolution of complex digital ecosystems such as smart cities [44], it is crucial to adopt a multifaceted strategy that includes robust network security measures, secure device design, and continuous threat detection.
Conclusion
In today’s ever more interconnected digital environment, it’s essential to prioritize strong network security in response to the continually evolving cyber threats. Identifying irregularities in network traffic is vital for upholding security. However, the constantly changing nature of this data presents a considerable obstacle. While traditional generative neural networks can be effective in certain intrusion detection scenarios, they frequently encounter difficulties in understanding the time-based patterns present in network traffic, leading to challenges in differentiating between normal and potentially harmful behaviors.
To address this challenge, we propose a new approach called Temporal Metric-Driven GRU Embedded Generative Neural Network (TMG-GRU-VAE). This method integrates Gated Recurrent Units (GRUs) into a variational autoencoder (VAE) architecture, enhancing the model’s ability to learn temporal characteristics from network traffic data. Additionally, we introduce the Temporal Correlation Index (TCI) score, a metric tailored for anomaly detection in Network Intrusion Detection Systems (NIDS). TCI dynamically assesses temporal behavior within network traffic, effectively reducing false alarms and enhancing system reliability.
Our proposed method demonstrates significant improvements over traditional approaches, with anomaly detection accuracy enhanced and false alarms reduced by
This research provides valuable insights into anomaly detection within Network Intrusion Detection Systems (NIDS). However, future work should focus on developing explainable AI models to better understand why specific traffic is flagged as anomalous. Enhancing interpretability and confidence in detection results is crucial, especially in addressing the challenges posed by emerging and zero-day attacks.
ACKNOWLEDGMENT
The authors wish to express their sincere gratitude to Prof. Gang Li for providing valuable insights and constructive feedback that improved the quality of the article. They also extend their special thanks to the anonymous reviewers for their thorough evaluation and insightful suggestions, which significantly enhanced the overall quality of this article.