Introduction
Recently, the trend of integrating Software-Defined Networking (SDN) [1] with Network Functions Virtualization (NFV) [2] (also known as the software-defined NFV architecture) to accomplish various network control and management goals has grown substantially. Through NFV, SDN can dynamically create a virtual service environment for a service chain, so the dedicated hardware and complicated labor required to provision a new service request are avoided. Along with SDN, NFV further enables real-time, dynamic function provisioning and flexible traffic forwarding. Moreover, service function chaining (SFC) technology was born to chain multiple services (NIDS, firewall, load balancing, etc.) into a network flow [3]. Figure 1 illustrates a typical SDN-based cloud environment prototype including three technologies: SDN, NFV, and a cloud platform.
A. Problem Statements
Switches in the data plane have no control intelligence; they simply forward packets according to installed flow rules and send unmatched packets to the controller. Unfortunately, this behavior introduces a serious vulnerability that can be exploited by an attacker: a malicious or faulty SDN switch can severely disrupt network communications. In particular, colluding malicious switches that try to conceal their misbehavior are more challenging to detect. The anomalous behaviors at an SDN switch can be divided into five categories:
Traffic Loss: a compromised switch can drop traffic randomly or selectively.
Traffic Misroute: a compromised switch can forward traffic to a wrong address.
Traffic Modification: the traffic contents might be modified by a compromised switch.
Traffic Reorder: a compromised switch may change the order of packets while the traffic remains valid in terms of content, delay, and routes. Continuous reordering of TCP packets can significantly deteriorate TCP throughput [4].
Traffic Delay: a compromised switch may delay traffic to increase jitter, which causes numerous problems for time-sensitive traffic. Moreover, delaying traffic in a TCP stream triggers spurious timeouts and unjustified retransmissions, gravely damaging TCP throughput [13].
B. Our Proposal
To resolve the serious issues given above, in this paper we present a concrete proposal with a novel mechanism that monitors, checks, and detects anomalous behaviors in SDN switch traffic. The proposed mechanism performs multivariate time-series anomaly detection with a stochastic recurrent neural network variant, and then uses dynamic threshold selection to automatically set the optimal threshold for differentiating between normal and abnormal SDN switch traffic.
C. Contribution
Our major contributions can be listed as follows:
First, we propose a new approach, a multivariate time-series technique, to track and detect anomalous behaviors of SDN switch traffic quickly.
Second, we propose a complete scheme, Enhanced Compromised Switch Detection (ECSD), which applies the multivariate time-series technique to detect compromised switch attacks effectively.
Next, we investigate the vulnerabilities of SDN switches on the cloud and present common types of attacks in a cloud-based SDN environment in a distributed manner.
Our experiments were conducted in a real cloud computing environment using OpenStack [42] integrated with the SDN environment using the Open Network Operating System (ONOS) controller [17], but they could be smoothly applied to other controllers.
Finally, we conducted experimental studies and compared our detection scheme with other existing proposals and some machine-learning-based algorithms that apply the multivariate time-series technique. An extensive comparison of ECSD demonstrates the improved efficiency of our proposed scheme.
The rest of this article is constructed as follows. Section II shows some research related to our work. Section III presents background knowledge about the SDN-based cloud, compromised switches on the cloud, and basics of GRU, VAE, SGVB, and Planar NF. Section IV gives our research rationale, system analyses, and practical design of the proposed security mechanism. Section V focuses on our experimental setup, and the results are analyzed using several evaluation metrics in Section VI. Lastly, Section VII concludes our work and lays out some potential future developments.
Related Work
In general, the improvement of SDN security applications and controllers, and the online verification of network restrictions, separately, have been the prime focus of the SDN security literature [6], [13]. Intrusion detection systems (IDSs) are important as prime defense techniques for protecting network systems. Teng et al. [9] proposed an intelligent model based on two famous machine-learning algorithms, decision trees (DT) and support vector machines (SVM), and tested their model on the KDD CUP 1999 dataset. The results demonstrated an accuracy reaching 89%. However, SVMs normally incur high computation costs and show poor performance. Another IDS work is the improved SD-WSN framework [10], based on an SDN scheme. This framework addresses network management and node-failure issues and provides a means for flexible data forwarding. However, not many studies of these issues provide effective protection against compromised forwarding devices in the data plane [5], [8]. Attacks from the data plane have posed serious threats to SDN [7]. Using multiple hosts under OpenFlow switches, attackers can disrupt the control plane or learn its behaviors without knowing much about controller applications. These attacks include DoS, topology poisoning, and side-channel attacks [15]. Faulty behaviors that originate at SDN switches include traffic loss, traffic fabrication, traffic misrouting, traffic modification, traffic delay, and traffic reordering [13]. Among existing solutions, a proposal called SPHINX [25] has practical implementations [5]. It detects and mitigates security attacks initiated by malicious switches by abstracting network operations with incremental flow graphs. It also detects attacks according to the policies defined by the administrator and responds accordingly. However, SPHINX still has its own limitations. First, SPHINX cannot detect when a malicious forwarding device is delaying packets [26].
Second, it incurs significant communication overhead, because it gathers statistics of all flows from all switches [27]. A fairly effective scheme called WedgeTail [26], the closest work to SPHINX, is an intrusion prevention system for the data plane of SDNs. This work does not depend on rules pre-defined by an administrator, and it is capable of detecting and responding to all embedded malicious forwarding devices within a reasonable time frame. However, WedgeTail's authors made some assumptions, such as that the control plane itself and the defined policies are trustworthy; in practice, however, the control plane is highly vulnerable, and attackers can manipulate it to deploy a compromised switch attack. Another work is FlowMon [24], in which two anomaly detection algorithms are proposed to detect packet droppers and packet swappers. To detect malicious switches, the controller analyzes the collected port statistics and the actual forwarding paths. However, FlowMon might be dysfunctional if dishonest switches provide false information in their statistical reports. Lastly, an online detection mechanism has been proposed to find suspicious SDN switches and generate security alerts using security information and event management (SIEM) technology [14]. The technology can provide real-time analysis of security alerts for network managers. However, SIEM only covers some abnormal switch behaviors, such as incorrect forwarding, packet manipulation, and weight adjusting.
Recently, we proposed a new approach [16] to detect compromised switches using an autoregressive integrated moving average (ARIMA) learning model to predict the numbers of flows and calculate the maximum Lyapunov exponent to analyze the chaotic behaviors of the prediction-error time series. If the Lyapunov exponent remains positive for
Understanding these difficulties, in this paper we propose a mechanism to detect and defend against compromised switches in SDN-based clouds effectively using dynamic thresholds, which brings about high detection rates, low false-alarm rates, and efficient resource consumption. Our detection mechanism does not deteriorate network performance. Moreover, switches do not need to be isolated or turned off for detection. Instead, the administrator can simply launch the detection process while the network is running. Therefore, our detection mechanism is very practical.
Background Knowledge
A. SDN-Based Cloud
SDN-enabled cloud computing has been emerging as a future SDN-based cloud environment. Many studies have proposed various methods to both increase revenue for data center providers and reduce the round-trip time of tasks for applications, for example, [11], [12]. The integration of NFV and SDN technologies is known as the SDN/NFV architecture [2], as illustrated in Figure 2. It consists of NFV orchestration, a controller platform, forwarding devices, and servers. The SDN controller is responsible for controlling the traffic path, using primarily the OpenFlow protocol to communicate with forwarding devices (OpenFlow switches) and to impose policies from the control plane onto the data plane. Meanwhile, NFV uses standard computing virtualization technology, consolidated on commodity hardware (i.e., servers or a cloud platform such as OpenStack), to deliver network functions with high bandwidth and high performance at low cost. Hypervisors, which run on the servers, primarily focus on supporting VMs that operate network functions, such as firewalls, proxies, and IDSs.
SDN switch statistics visualized as a multivariate time-series snippet, with two anomalies
Detailed architecture modules and synchronized OpenStack controller located in the SDN application layer.
The logical control module is composed of the SDN controller [18] and the NFV orchestration system. The NFV orchestration system is responsible for provisioning virtualized network functions and is controlled by the SDN controller through standard interfaces or APIs. After determining the policy requirements and creating the network topology, the controller generates optimal function assignments and assigns the functions to certain VMs along the optimized path by translating the logical policy specifications; this is known as a service chain. The NFV orchestration system constructs a service function chain, and the controller steers the traffic through the required sequence of VMs by installing flow rules into the forwarding devices.
B. Compromised Switches on the Cloud
Compromised switches are a novel and thorny problem for SDNs. In two studies, [13] and [14], the authors formally defined various types of compromised switch attacks, e.g., packet dropping, packet duplicating, packet manipulating, incorrect forwarding, traffic delaying, and traffic reordering. These attacks are stealthy, and they can occur while the compromised switch performs packet forwarding. By controlling the compromised switches to execute one of these attacks, attackers can cause severe problems for the entire network. Moreover, these kinds of attacks can severely deteriorate TCP throughput [4].
Figure 1 is a typical example illustrating the harmfulness of a compromised switch. The compromised OpenFlow switch 3 can drop packets or falsely forward them to its neighboring switches multiple times. It can also delay packets for a while. All of these actions may disrupt the operation of the entire network.
C. Basics of a Variant of RNN - GRU, VAE, SGVB, and Planar NF
Recurrent neural network (RNN) [22], a well-known method in deep learning, feeds the output from the previous step as the input to the current step, which enables RNN to represent the time dependence. However, RNN fails to learn the long-term dependence for multivariate time-series data due to the complex temporal dependence and stochasticity.
Fortunately, the gated recurrent unit (GRU), a variant of RNN, can learn long-term dependence even without using a memory unit to control the flow of information, because it simply exposes the full hidden content without any control. Moreover, GRU does not require a large dataset because of its fewer parameters and simpler structure [19]; therefore, GRU is applied in this study. The variational autoencoder (VAE) is another deep-learning technique for learning latent representations [20], and it has been successfully applied to anomaly detection for seasonal univariate time series [35]. VAE represents a high-dimensional input
Stochastic gradient variational Bayes (SGVB) [20] is a variational inference algorithm commonly applied in VAEs to tune the parameters by maximizing the evidence lower bound (ELBO):\begin{align*} \mathcal {L}(x_{t})=&{\mathbb {E}}_{q_{\phi }(z_{t}|x_{t})}[\log p_{\theta }(x_{t}|z_{t})] - D_{KL}[q_{\phi }(z_{t}|x_{t})\|p_{\theta }(z_{t})] \\=&{\mathbb {E}}_{q_{\phi }(z_{t}|x_{t})}[\log p_{\theta }(x_{t}|z_{t}) + \log p_{\theta }(z_{t}) - \log q_{\phi }(z_{t}|x_{t})] \tag{1}\end{align*}
Planar normalizing flows (Planar NF) [21] is a transformation technique that transforms
The steps of this combination are shown as follows and in Figure 6. First, GRU is used to capture complex temporal dependencies between multivariate observations in
The overall model architecture is presented in Figure 6, which is composed of a
ECSD: Enhanced Compromised Switch Detection
In this section, we first describe a rationale for using a multivariate time-series detection technique for our scheme and system design analysis. Second, a workflow system is described, and then the system process logic is presented. Lastly, the internal modules of the proposed scheme will be provided in detail. Table 1 summarizes parameters, variables, and constants used in this paper.
A. Rationale of Using a Time-Series Technique and System Design Analysis
In the SDN-based cloud, SDN switches are typically monitored with multivariate time series, whose anomaly detection is critical for service quality management. For example, as shown in Figure 3, networking metrics, such as type of packet (PktType), the total number of received packets (RxNum), the total number of transmitted packets (TxNum) and the total number of dropped packets (DrNum), are collected and supervised by a network administrator. When a compromised switch attack is launched, one of the selected features will suddenly change. Thus, we need an intelligent model to discover any changes in the values of multiple time series simultaneously to detect the attack. Therefore, in this paper, a technique called stochastic recurrent neural network [28] is adopted for multivariate time series anomaly detection. The key idea is to capture the normal patterns of multivariate time series by learning strong representations and to use the reconstruction probabilities to determine anomalies using dynamic threshold selection.
A time series involves successive observations which are normally collected at equally spaced timestamps [29]. In this work, we concentrate on multivariate time series, defined as
Figure 4 depicts an overview of our proposed framework for detecting a compromised switch in an SDN architecture synchronized with an OpenStack [42] controller; the proposed framework is an extension of the control plane. The framework consists of extension components that can be implemented and distributed across both the cloud controller platform (OpenStack) and the SDN application plane. We suggest that these components be placed on a dedicated security server in actual deployments to reduce the SDN controller's processing load. In this work, however, we design modules for model training, online detection, and dynamic threshold selection and locate them in the SDN application layer for convenience; only some components are located in the OpenStack controller for synchronization and further actions.
B. Workflow System
In this work, our mechanism is situated in the SDN controller and consists of five separate main modules: raw data processing (flow collector, feature extractor, and feature transformation and normalization), model training, online detection, dynamic threshold selection, and mitigation agent, as shown in Figure 5.
Raw data preprocessing is a module shared by both model training and online detection. During data preprocessing, the traffic collector reliably collects data on the use of the physical and virtual resources comprising the deployed clouds and acquires flow information from OpenFlow switches at a predefined time interval. This data is then sent to the feature extractor to extract the data's attributes. Subsequently, the data is transformed by data standardization and segmented into sequences through sliding windows [30] of length
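The preprocessing steps above can be sketched as follows; the window length, feature count, and z-score standardization shown here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def standardize(data):
    """Column-wise z-score standardization of a (time, features) array."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    return (data - mean) / np.where(std == 0, 1.0, std)

def sliding_windows(data, length):
    """Segment a multivariate series into overlapping windows of `length` steps."""
    return np.stack([data[i:i + length] for i in range(len(data) - length + 1)])

# Toy example: 10 observations of 4 switch metrics (e.g., RxNum, TxNum, DrNum, PktType).
series = np.arange(40, dtype=float).reshape(10, 4)
windows = sliding_windows(standardize(series), length=5)
print(windows.shape)  # (6, 5, 4)
```

Each window then becomes one training (or detection) sequence for the model.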
The online detection module stores the trained model. A new observation (e.g.,
Note that, in our scheme, the training database is continually updated with the attributes of data collected from the above loop. At a preset interval defined by a network administrator, both the model training and online detection modules are retrained using the updated database. By doing so, the proposed mechanism can adapt well to various network systems.
C. System Process Logic
First, a set of observations are assigned to
If S_t < threshold, the new traffic is labeled as an attack source, Flag_t is assigned -1, and the traffic is forwarded to the mitigation agent for later handling.
If S_t >= threshold, the new traffic is labeled as a normal source, Flag_t is assigned 1, and the algorithm continues its loop.
Algorithm 1 ECSD: Proposed Compromised Switch Defense Scheme
{
loop
while true:
if
Feed
Feed
Feed
Feed
if
Forward
else
continue
end if
end if
end while
end loop
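Since Algorithm 1 is abridged above, a minimal Python sketch of the intended control flow might look like the following; the `collect`, `score`, and `mitigate` callables are hypothetical placeholders for the modules described in this section, not the paper's implementation:

```python
def ecsd_loop(collect, score, threshold, mitigate, max_steps=None):
    """Hypothetical sketch of the ECSD online loop: score each new
    observation and hand suspected attack traffic to mitigation."""
    flags = []
    step = 0
    while max_steps is None or step < max_steps:
        observation = collect()          # new switch statistics
        s_t = score(observation)         # anomaly score from the trained model
        if s_t < threshold:              # low score => likely compromised
            flags.append(-1)
            mitigate(observation)        # forward to the mitigation agent
        else:
            flags.append(1)              # normal traffic, keep looping
        step += 1
    return flags

# Toy usage with canned anomaly scores.
scores = iter([0.9, 0.2, 0.8])
flags = ecsd_loop(collect=lambda: None, score=lambda _: next(scores),
                  threshold=0.5, mitigate=lambda _: None, max_steps=3)
print(flags)  # [1, -1, 1]
```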
D. Internal Modules
Let us have a closer look into the internal modules of our proposed scheme.
1) Traffic Collector
First, to get data from the data plane, we run a statistics requester that sends periodic request statistics [32] to an OpenFlow switch and waits for the response statistics (Figure 5). After obtaining the response statistics, the OpenFlow channel sends them to the raw data processing module. Next, a flow collector module simply runs on the SDN controller and collects data from both the OpenFlow switch statistics and the StatsResponse messages [32] for a preset period.
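A minimal sketch of such a periodic poller is shown below; the endpoint URL and the JSON fields of the reply are illustrative assumptions, not the actual OpenFlow StatsResponse format:

```python
import json
import time
import urllib.request

def parse_flow_stats(reply):
    """Extract the per-switch counters the collector forwards onward
    (the field names here are assumed, for illustration only)."""
    return {sw["id"]: (sw["rx"], sw["tx"], sw["dropped"]) for sw in reply["switches"]}

def poll(url, interval_s=10, handler=print):
    """Request statistics at a preset interval and hand them to a handler."""
    while True:
        with urllib.request.urlopen(url, timeout=5) as resp:
            handler(parse_flow_stats(json.loads(resp.read())))
        time.sleep(interval_s)

# Offline example with a canned reply:
reply = {"switches": [{"id": "of:1", "rx": 120, "tx": 118, "dropped": 2}]}
print(parse_flow_stats(reply))  # {'of:1': (120, 118, 2)}
```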
2) Feature Extractor
Based on our empirical research and a feature-selection technique called ANOVA (analysis of variance) [33], this module extracts data information from the traffic collector to take out several attributes, as shown in Table 2. These key features are selected from \begin{align*} F=&\frac {\overline {MST}}{\overline {MSE}}\\ \overline {MST}=&\frac {\sum ^{k}_{i=1}(T^{2}_{i}/n_{i}) - G^{2}/n}{k-1} \\ \overline {MSE}=&\frac {\sum ^{k}_{i=1}\sum ^{n_{i}}_{j=1}Y^{2}_{ij} - \sum ^{k}_{i=1}(T^{2}_{i}/n_{i})}{n-k}\end{align*}
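For reference, a small implementation of the one-way ANOVA F statistic above is sketched below; grouping each feature by class label is our assumption about how the per-feature scores are computed:

```python
import numpy as np

def anova_f(feature, labels):
    """One-way ANOVA F statistic of one feature across class labels:
    between-group mean square over within-group mean square."""
    groups = [feature[labels == c] for c in np.unique(labels)]
    n, k = len(feature), len(groups)
    grand = feature.sum()
    between = sum(g.sum() ** 2 / len(g) for g in groups)
    mst = (between - grand ** 2 / n) / (k - 1)
    mse = (np.square(feature).sum() - between) / (n - k)
    return mst / mse

# A feature whose mean shifts between classes scores far higher than noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
informative = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50)])
noise = rng.normal(0, 1, 100)
print(anova_f(informative, labels) > anova_f(noise, labels))  # True
```

Features are then ranked by F and the top-scoring ones kept.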
3) Feature Transformation & Normalization
At this stage, all the categorical features extracted in the previous stage are transformed from categorical into numeric data using the one-hot encoding technique [34]. After the transformation, the data needs to be normalized; we use max-min normalization, whose formula can be expressed as:\begin{equation*} X_{normalized} = \frac {x_{i} - x_{min}}{x_{max} - x_{min}}, \forall x_{i} \in X\tag{2}\end{equation*}
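Both transformations are straightforward; a minimal sketch:

```python
import numpy as np

def one_hot(values):
    """One-hot encode a categorical column (e.g., PktType)."""
    categories = sorted(set(values))
    return np.array([[1 if v == c else 0 for c in categories] for v in values])

def min_max(x):
    """Max-min normalization: rescale a numeric column to [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

print(one_hot(["tcp", "udp", "tcp"]))  # rows: [1,0], [0,1], [1,0]
print(min_max([10, 20, 30]))           # [0.  0.5 1. ]
```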
4) Model Training
Figure 6 illustrates the model architecture of the multivariate time-series learning model proposed in [28], which is adopted in this paper. Both the qnet and the pnet start training at the same time by tuning the network parameters (u*-s, w*-s, and b*-s). Like VAE models, the model is trained straightforwardly by optimizing the ELBO, as described in Section III.C. The size of each input sequence data (e.g., \begin{align*} {\overset {\sim }{\mathcal {L}}}(x_{t-T:t}) \approx \frac {1}{L}\sum _{l=1}^{L}[\log p_\theta (x_{t-T:t}|z_{t-T:t}^{(l)}) + \log p_\theta (z_{t-T:t}^{(l)}) - \log q_\phi (z_{t-T:t}^{(l)}|x_{t-T:t})] \tag{3}\end{align*}
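As a toy illustration of the Monte Carlo ELBO estimate in Eq. (3), the snippet below assumes fully factorized Gaussians for the prior, posterior, and likelihood; the actual model conditions these distributions on GRU states, so this is only a sketch of the estimator itself:

```python
import numpy as np

def log_normal(x, mean, std):
    """Log-density of a diagonal Gaussian, summed over dimensions."""
    return np.sum(-0.5 * np.log(2 * np.pi * std**2) - (x - mean)**2 / (2 * std**2))

def elbo_estimate(x, enc_mean, enc_std, decode, n_samples=64, seed=0):
    """Monte Carlo ELBO: E_q[log p(x|z) + log p(z) - log q(z|x)]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.normal(enc_mean, enc_std)              # z ~ q(z|x)
        dec_mean, dec_std = decode(z)
        total += (log_normal(x, dec_mean, dec_std)     # log p(x|z)
                  + log_normal(z, 0.0, 1.0)            # log p(z), standard normal prior
                  - log_normal(z, enc_mean, enc_std))  # log q(z|x)
    return total / n_samples

x = np.array([0.5, -0.2])
decode = lambda z: (z, np.ones_like(z))  # toy decoder: p(x|z) = N(z, I)
print(round(elbo_estimate(x, enc_mean=x, enc_std=np.full(2, 0.1), decode=decode), 2))
```

An encoder whose posterior mean sits near the true input yields a noticeably higher ELBO than a badly placed one, which is what training exploits.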
5) Online Detection
We can now use the trained model to decide whether an observation at a time step (say
6) Dynamic Threshold Selection
As presented in Figure 5, while the model training module is operating, we first calculate an anomaly score for every observation with a multivariate time series of
EVT is a statistical theory whose aim is to find the law of extreme values, which are normally located at the tails of a probability distribution. The benefit of EVT is that it makes no assumption about the distribution of the data when finding extreme values. Peaks-over-threshold (POT) [31] is a method that fits the tail portion of a probability distribution with a generalized Pareto distribution (GPD) and its parameters; POT is adopted here to learn the threshold of anomaly scores. Normally, values at the high end of a distribution are the main focus of POT applications; however, in this study [28], anomalies are placed at the low end of the distribution. Thus, the GPD function is as follows:\begin{equation*} \overline {F}(s) = P(th - S > s \mid S < th) \sim \left({1 + \frac {\gamma s}{\beta }}\right)^{-\frac {1}{\gamma }}\tag{4}\end{equation*}
\begin{equation*} th_{F} \simeq th - \frac {\hat {\beta }}{\hat {\gamma }}\left({\left({\frac {qN'}{N'_{th}}}\right)^{-\hat {\gamma }} -1}\right)\tag{5}\end{equation*}
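A sketch of POT-based threshold selection following Eqs. (4) and (5), using SciPy's generalized Pareto fit; the initial quantile and risk level q below are illustrative, and since anomalies sit at the low end here, excesses are measured below the initial empirical threshold:

```python
import numpy as np
from scipy.stats import genpareto

def pot_threshold(scores, init_quantile=0.02, q=1e-3):
    """Fit a GPD to excesses below an initial low quantile `th` and
    return the final threshold for anomaly scores (lower = more anomalous)."""
    scores = np.asarray(scores, dtype=float)
    th = np.quantile(scores, init_quantile)   # initial empirical threshold
    excesses = th - scores[scores < th]       # peaks below the threshold
    gamma, _, beta = genpareto.fit(excesses, floc=0.0)
    n, n_th = len(scores), len(excesses)
    # th_F = th - (beta/gamma) * ((q*n/n_th)**(-gamma) - 1)
    return th - (beta / gamma) * ((q * n / n_th) ** (-gamma) - 1)

rng = np.random.default_rng(1)
scores = rng.normal(0.0, 1.0, 20_000)
th_f = pot_threshold(scores)
print(th_f < np.quantile(scores, 0.02))  # final threshold lies below the initial one
```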
7) Anomaly Score Comparison Function
This function compares the calculated anomaly score with the calculated threshold; if
8) Mitigation Agent
If the traffic is labeled as an attack source in the previous step, then in order to mitigate the compromised switch attacks, the mitigation agent sends a flow_mod message attached to a delete action to the edge OpenFlow switch and requests the forwarding engine of the SDN controller to drop packet_in messages of the attacking sources (as illustrated in Figure 5).
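As one concrete possibility, the flow deletion could also be issued over the controller's northbound REST interface rather than a raw flow_mod; the sketch below targets ONOS's flows endpoint, with illustrative host, device, and flow identifiers and the well-known default credentials (shown for illustration only):

```python
import base64
import urllib.request

def flow_delete_request(host, device_id, flow_id, user="onos", password="rocks"):
    """Build a DELETE request for one flow rule via ONOS's REST API."""
    url = f"http://{host}:8181/onos/v1/flows/{device_id}/{flow_id}"
    req = urllib.request.Request(url, method="DELETE")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req  # send with urllib.request.urlopen(req) against a live controller

req = flow_delete_request("10.0.0.5", "of:0000000000000003", "12345")
print(req.get_method(), req.full_url)
```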
Experimental Setup
First, we describe attack scenarios that are deployed in this work. Second, we give readers a description of our training dataset and of training the core module of ECSD. Third, we present an elaborate implementation of the proposed scheme.
A. Attack Scenarios
To deploy compromised switch attacks, we replicate some of the attacks mentioned by the authors of both [25] and [26] as follows:
1) Network-to-Switch(S) (NtS)
One or more forwarding devices dispatch a large amount of traffic to a specific OpenFlow switch by installing custom rules for the purpose of flooding the switch. This kind of attack brings down the target switch in serious cases, and the consequence is even more disastrous in situations such as mission-critical systems.
2) Network DoS (NDoS)
Compromised switches direct traffic into a loop and enlarge a flow until it completely fills the available link bandwidth, causing a DoS. This attack involves a compromised switch that either generates, misroutes, or replays packets, which damages the whole network topology.
3) TCAM Exhaustion (TCAME)
TCAM is a fast associative memory designed to store flow rules, and its capacity is limited. Malicious hosts may forward an unreasonable amount of traffic and compel the controller to install many flow rules, utterly exhausting the switch’s TCAM. This may trigger significant latency or packet drops.
4) Switch Blackhole (SBlackhole)
A blackhole is a network condition where the flow path ends abruptly, and the traffic cannot be forwarded as intended to the destination. In this case, a compromised switch either drops or delays packet forwarding to launch this attack, thereby preventing the flow from reaching the destination.
B. Training Dataset
The server machine dataset (SMD) is one of the most credible datasets, and it is applied in many multivariate time-series studies [28], [37]. This dataset collects diverse real network traffic types from a large Internet company, and it was publicly published in [28]. In our study, this dataset was used to train the models of ECSD. Specifically, the SMD dataset includes anomalous traffic at a ratio of 10.72%, and the training and testing set sizes are 58,317 and 73,729, respectively. Moreover, we collected 30,000 more data samples, both anomalous and normal traffic, recorded in our simulated network for the training set.
C. Sensitivity Analysis of User-Defined Parameters and Model Setup
To reduce the number of experiments and obtain the best performance from the proposed algorithm, it is beneficial to investigate its user-defined parameters. There are four major key parameters in the proposed scheme, i.e., GRU layers and dense layers,
To train the model training module of ECSD, we strictly followed the descriptions in [28] and obtained acceptable user-defined parameter settings according to our experimental results. In addition, our model hyper-parameters are described as follows. The length of the input data sequence was set to 100 (i.e., T + 1 = 100). The GRU layers and dense layers have 500 units each. The
D. SDN-Based Cloud Implementation and Test Preparation
Figure 7 shows the testbed for virtual networks to simulate a compromised switch attack in a production environment that consists of a router (HP MSR930–JG511A), a physical switch (HP 3500 yl), an OpenStack platform (one controller node, one network node, an SDN controller (ONOS), and three compute nodes running OvS drivers on OpenFlow virtual switches for cloud networking), the Internet connection and botnets. Moreover, a synchronized connection was established between the OpenStack controllers and the SDN for the cloud information and modifications that are needed.
Experimental SDN-based cloud topology, including an OpenFlow switch, being compromised by an attacker.
To simulate the compromised switch attacks, we manipulated the ONOS controller to perform the attacks described in section V.A. We coded each of the attack scenarios in Python; sFlow [44] and sFlow-RT [43] were used to collect data. As stated in section IV.A, our modules were located in the SDN application layer of the ONOS server (Figure 4). However, to reduce training time, we trained our models in a distributed manner on three different nodes (the ONOS server, the network node, and the OpenStack controller) using an Apache Spark cluster [45]. Each node has a dodeca-core CPU (Intel® Core™ i7-8700 CPU @ 3.20GHz) and 48 GB of memory, is equipped with a GeForce 1060 GPU with 3 GB of onboard memory, and runs 64-bit Ubuntu Linux v18.04.
Result Analysis
This section contains a comprehensive analysis along with detailed comparisons between our proposed scheme and other solutions. First, we present the detection performance evaluation. Second, the attack mitigation performance is analyzed, and then the resource usage is provided. Lastly, we briefly discuss the overall performance of our proposed scheme and present the deployment discussion.
A. Solutions for Comparison
To demonstrate the performance of our ECSD scheme, we carried out experiments comparing our work with other multivariate time-series anomaly detection algorithms and some existing studies by other authors on the same environmental setup. First, we compared two machine-learning-based methods that use a multivariate time-series technique: One-Class SVM [39] and long short-term memory (LSTM) networks [37]. The former is an unsupervised learning algorithm with the advantage of being able to learn a classification function from a very small training set, making it a lightweight approach; however, One-Class SVM normally leads to a high false-alarm rate and poor accuracy. The LSTM approach, which is also a deep-learning-based technique, gives better predictions; however, its overall performance is still low, and its resource usage is high. Second, we reproduced the existing work WedgeTail [26] to compare with our proposed scheme. WedgeTail is a recent study designed to secure the SDN data plane; it is a novel study, produces fairly good overall performance, and uses various measurements that are suitable for comparison. Last, our previous work, ARIMA [16], was chosen to demonstrate the expected improvement of ECSD.
Note that we implemented these solutions, from [16], [26], [37], and [39], in our testbed according to the above observations to retain their original novelties and functionalities.
B. F1-Score, Detection Rate, Accuracy, and False-Alarm-Rate Performance
To properly evaluate the performance of our proposed scheme, we closely follow and adopt some criteria, such as detection rate, accuracy, false alarm rate (
Detection Rate or Recall (\overline {R}): the ratio of the number of detected malicious flows to the number of all malicious flows:\begin{equation*} \overline {R} = \frac {TP}{TP + FN}\tag{6}\end{equation*}
Accuracy (\overline {AC}): the percentage of true detections over the total number of flows:\begin{equation*} \overline {AC} = \frac {TP + TN}{TP + TN + FP + FN}\tag{7}\end{equation*}
False Alarm Rate (\overline {FAR}): the proportion of normal flows incorrectly identified as malicious flows:\begin{equation*} \overline {FAR} = \frac {FP}{FP + TN}\tag{8}\end{equation*}
F1-score (F1): the harmonic mean of \overline {P} and \overline {R}; this score takes both FPs and FNs into account:\begin{equation*} F1 = \frac {2 \times \overline {P} \times \overline {R}}{\overline {P} + \overline {R}}\tag{9}\end{equation*}
where \overline {P} = \frac {TP}{TP + FP}
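These four metrics reduce to simple ratios of the confusion-matrix counts:

```python
def metrics(tp, tn, fp, fn):
    """Recall, accuracy, false-alarm rate, and F1 from confusion counts."""
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    far = fp / (fp + tn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, accuracy, far, f1

# Example: 90 attacks caught, 10 missed, 20 false alarms out of 900 normal flows.
print(metrics(tp=90, tn=880, fp=20, fn=10))  # recall 0.9, accuracy 0.97, FAR ~0.022, F1 ~0.857
```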
Figure 8 compares the five different solutions: ECSD, One-Class SVM, LSTM, ARIMA, and WedgeTail. The One-Class SVM solution performed poorly in terms of detection rate, accuracy, and F1-score, and it also had the highest
In Figure 8, comparing the four distinct attack scenarios, all detection schemes performed well on the NtS, NDoS, and TCAME attack scenarios, which shows that all the schemes can easily detect some kinds of flooding attacks, such as flooding a switch by adding either packets or flows. However, the schemes performed poorly on the SBlackhole attack scenario, which indicates that all the schemes have difficulty detecting packet drops or delays.
Through the comprehensive analyses above, we can easily see the effectiveness of applying multivariate time-series techniques to detect compromised switches.
C. Attack Mitigation Performance
Assessing the practicality of every networked system is very important, so we now proceed to analyze the results recorded from our experiments among the five compared solutions.
1) Average Detection Time of a New Attack
To assess how fast a new attack can be detected, we simulated different compromised switch attacks as presented in section V.A, then created a function specialized in calculating the average detection time of a new attack, as shown in Figure 9. It is obvious that the ECSD scheme would require a longer detection time than One-Class SVM and ARIMA solutions to detect a new incoming attack in the simulated network. This is because both the latter solutions are comparatively lightweight, due to using a very small amount of historical data for training, as analyzed in section VI.A. Both ECSD and LSTM are deep-learning-based solutions, but LSTM takes around 0.15 seconds longer than ECSD to raise an alarm. This can be explained by the optimized algorithm of the ECSD and the practical use of a data preprocessing module to make it run faster. Lastly, WedgeTail takes a large amount of time to raise a detection flag due to a highly complex architecture with many policies. In summary, the detection time depends largely on how lightweight or heavyweight a scheme is.
2) Number of Dropped Malicious Packets
To protect OpenFlow switches effectively against attacks, one key evaluation criterion is how many malicious packets are dropped during the attack. Hence, the total number of dropped malicious packets while the compromised switches were under attack was collected, and the results are shown in Figure 10. Evidently, owing to their high detection rates, ECSD, LSTM, and WedgeTail enforce policies as soon as an attack pattern is recognized, leading to a high number of dropped attack packets. One-Class SVM and ARIMA are not as efficient at detecting anomalies as the three solutions above; therefore, they dropped fewer malicious packets during the attack.
Figure 10. Number of dropped packets over the duration of attack time for different approaches.
3) Number of Packet_In Messages to the SDN Control Plane
Compromised switch attacks can not only produce harmful effects on the data plane but can also have damaging implications for the control plane. In our experiments, we considered the number of packet_in messages sent to the SDN controller as a measure of this criterion. As illustrated in Figure 11, the number of packet_in messages for the different solutions shows the opposite trend to the detection rate. Because One-Class SVM has the lowest detection rate, it triggers many flow-mismatch events, which leads to the highest number of packet_in messages being created and forwarded to the ONOS controller, compared to the other four solutions. In contrast, ECSD and WedgeTail show a reasonable packet_in rate to the control plane, owing to their reliable malicious-traffic detection and fast policy enforcement in the data plane, which protects the SDN control plane against a high-rate packet_in flooding attack [41].
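The mechanism behind this metric is the OpenFlow table-miss path: every packet that matches no installed flow rule is punted to the controller as a packet_in. The toy model below (class and field names are ours, and controller behavior is heavily simplified) illustrates why an attack that spawns many new flows inflates the packet_in count:

```python
class MiniSwitch:
    """Toy OpenFlow switch: a packet whose match key has no installed
    flow rule is punted to the controller as a packet_in message."""
    def __init__(self):
        self.flow_table = set()   # installed match keys
        self.packet_in_count = 0

    def handle(self, match):
        if match not in self.flow_table:
            self.packet_in_count += 1   # table miss -> packet_in
            self.flow_table.add(match)  # controller installs a rule (simplified)
```

Each distinct flow costs exactly one packet_in here, so a detector that blocks malicious flows before they reach the switch (as ECSD and WedgeTail do) directly caps the load on the controller.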
Figure 11. Number of packet_in messages to the SDN control plane during attacks for different approaches.
D. Resource Consumption
As shown in Figure 12 and as stated in section VI, owing to its lightweight character, One-Class SVM consumes little CPU, around 29% on average, whereas both deep-learning-based solutions (ECSD and LSTM) use more CPU, around 52% and 55%, respectively. Regarding memory usage, as shown in Figure 13, all schemes consume only a small amount of memory. ARIMA's memory usage fluctuates between 1% and 2.6% because it uses little historical data for training. Both deep-learning-based solutions consume more memory; however, their usage is still low overall.
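Per-scheme resource figures like those in Figures 12 and 13 can be sampled around each detection step. The following is a stdlib-only, Unix-oriented sketch (the helper name is ours; it is not the measurement harness used in our experiments, which monitored system-wide utilization):

```python
import resource
import time

def sample_usage(step, *args):
    """Run one detection step and report its CPU time (seconds) plus the
    process's peak resident set size so far (ru_maxrss; kilobytes on
    Linux). Unix-only stdlib sketch for illustration."""
    t0 = time.process_time()
    step(*args)                      # e.g., one inference pass
    cpu_seconds = time.process_time() - t0
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return cpu_seconds, peak_rss
```

Sampling around individual steps like this makes it easy to attribute the higher CPU cost of the deep-learning-based schemes to their inference passes rather than to data collection.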
E. General Discussion
Through the comprehensive analyses above, we can summarize several remarkable points that demonstrate the real-world practicality and effectiveness of applying multivariate time-series techniques (LSTMs [37], One-Class SVM [39], and especially ECSD) to detect compromised switches in our experimental setup:
All the multivariate time-series techniques not only yield high detection rates, accuracies, and F1-scores but also produce low false-alarm rates, in which ECSD outperforms the other schemes.
All the multivariate time-series techniques can detect a new attack in a short time.
The LSTM and ECSD solutions keep the number of packet_in messages at a reasonable level.
The LSTM and ECSD solutions consume a low amount of memory.
F. Deployment Discussion
To assess how sensitive the results are to the hyperparameter settings of the tested algorithms, we offer the following comments. The dimension of the z-space plays a crucial role in ECSD: too large a value makes the dimensionality reduction marginal, so the reconstruction probability cannot recover a good posterior, while too small a value may cause under-fitting. In the LSTM [37], the values of z, an ordered set of positive values representing the number of standard deviations above the mean of the prediction errors, determine the anomaly threshold and therefore require similar tuning.
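The role of z in the LSTM-based scheme can be made concrete with a simple z-score sketch: the threshold sits z standard deviations above the mean of recent prediction errors, so small z catches more anomalies at the cost of false alarms, and large z does the reverse. This is an illustrative simplification, not the exact thresholding procedure of [37]:

```python
import statistics

def dynamic_threshold(errors, z):
    """Threshold set z standard deviations above the mean of recent
    prediction errors (illustrative z-score sketch)."""
    return statistics.mean(errors) + z * statistics.pstdev(errors)

def flag_anomalies(errors, z):
    """Indices of error values exceeding the dynamic threshold."""
    thr = dynamic_threshold(errors, z)
    return [i for i, e in enumerate(errors) if e > thr]
```

Because the threshold is recomputed from the error stream itself, it adapts as traffic patterns drift, which is the property the dynamic threshold selection in ECSD is designed to preserve.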
Conclusion
In this paper, we proposed an effective and practical scheme to handle compromised switch attacks in an SDN-based cloud environment. This proposal not only protects OpenFlow switches in a cloud from traffic loss, traffic misrouting, traffic delays, and other effects resulting from compromised switch attacks in both the control and data planes, but also brings a better quality of service to cloud providers and customers. We presented a new approach to detecting compromised switches using a multivariate time-series detection technique, and in our comprehensive analysis the approach proved highly practical and effective. We also proposed an intelligent scheme called ECSD that applies deep-learning techniques to detect the anomalous behaviors occurring in compromised switches. Moreover, the proposed dynamic threshold selection sets thresholds adaptively, which makes the scheme very practical because it can adjust to changes. For future work, we plan to create a scheme based on multi-task learning, which allows one algorithm to learn multiple multivariate time-series models simultaneously; this can reduce overall resource consumption. In addition, we intend to compare our proposal with other existing works using more evaluation criteria.