Introduction
Critical infrastructures are highly complex systems that utilize cyber and physical components in their daily operations. The backbone of these facilities consists of an Industrial Control System (ICS), which plays an important role in the monitoring and control of critical infrastructures such as smart power grids, oil and gas, aerospace, and transportation [1], [2]. Therefore, the safety and security of ICSs are paramount for national security.
The inclusion of the Internet of Things (IoT) in ICSs opens up opportunities for cybercriminals to leverage system vulnerabilities to launch cyber-attacks [3], [4]. Awareness of the cyber-security vulnerability of ICSs has been growing since Stuxnet, the first cyber-attack that specifically targeted these technologies, was revealed in 2010. Stuxnet was intended to sabotage the system's operation without disturbing Information Technology (IT) systems [5]. In 2015, another cyber-attack by the name of Black-Energy was used to target Ukraine's power grids, causing a massive power outage that affected about 230,000 people [6]. In February 2020, three U.S. gas pipeline firms reported another cyber-attack that allegedly shut down electronic communication systems for multiple days [7]. While some of these attacks may result in information leakage, others can damage the physical system or misrepresent the system state to the monitoring engineer. These examples emphasize the growing cyber threat to Operational Technology (OT), which runs much of the enabling computer technology that ICSs in critical infrastructure (i.e., power, gas, and water) now rely on [2], [8].
While the security concerns of critical infrastructure facilities are already considered in the IT community, limited efforts have been made to develop security solutions that are specific to ICSs and OT environments [9]. Due to the differences between the nature and characteristics of IT and OT systems, these attacks mostly remain invisible to traditional IT security measures such as Intrusion Detection Systems (IDSs) and anti-virus programs. Also, the communication protocols used by ICSs (e.g., Modbus or DNP3 [10] and IEC standards [11]) are not adequately secured by traditional IDSs. Therefore, strong security mechanisms must be designed explicitly for OT environments and ICSs to defend against such attacks and to protect critical infrastructure facilities.
Different frameworks for IDSs have been used in the literature, such as model-based [12] and learning-based approaches [13], [14]. Most of these techniques utilize the available data to develop a model that exhibits the normal behavior of the system, then identify all different behaviors as abnormal. Since these methods are only trained on specific types of attacks, they are not able to detect unseen or new attack types [15], [16]. Besides, current IDSs are customized for specific systems/protocols and therefore lack adequate generalization [18].
Most importantly, the existing literature does not consider the imbalanced nature of ICS datasets, which results in low detection rates or high false-positive rates in real scenarios [17]. A dataset is imbalanced if the instances of some classes are far fewer than those of other classes. The fundamental principle of classification is finding the boundary between different classes. If some classes are rarely represented, they may not provide enough information to determine the boundary. Therefore, they may be treated as outliers, resulting in wrong classifications.
To address these concerns, in this paper, we propose a generalized ensemble deep learning method for cyber-attack detection in ICSs, which is evaluated on different real ICS datasets. The proposed deep learning model consists of multiple unsupervised Stacked Autoencoders (SAEs) that learn new representations from imbalanced datasets. Then, the new representations from each SAE are passed to a Deep Neural Network (DNN) via a super vector and concatenated using a fusion activation vector. Finally, a Decision Tree (DT) is used, as a binary classifier, to detect attacks from the newly merged representations. Experiments show that the proposed model outperforms existing approaches with an acceptable performance even though fewer malicious instances are used.
The main contributions of this paper are as follows:
Developing a deep representation learning model to construct new balanced representations. The new representations increased attack detection accuracy and robustness (f-score) in an imbalanced environment.
Increasing the detection accuracy and reducing the false positive rate by developing an ensemble deep learning algorithm based on DNN and DT classifiers to detect cyber-attacks from the new representations.
Developing a generalized model that can be used in different critical infrastructure facilities with minimum changes in the existing system. The proposed framework utilizes representation learning and ensemble methods that can be trained to detect cyber-attacks in ICSs regardless of the data imbalance ratio.
The rest of this paper is structured as follows. Section II gives a literature review of recent studies in the field of ICS security. Section III presents a brief overview of the general ICS structure, system model, and different attack models considered in this work. The proposed method is described in Section IV. Section V includes results and case studies, followed by the concluding remarks in Section VI.
Related Work
Traditionally, ICSs were in an isolated environment with the focus on safety, where each system is safeguarded to stop the process if something goes wrong. However, the introduction of Internet protocols, IoT devices, and wireless technologies within ICSs has resulted in significantly less isolation from the outside world. Consequently, safety mechanisms, which were not designed to deal with malicious attacks, face more vulnerabilities than ever before.
The majority of existing techniques for cyber-attack detection in ICSs are based on traditional IDSs, which are mainly designed for IT security analysis [5], [17]. IDSs can be categorized as signature-based and learning-based techniques. Signature-based approaches use databases and fixed signatures to detect known attacks, rendering them inefficient in detecting unknown or new attacks [19]. On the other hand, learning-based systems aim to identify process trends or behaviors that increase the efficiency to manage unexpected intrusions [20]. Reference [21] used a common-path mining method for anomaly detection in smart cyber-physical grids. An attack detection technique based on the Pearson correlation between two sensor parameters was used in [22]. Authors in [23] utilized an IDS based on the Gaussian process to the attack strategy for anomaly detection. While these approaches are effective in detecting unusual activities, they are not reliable due to frequent upgrades in the network, resulting in different IDS topologies.
In contrast, learning-based IDSs are designed based on a moving target to continually evolve and learn new vulnerabilities [24], [25]. These methods try to model the normal behavior of the system using existing datasets, then identify irregular patterns as abnormalities. The authors of [26] proposed an anomaly detection technique based on reinforcement learning and convolutional autoencoders for ICS. Alternatively, [27] addresses the detection of Denial of Service (DoS) attacks using Support Vector Machine (SVM) and Random Forest (RF). Reference [28] suggested an unsupervised technique for the effective detection of privacy attacks based on observations of eavesdropping attacks. Reference [29] uses a variety of DNN methods, including different variants of convolutional and recurrent networks, for cyber-attack detection in water treatment facilities. An ICS anomaly detection method using Long Short-term Memory (LSTM) networks is proposed in [30]. The authors of [31] proposed an attack detection technique based on a Hierarchical Neural Network. Similarly, [32] proposed a deep learning-based IDS utilizing Recurrent Neural Networks (RNNs).
In another study [33], the authors applied a stacked Nonsymmetric Deep Autoencoder (NDAE) to develop their IDS. Reference [34] proposed an unauthorized intrusion detection technique and conducted backdoor attacks on a SCADA Industrial Internet of Things (IIoT) testbed. Reference [35] proposed a graphical model-based approach for detecting abnormal behavior in an ICS using Bayesian networks to map the relationship between sensors and actuators. Reference [36] implemented a toolchain with multiple state-of-the-art Anomaly Detection (AD) techniques used for detecting attacks that appear as anomalies. Their findings suggest that detection rates can change dramatically when considering different detection modes, thereby necessitating a reliable and real-time AD technique to maintain resilience in critical infrastructures. Reference [37] proposes a genetic algorithm (GA) to find the best NN architecture for a given dataset, using the NAB metric to determine the consistency and quality of different architectures. Reference [38] evaluates the application of unsupervised machine learning algorithms, including DNN and SVM, to detect anomalies in a Cyber-Physical System (CPS) using data from a Secure Water Treatment (SWaT) testbed. Results indicate that the DNN classifier produces fewer false positives than the one-class SVM, while the SVM can detect more anomalies.
Although the above-mentioned works addressed some of the issues related to cyber-attack detection in ICSs, most of them are heavily reliant on feature engineering. These methods are quite complicated and require sophisticated learning techniques, which can potentially increase their computational burden. Furthermore, the majority of current proposed techniques are evaluated using balanced datasets, which lack the standard representation of imbalanced data in the ICS environment. Thus, it is hard to deploy such algorithms as they cannot extract various discriminative information from real-world imbalanced datasets. As such, in this paper, we propose a deep learning-based attack detection technique, which extracts a new representation from raw imbalanced datasets, for reliable and accurate attack detection with a low false-positive rate in highly imbalanced datasets from ICS environments.
System Model
A. Industrial Control Systems
A typical ICS network in a SCADA system architecture, as shown in Figure 1, consists mainly of a remote station, primary center, and regional center. These systems can interact with each other via wide/local area networks or Radio Telemetry. The primary center gathers data from field sensors, identifies new setpoints to track the operations of the network, and detects any existing irregularities. Then, instructions are sent to the remote station to monitor telemetry from field devices [39]. The regional station manages the network communication and regional power consumption between the primary and remote stations.
ICS can be modeled using non-linear and non-Gaussian processes through the following equations:\begin{align*} x_{k}&=g\left (x_{k-1},\omega _{k} \right)\tag{1}\\ y_{k}&=h\left (x_{k},\upsilon _{k} \right)\tag{2}\end{align*}
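To make Eqs. (1) and (2) concrete, the following minimal sketch rolls the model forward in time. It assumes that ω and υ act as process and measurement noise, respectively, and draws both from a Laplace distribution as one example of non-Gaussian noise. The functions `g_example` and `h_example` are hypothetical placeholders chosen only for illustration; they are not the dynamics of any system studied here.

```python
import numpy as np

def simulate_ics(g, h, x0, steps, rng):
    """Roll out x_k = g(x_{k-1}, w_k) and y_k = h(x_k, v_k) for `steps` steps."""
    x = np.asarray(x0, dtype=float)
    states, measurements = [], []
    for _ in range(steps):
        w = rng.laplace(size=x.shape)   # assumed non-Gaussian process noise
        x = g(x, w)                     # Eq. (1): state transition
        v = rng.laplace(size=x.shape)   # assumed non-Gaussian measurement noise
        states.append(x)
        measurements.append(h(x, v))    # Eq. (2): measurement
    return np.array(states), np.array(measurements)

def g_example(x_prev, w):
    # Hypothetical non-linear state transition (illustration only).
    return 0.9 * np.sin(x_prev) + 0.1 * w

def h_example(x, v):
    # Hypothetical non-linear measurement function (illustration only).
    return x ** 2 + 0.05 * v

rng = np.random.default_rng(0)
states, measurements = simulate_ics(g_example, h_example, [0.0, 0.0], steps=50, rng=rng)
print(states.shape, measurements.shape)
```

Passing `g` and `h` as arguments keeps the rollout independent of any particular plant model.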
B. Adversary Model
The main attack types addressed in this study involve integrity attacks, such as False Data Injection (FDI), and availability attacks, such as DoS. In FDI attacks, an attacker injects false data into the system measurements, as shown in the equation below:\begin{equation*} \tilde {y}=y+\alpha \circ y^{a}\tag{3}\end{equation*}
\begin{equation*} S_{\alpha }\triangleq \left\{ \alpha \in \mathbb {R}^{m}: \alpha _{i}=0~\text{or}~1,\ \sum \limits _{i=1}^{m} \alpha _{i}=f\right\}\tag{4}\end{equation*}
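The FDI model in Eqs. (3) and (4) can be sketched in a few lines. The sketch assumes the operator between α and the injected vector is an element-wise product, so that the binary selection vector α picks exactly f of the m measurements to corrupt. The measurement values and injected offsets below are made up for illustration.

```python
import numpy as np

def inject_false_data(y, y_a, f, rng):
    """Return the corrupted measurement y + alpha * y_a and the selection vector alpha."""
    y = np.asarray(y, dtype=float)
    y_a = np.asarray(y_a, dtype=float)
    m = y.shape[0]
    alpha = np.zeros(m)
    # Eq. (4): alpha is binary with exactly f ones (the compromised measurements).
    alpha[rng.choice(m, size=f, replace=False)] = 1.0
    # Eq. (3), with the product taken element-wise (assumption).
    return y + alpha * y_a, alpha

rng = np.random.default_rng(1)
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # illustrative clean measurements
y_a = np.full(5, 5.0)                           # illustrative injected offsets
y_tilde, alpha = inject_false_data(y, y_a, f=2, rng=rng)
print(alpha, y_tilde)
```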
On the other hand, DoS attacks include measurement (packet) loss with two main types of modeling, including Bernoulli distribution [40] and Markov model [41]. The attacker usually initiates DoS attacks by manipulating sensor readings and jamming communication channels, thereby flooding packets in the network [42]. This is illustrated below:\begin{align*} \mu _{k}\left (i \right)=\begin{cases} 1 \\ 0 \\ \end{cases}\tag{5}\end{align*}
\begin{align*} \grave {z}_{k}=\left [{\begin{array}{c} \mu _{k}\left (1 \right)\times z_{k}\\ \mu _{k}\left (2 \right)\times z_{k-1}\\ \vdots \\ \mu _{k}\left (d+1 \right)\times z_{k-d} \end{array}} \right]\tag{6}\end{align*}
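A minimal sketch of the packet-loss model in Eqs. (5) and (6) under the Bernoulli assumption follows. It assumes that an indicator of 1 means the packet was received and 0 means it was lost, and that each indicator is drawn independently with a delivery probability `p_deliver`. That parameter, the function name, and the sample data are hypothetical.

```python
import numpy as np

def dos_window(z, k, d, p_deliver, rng):
    """Stack z_k, z_{k-1}, ..., z_{k-d} and zero out the entries lost to DoS."""
    # mu[i] plays the role of mu_k(i + 1) in Eq. (6): 1 = received, 0 = lost (assumption).
    mu = (rng.random(d + 1) < p_deliver).astype(float)
    window = np.stack([z[k - i] for i in range(d + 1)])
    return mu[:, None] * window, mu

rng = np.random.default_rng(2)
z = np.arange(1, 21, dtype=float).reshape(10, 2)   # 10 time steps, 2 sensors (made-up data)
z_received, mu = dos_window(z, k=9, d=3, p_deliver=0.7, rng=rng)
print(mu)
print(z_received)
```

A Markov packet-loss model would replace the independent draws with a two-state chain, so that losses arrive in bursts.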
Proposed Method
To overcome some of the issues associated with existing approaches, in this section, we propose a generalized deep learning model that works with raw imbalanced datasets. The proposed model generates a new balanced representation from a raw dataset and feeds it to an ensemble deep learning model for classification. The deep learning model consists of multiple unsupervised SAEs that learn new representations from imbalanced datasets. The SAE attack detection model utilizes multiple Autoencoders (AEs) to extract a new representation from unlabeled data to obtain different patterns. Then, the new representations from each SAE are passed to a DNN via a super vector and concatenated using a fusion activation vector. Finally, a DT is used, as a binary classifier, to detect attacks from the newly merged representations. The schematic of the proposed model is presented in Figure 2.
A. The Proposed Ensemble Deep Representation Learning Model
Most existing approaches proposed in the literature neglect the fact that real ICS datasets are highly imbalanced (the number of attack samples is far smaller than the number of normal samples). This results in a low f-measure, which reflects the low performance of these models in an imbalanced environment like ICSs, thereby making them impractical for real-world use cases.
Once a model is directly trained with a highly imbalanced dataset, new malicious data are likely to be misclassified. To address this problem, we propose an ensemble deep representation-learning model based on SAE to enhance the overall performance of the model. This is done by extracting an equally balanced set and passing it to multiple AEs to generate new representations. The input sample x_i is mapped to a hidden representation h_i by:\begin{equation*} h_{i}=f\left (x_{i} \right)=\sigma (W_{1}x_{i}+b_{1})\tag{7}\end{equation*}
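The two steps described above, drawing balanced subsets and encoding them with Eq. (7), can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. It assumes the balanced sets are built by undersampling the normal class to the size of the attack class, that attacks are labeled 1, and that σ is the logistic sigmoid. The weights are random and untrained, and all function names are hypothetical.

```python
import numpy as np

def balanced_subsets(X, y, n_subsets, rng):
    """Draw n_subsets balanced sets: all attack samples plus equally many normal ones."""
    attack = np.flatnonzero(y == 1)
    normal = np.flatnonzero(y == 0)
    subsets = []
    for _ in range(n_subsets):
        # Undersample the majority (normal) class; assumes normal.size >= attack.size.
        picked = rng.choice(normal, size=attack.size, replace=False)
        idx = np.concatenate([attack, picked])
        subsets.append((X[idx], y[idx]))
    return subsets

def encode(X, W1, b1):
    """Eq. (7) applied row-wise: h_i = sigma(W1 x_i + b1), with sigma assumed sigmoid."""
    return 1.0 / (1.0 + np.exp(-(X @ W1.T + b1)))

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 8))        # synthetic data: 1000 samples, 8 features
y = np.zeros(1000, dtype=int)
y[:50] = 1                                # 5% attack samples
subsets = balanced_subsets(X, y, n_subsets=3, rng=rng)
W1 = rng.standard_normal((4, 8)) * 0.1    # untrained encoder weights (illustration only)
b1 = np.zeros(4)
H = [encode(Xs, W1, b1) for Xs, _ in subsets]
print([h.shape for h in H])
```

In this sketch each subset would train its own AE, so every AE sees a different sample of normal data but the full set of attacks.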
A dropout layer is added to each AE to improve the generalization of our model by reducing the reliance of the output on a specific set of parameters. Also, the number of nodes and layers was selected through cross-validation of various networks with critical analysis of loss history and validation accuracy. Binary Cross-Entropy (BCE) is used as the cost function, represented by:\begin{equation*} J=-\frac {1}{N}\sum \limits _{i=1}^{N} \left[ y_{i}\cdot \log \left (p\left (y_{i} \right) \right)+\left (1-y_{i} \right)\cdot \log \left (1-p\left (y_{i} \right) \right) \right]\tag{8}\end{equation*}
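Eq. (8) can be written directly as a short function. In this sketch, `y` holds the binary labels and `p` the predicted probabilities of the positive class. The clipping constant `eps` is an addition for numerical safety and is not part of the equation.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy J of Eq. (8)."""
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

loss_confident = binary_cross_entropy([1, 0], [0.9, 0.1])
loss_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(loss_confident, loss_wrong)
```

Confident correct predictions give a small loss, and confident wrong predictions give a large one.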
B. The Proposed Ensemble Deep Learning Attack Detection Model
Once the new representations are generated from the imbalanced dataset, they are fed to an ensemble of DNN classifiers to distinguish normal from abnormal behaviors. The results from each DNN are then concatenated, via a super vector using a fusion activation function, and passed on to a DT classifier to detect attacks from the newly merged representations. A DT classifier was selected based on multiple tests using different machine learning classifiers, with DT providing the best performance results. The fusion activation function of the sigmoid layer is represented by the following equation:\begin{equation*} L_{1}=\sum \limits _{i=1}^{m} \left[ y_{i}\log \left (t_{i} \right)\cdot w_{s}+\left (1-y_{i} \right)\log \left (1-t_{i} \right)\cdot w_{l} \right]\tag{9}\end{equation*}
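The fusion step and Eq. (9) can be sketched as below. The sketch assumes the super vector is a feature-wise concatenation of the per-network outputs, that y_i is the label and t_i the sigmoid output for sample i, and that w_s and w_l are weights applied to the two classes. The symbol definitions are not given in the surrounding text, so these roles are assumptions. The expression is evaluated exactly as written in Eq. (9), without a leading minus sign.

```python
import numpy as np

def fuse(representations):
    """Concatenate per-network outputs feature-wise into one super vector per sample."""
    return np.concatenate(representations, axis=1)

def weighted_log_likelihood(y, t, w_s, w_l, eps=1e-12):
    """Evaluate L_1 of Eq. (9) as written; w_s and w_l are assumed class weights."""
    y = np.asarray(y, dtype=float)
    t = np.clip(np.asarray(t, dtype=float), eps, 1.0 - eps)
    return float(np.sum(y * np.log(t) * w_s + (1.0 - y) * np.log(1.0 - t) * w_l))

# Three hypothetical networks producing 4, 3, and 2 features for 6 samples.
reps = [np.ones((6, 4)), np.zeros((6, 3)), np.ones((6, 2))]
super_vector = fuse(reps)

l1_equal = weighted_log_likelihood([1, 0], [0.9, 0.1], w_s=1.0, w_l=1.0)
l1_weighted = weighted_log_likelihood([1, 0], [0.9, 0.1], w_s=2.0, w_l=1.0)
print(super_vector.shape, l1_equal, l1_weighted)
```

With equal weights the value reduces to an unnormalized, sign-flipped binary cross-entropy. Raising one weight increases the influence of errors on that class.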
The AEs were tested in a for loop, using a different number of layers, neurons, batch sizes, loss and activation functions, optimizers, epochs, and dropout layers, to achieve better accuracy and f-measure. Both the SAE and the DNN utilize the BCE cost function as well as the Rectified Linear Unit (ReLU) activation function to achieve the best performance measures, represented by:\begin{equation*} \mathrm {ReLU}\left (x \right)=\max \left (0,x \right)\tag{10}\end{equation*}
The pseudocode of the proposed attack detection algorithm is shown in Algorithm 1.
Case Studies and Results Analysis
A. Data Preparation
Ideally, the method would be evaluated on newly collected real SCADA data, but because few real datasets are available, this study uses realistic ICS datasets obtained in 2015 and 2018. In this section, two different ICS datasets are used to evaluate the efficiency of the proposed algorithm against random ICS models.
Gas Pipeline (GP): This dataset is obtained from a gas pipeline system and contains a Modbus validation frame of a preprocessed dataset in an Attribute-Relation File Format (ARFF) to help researchers use specialized preprocessing techniques. It also has a deep packet inspection of the Modbus frame with each line representing one network transaction. The dataset contains 17 features, with a total of 274628 observations split into 219702 (80%) samples for training and 54925 (20%) for testing [44].
Secure Water Treatment (SWaT): This dataset includes 11 days of continuous operation, in which 7 days were recorded under normal operating conditions and 4 days with attack scenarios. SWaT contains a total of 51 features, collected from network traffic ports, sensors, and actuators, with a total of 1048576 observations split into 838860 (80%) samples for training and 209715 (20%) for testing [45].
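The reported training and testing sizes for both datasets are consistent with rounding each share down independently, which leaves one observation unassigned in each case. The short check below reproduces the figures. Whether the split was actually computed this way is an assumption, and `split_sizes` is a hypothetical helper.

```python
def split_sizes(n_total, train_pct=80):
    """Floor the training and testing shares independently (assumed rounding rule)."""
    n_train = n_total * train_pct // 100
    n_test = n_total * (100 - train_pct) // 100
    return n_train, n_test

gp_train, gp_test = split_sizes(274628)        # Gas Pipeline
swat_train, swat_test = split_sizes(1048576)   # SWaT
print(gp_train, gp_test, 274628 - gp_train - gp_test)
print(swat_train, swat_test, 1048576 - swat_train - swat_test)
```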
B. Evaluation Metrics
When it comes to the security of ICSs, the concern revolves around detecting cyber-attacks while achieving high f1-scores on imbalanced datasets, thereby minimizing the rate of false alarms. As with standard machine learning benchmarking metrics, this work considers True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN), defined in Table 1, as the performance evaluation metrics for the attack detection models.
The performance of the machine learning algorithms is measured by the following metrics [44]:
Accuracy: Ratio of samples classified correctly over the entire dataset.
\begin{equation*} Acc=\frac {TP+TN}{TP+TN+FP+FN}\tag{11}\end{equation*}
Precision: Ratio of correctly classified positive samples over all samples predicted as positive.
\begin{equation*} Prec=\frac {TP}{TP+FP}\tag{12}\end{equation*}
Recall: The ratio of correctly predicted positive samples over the total samples of the corresponding class.
\begin{equation*} Rec=\frac {TP}{TP+FN}\tag{13}\end{equation*}
F1 Score: Harmonic mean of precision and recall.
\begin{equation*} F1=\frac {2 \times TP}{2 \times TP+FN+FP}\tag{14}\end{equation*}
The F1-score balances precision and recall equally, which is highly important in performance evaluation for imbalanced datasets (i.e., when the number of attack samples is far smaller than the number of normal samples).
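Eqs. (11) to (14) can be computed from the four confusion-matrix counts as in the sketch below. The counts are invented to illustrate why accuracy alone is misleading on imbalanced data: with 10 attacks among 1000 samples, a detector that finds only half of them still reaches 99% accuracy. Returning 0 when a denominator is zero is a convention chosen for this sketch.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, F1) following Eqs. (11)-(14)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp) if (tp + fp) else 0.0   # 0 when nothing is predicted positive
    rec = tp / (tp + fn) if (tp + fn) else 0.0    # 0 when there are no positive samples
    f1 = 2 * tp / (2 * tp + fn + fp) if (2 * tp + fn + fp) else 0.0
    return acc, prec, rec, f1

# Made-up counts: 1000 samples, 10 true attacks, 5 detected, 5 false alarms.
acc, prec, rec, f1 = classification_metrics(tp=5, fp=5, tn=985, fn=5)
print(acc, prec, rec, f1)
```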
C. Performance Analysis
1) General Performance Analysis
In this section, two different ICS datasets gathered from a gas pipeline system and a water treatment facility were used to evaluate the performance of the proposed method. Results were compared with DNN, RF, DT, and AdaBoost-based classifiers along with multiple peer approaches in the current literature. Tables 2 and 3 provide a summary of the performance evaluation metrics, including accuracy, precision, recall, and F1-score. As illustrated, the proposed method outperforms existing techniques on both datasets in all four metrics, and most importantly on f-measure, which highlights the efficiency of the proposed model in imbalanced ICS environments.
2) Imbalanced Testing
To evaluate the efficiency of the proposed method under different imbalanced conditions, we tested the model with different imbalance ratios. An imbalance ratio of 0.1 means that 10% of the attack samples were used; likewise, an imbalance ratio of 1 means that 100% of the attack samples were used.
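One way to construct such test conditions is to keep every normal sample and retain only a fraction of the attack samples. The sketch below does this on synthetic data. The random selection of which attacks to keep, the label convention (1 = attack), and the function name are assumptions made for illustration.

```python
import numpy as np

def subsample_attacks(X, y, ratio, rng):
    """Keep all normal samples and a random `ratio` fraction of the attack samples."""
    attack = np.flatnonzero(y == 1)
    normal = np.flatnonzero(y == 0)
    n_keep = int(round(ratio * attack.size))
    kept = rng.choice(attack, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([normal, kept]))   # preserve the original ordering
    return X[idx], y[idx]

rng = np.random.default_rng(4)
X = rng.standard_normal((1000, 5))   # synthetic features
y = np.zeros(1000, dtype=int)
y[:100] = 1                          # 100 attack samples, 900 normal samples
for ratio in (0.1, 0.5, 1.0):
    X_r, y_r = subsample_attacks(X, y, ratio, rng)
    print(ratio, int(y_r.sum()), int((y_r == 0).sum()))
```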
As shown in Figures 3–6, results of the proposed method exceed other techniques with a flat curve in all metrics for the GP dataset. This verifies the robustness of the proposed method as its performance is not affected by different imbalanced ratios. Although other methods have an acceptable accuracy, the recall and precision are significantly lower than that of the proposed method. However, our proposed method maintains consistent results in all four metrics.
For further analysis, the proposed model was also evaluated on the SWaT dataset. Since the model is generalized for different ICS environments, it was tested without any modification to the model structure or parameters. As illustrated in Figures 7–10, the proposed method outperforms existing techniques in all four metrics. The better performance compared to the first case study could be attributed to the fact that the SWaT dataset contains more training samples than the GP dataset.
Conclusion
Critical infrastructures are complex cyber and physical systems that form the lifeline of modern society, and their reliable and secure operation is essential to national security. In this paper, we proposed a generalized ensemble deep learning-based cyber-attack detection method specifically designed for ICSs. The proposed technique includes a deep representation-learning model, which constructs new balanced representations from the raw imbalanced dataset. The new representations are then used in an ensemble deep learning algorithm based on DNN and DT classifiers to detect cyber-attacks. The performance of the proposed model is verified using two different ICS datasets obtained from real critical infrastructure facilities. Our proposed approach outperformed conventional classifiers with a 10% higher f1-score on both datasets and produced higher accuracy, with 95.86% for the Gas Pipeline dataset and 99.67% for the Secure Water Treatment dataset. Results were compared with traditional classifiers, such as RF, DNN, and AdaBoost, along with multiple peer approaches in the current literature. The proposed approach outperformed other techniques in all four evaluation metrics. Although our approach performed better than existing techniques, there is room for improvement when dealing with few samples, as illustrated by the GP dataset. Additionally, identifying the attack type and its location is important for limiting process downtime and preserving computational efficiency once an attack is detected. Therefore, our future work will focus on optimizing the accuracy of the proposed method and developing an additional model to identify different attack types and their locations. This will help avoid critical system failure and improve the network security of ICSs against similar cyber-attacks.