A Dual-Isolation-Forests-Based Attack Detection Framework for Industrial Control Systems

Topic: Emerging Approaches to Cyber Security

Abstract:

The cybersecurity of industrial control systems (ICSs) is becoming increasingly critical given the current advancement in cyber activity and Internet of Things (IoT) technologies, and their direct impact on several aspects of life such as safety, the economy, and security. This paper presents a novel semi-supervised dual isolation forests-based (DIF) attack detection system that is developed using normal process operation data only and is demonstrated on two scaled-down ICSs, the Secure Water Treatment (SWaT) testbed and the Water Distribution (WADI) testbed. The proposed cyber-attack detection framework is composed of two isolation forest models that are trained independently, one using the normalized raw data and the other using a version of the data pre-processed with Principal Component Analysis (PCA), to detect attacks by separating anomalies away from normal observations. The performance of the proposed method is compared with previous works, and it demonstrates improvements in attack detection capability, computational requirements, and applicability to high-dimensional systems.

Published in: IEEE Access ( Volume: 8)

Page(s): 36639 - 36651

Date of Publication: 19 February 2020

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2020.2975066

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

Industrial control systems (ICSs) are composed of electrical and mechanical devices, computers, and manual operations supervised by humans. They are mainly used for partial or full automation control in industrial plants and critical infrastructures such as manufacturing industries, chemical plants, power generation and distribution systems, water treatment plants, and others [1]. Their operation has a direct impact on the environment, the safety and health of people, the economy, and national security. Concerns about the security of industrial control systems are increasing, given the growing sophistication of cyber activities. The advancement in the industrial Internet of Things (IoT) technologies is creating more potential threat points and vulnerabilities in the system. There have been a number of cyber-attacks on critical infrastructures in the past few years [2]–[4], and research in cybersecurity of industrial control systems has been evolving to overcome the challenges and vulnerabilities in the current industrial attack detection systems.

Attack detection systems are designed to monitor the events taking place in an information system in order to identify signs of security issues. Anomaly detection, the process of identifying anomalous events that do not conform to the expected behavior of the system, is the most commonly used approach for attack detection. Its main underlying advantage is the ability to detect new and previously unseen attacks. Anomaly detection-based attack detection approaches can be implemented using a variety of Machine Learning (ML) algorithms such as Support Vector Machine (SVM) [5], [6], Principal Component Analysis (PCA) [7], Neural Networks [8], clustering analysis [9], Negative Selection Algorithm (NSA) [10], and others. They can be divided into unsupervised, supervised, and semi-supervised learning approaches. In the unsupervised approach, the model is developed using unlabeled data that contain both normal and anomalous samples, while in the supervised approach, labeled normal and attack data are used. In the semi-supervised approach, by contrast, the model is developed using the normal operation data only.

The work presented in this paper is demonstrated using the datasets obtained from the iTrust Lab testbeds, namely the Secure Water Treatment (SWaT) testbed and the Water Distribution (WADI) testbed. There have been several works on attack detection using the SWaT dataset, as in [10]–[22], and limited work using the WADI dataset, as in [17]. Most of the previous works on attack detection utilized the normal process data with ML algorithms such as Negative Selection Algorithm (NSA) [10], Singular Value Decomposition (SVD) [11], Standard Neural Networks (NNs) [12], [13], Convolutional Neural Networks (CNNs) [14], Recurrent Neural Networks (RNNs) [15], [16], and a Generative Adversarial Network (GAN) implemented using the Long Short-Term Memory (LSTM) network [17]. They are based on constructing a model that is able to profile normal system behavior, after which non-conforming observations are identified as anomalies. In [18], an attack detection approach is proposed based on a graphical model developed using a probabilistic deterministic real-time automaton model and a Bayesian network, named the Time Automata and Bayesian netwORk (TABOR) approach. In [19], supervised learning is used to develop a detection model using SVM. A network-based attack detection system is proposed in [20] to detect attacks in particular communication links in the SWaT testbed. In addition, model-based attack detection methods are proposed in [21], [22] for the SWaT system; they detect attacks using approximated discrete models in which invariants are derived from the process dynamics and the state entanglement among the physical components.

In terms of computational overhead, model-based approaches are considered relatively more efficient than data-driven ones for large-sized systems [23]. In addition, computational complexity differs among Machine Learning algorithms: it is well known that CNNs and RNNs involve extensive computations in both the training and evaluation phases, while NNs have a lower computational requirement, ranging from average to high [24]. By comparison, standard ML algorithms such as SVD, PCA, SVM, and NSA are characterized by low to average computational complexity, depending on the problem size [25], [26].

However, the model-based approaches in [21], [22] have some limitations, such as the modeling approximations required by the complexity of some processes in the system (e.g., the chemical processes), which affect the detection accuracy. Moreover, the difficulty, effort, and time required for system modeling rise with the complexity of the system, and the reliability of the detection approach is likely to degrade. In [21], the authors proposed an approach for analyzing the security of the SWaT testbed, such as the vulnerabilities of the system and the possible attack scenarios that can be discovered; however, the possibility of using this approach to launch attacks that cannot be detected by other approaches, specifically the data-driven methods, depends on the quality of the system models used. In addition, developing high-fidelity system models becomes more challenging as the complexity, size, and non-linearity of the system increase.

The methods proposed in [10]–[13], [15], [16] might have the drawbacks of a high missed alarm rate and poor performance on high-dimensional data. In addition, some approaches have a high computational cost, such as those in [14]–[17], while others, e.g., [10], [11], do not make full use of the process information because they disregard the actuator signals, which may contain valuable information about the process status. The approach proposed in [18] requires the variables to be selected manually and empirically by the designer based on the dynamic behavior. The disadvantages of the supervised learning-based attack detection system proposed in [19] are its dependency on attack data, which are scarce, and the low accuracy of the detection model under new and unseen attack scenarios. TABLE 1 presents a summary of the previous works that used the SWaT and WADI datasets for intrusion and attack detection.

TABLE 1 Summary of the Previous Works on the iTrust Lab Datasets for Attack and Intrusion Detection

In this paper, we present a dual isolation forests-based (DIF) attack detection framework for industrial control systems in water treatment plants. The two isolation forest models are trained independently, one using the normalized raw data and the other using a version of the data pre-processed with PCA. The idea behind using two models is to inspect the data in two representations, one in the original data space and the other in the principal component space, thus elevating the capability of the detection approach. The main objective of the framework is to address the limitations of the previous works, given that isolation forests have low computational complexity and high applicability to complex and high-dimensional data. Isolation forests can be used on mixed datasets containing continuous and discrete variables, which facilitates harnessing the available data when developing the model, and they can be used in both semi-supervised and unsupervised learning schemes. Unlike most of the previous works, they point out anomalies using the concept of isolation, which improves the attack detection capability. There have been a couple of implementations of isolation forest-based approaches for attack detection, such as in [27] for smart grid networks and in [28] for information security.

The contributions of this work can be summarized as follows:

A dual-isolation forests-based attack detection framework is proposed for industrial control systems in water treatment plants utilizing the normal process data of actuator signals and sensor measurements.
The proposed approach is based on the principle of separating-away observations that are anomalous, which improves its ability to detect attacks.
Due to the nature of the isolation forest, it can harness the available information about the process by analyzing the relations between the different system variables, which are the sensor measurements and the actuator signals.
It can exploit the available data of the system by learning from the process data in the original, as well as the PCA-transformed representations.
It provides an efficient solution in terms of computational complexity when compared to Deep Learning-based approaches.
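
The dual-forest idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `IsolationForest`, `PCA`, and `MinMaxScaler`, uses min-max scaling as a stand-in for the normalization step, fixes the number of principal components arbitrarily, and combines the two models with a simple OR rule, which is an assumption made here for illustration. The function names `fit_dual_forests` and `detect` are hypothetical, and the data are synthetic rather than taken from SWaT or WADI.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import MinMaxScaler


def fit_dual_forests(X_normal, n_components=2, seed=0):
    """Fit two forests on normal data only (semi-supervised setting).

    One forest sees the scaled raw data, the other its PCA projection.
    Min-max scaling and n_components=2 are illustrative assumptions.
    """
    scaler = MinMaxScaler().fit(X_normal)
    X_scaled = scaler.transform(X_normal)
    pca = PCA(n_components=n_components).fit(X_scaled)
    forest_raw = IsolationForest(random_state=seed).fit(X_scaled)
    forest_pca = IsolationForest(random_state=seed).fit(pca.transform(X_scaled))
    return scaler, pca, forest_raw, forest_pca


def detect(models, X):
    """Return a boolean flag per sample.

    A sample is flagged if either forest labels it an outlier; this OR
    rule is an assumption of the sketch, not taken from the paper.
    """
    scaler, pca, forest_raw, forest_pca = models
    X_scaled = scaler.transform(X)
    raw_flag = forest_raw.predict(X_scaled) == -1  # -1 marks an outlier
    pca_flag = forest_pca.predict(pca.transform(X_scaled)) == -1
    return raw_flag | pca_flag


# Synthetic stand-in for process data: 5 signals under normal operation.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(2000, 5))
# Hypothetical "attack" samples pushed far outside the normal range.
X_attack = np.full((5, 5), 8.0)

models = fit_dual_forests(X_train)
flags = detect(models, np.vstack([X_train[:5], X_attack]))
print(flags)
```

In this sketch the last five entries of `flags` correspond to the out-of-range samples. Some normal samples may also be flagged, since the default decision threshold of `IsolationForest` is not tuned to a target false alarm rate.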

The paper is organized as follows. The description of the systems under study is presented in Section II. In Section III, the details of the proposed approach are presented. The models training procedure and the used performance evaluation metrics are explained in Section IV, along with the evaluation and comparison results. Finally, conclusions and future work are summarized in Section V.

SECTION II.

System Description

The work presented in this paper utilizes the experimental process data from the Secure Water Treatment (SWaT) testbed [29], [30] and the Water Distribution (WADI) testbed [31] developed by iTrust Lab at Singapore University of Technology and Design in order to promote research work in the area of cybersecurity of ICSs. The details of the two testbeds are presented in the following subsections.

A. Secure Water Treatment (SWaT) Testbed

The SWaT testbed is a scaled-down water treatment plant that is composed of 6 processes, as demonstrated in FIGURE 1, and is capable of producing 5 gallons per minute of fresh water. The data were collected over a total of 11 days. During the last four days, 36 different attacks were injected by hijacking the packets in the communication links between the SCADA system and the Programmable Logic Controllers (PLCs), with the attack data comprising around 6% of the total data samples. The network packets were altered to reflect the spoofed values from the sensors [29]. The dataset consists of measurements from a total of 25 sensors for water level, flow rate, pressure, and chemical decomposition, and signals from 26 actuators, such as pumps and valves. The description of the SWaT attack scenarios is provided in TABLE 2.

TABLE 2 Description of the Attack Scenarios on the SWaT Testbed

FIGURE 1.

The physical water treatment process in the SWaT testbed. P1 through P6 indicate the six stages in the SWaT process - with each having its dedicated PLC - starting with the raw water intake, then the pre-treatment and filtration stage, and finally the reverse osmosis process. Solid arrows indicate the flow of water or chemicals in the dosing station. Dashed arrows indicate potential cyber-attack points. LIT: Level Indicator and Transmitter; Pxxx: Pump; AITxxx: Property indicator and Transmitter; DPIT: Differential Pressure Indicator and Transmitter [32].
