
Fast Anomalous Traffic Detection System for Secure Vehicular Communications


Abstract:

In modern automotive systems, the introduction of multiple connectivity protocols has transformed in-vehicle network communication, with the Controller Area Network (CAN) emerging as the widely recognized standard. Despite its ubiquitous use, the CAN protocol lacks critical security features, making vehicle communications vulnerable to message injection attacks. Such attacks can mislead legitimate electronic control units (ECUs) or cause system failures, emphasizing the need for strong cybersecurity solutions in automobile networks. This study addresses this need by developing a fast and efficient anomalous traffic detection system to protect vehicular communications from cyberattacks. The proposed system utilizes four machine learning techniques: AdaBoost Trees (ABT), Coarse Decision Trees (CDT), Naive Bayes Classifier (NBC), and Support Vector Machine (SVM). These models were assessed on the Car-Hacking-2018 dataset, which simulates real-time vehicular communication scenarios. Specifically, the system considers five balanced classes: one normal traffic class and four classes of message injection attacks over the in-vehicle controller area network, namely fuzzy attack, DoS attack, RPM attack (spoofing), and gear attack (spoofing). The best performance belongs to the ABT model, which achieved 99.8% classification accuracy with a classification overhead of 6.67 μs per prediction. These results outperform existing in-vehicle intrusion detection systems evaluated on the same or similar datasets.
Published in: Intelligent and Converged Networks ( Volume: 5, Issue: 4, December 2024)
Page(s): 356 - 369
Date of Publication: 04 November 2024
Electronic ISSN: 2708-6240

SECTION 1

Introduction

In-vehicle communications have grown steadily since the first Electronic Control Unit (ECU) began to interact with other electronic control units such as the Heating, Ventilation, and Air Conditioning (HVAC) system, the airbag system, the Anti-lock Braking System (ABS), and others[1]. The connections of several ECUs to enable vehicular communications are collectively known as Controller Area Networks (CAN). Figure 1 illustrates the CAN connectivity bus for in-vehicle communication and networking. The CAN protocol is built on a bus topology, which has several limitations, such as a single point of failure. Nevertheless, this architecture is extensively used in vehicle communication due to its low cost and simplicity. Moreover, the CAN protocol alleviates the limitations of the bus topology through its robustness, advanced error detection, and fault confinement[2], [3].

It has been a long time since the radio was the only electronic apparatus within vehicles. Driven by recent policies and directives to lessen CO2 emissions, and followed by the expansion of semiconductor production, electronic devices and circuits have made their way into nearly all vehicle components. More recently, vehicles have even begun to operate autonomously by adopting a Cyber-Physical System (CPS) using smart IoT modules[4].

Fig. 1 - An illustrative example of CAN connectivity for several vehicle ECUs.

Intrusion Detection/Classification Systems (IDS/ICS) can be employed to enhance the security of the network and of every associated component of cyberspace. Accordingly, researchers have proposed many IDS and ICS that detect and classify network traffic to find suspicious activity[5]. Human intervention (by a system or security administrator) might still be necessary to decide on further steps to defend against an attack[6].

Traditional IT security mechanisms such as firewalls, antivirus, and honeypots are used to detect abnormal network activity at the end node. However, due to the heterogeneity of autonomous vehicle devices and their design constraints, these mechanisms are insufficient to prevent cyberattacks. In addition, the normal behavior of autonomous vehicle devices spans a wider range than traditional IT security mechanisms are designed to handle[7]. Many researchers exploit Machine Learning (ML) models to develop IDS or ICS because ML performs accurately when identifying malicious network activity. In addition, ML reduces the false alarms that an IDS generates from non-threatening activity. ML models do not require human intervention to detect or classify cyberattacks since they operate autonomously. To achieve this, ML models need to be trained and tested with a comprehensive dataset to produce the expected results[8].

An IDS can be integrated into autonomous vehicles to scan for inappropriate CAN network traffic, and an ML-based IDS can be implemented to capture suspicious CAN messages[9]. To do so, the ML model needs to be trained and tested using CAN message features. Also, because autonomous vehicles are very sensitive to any change in the signal, the selected ML model must deliver fast predictions. Modern vehicles have many connections that must be protected from cyberattacks launched against the in-vehicle communication network. In-vehicle networks employ the CAN protocol as a de facto standard for vehicular communications[10]. Despite its robust communication features, however, the CAN protocol fails to implement efficient security services, exposing vehicular communication to different types of cyberattacks. Message injection attacks are a representative example: they inject fictitious messages to mislead legitimate ECUs or produce failures.

Two types of attacks can be crafted on CAN communication systems: physical access attacks and remote access attacks. In a physical access attack, attackers directly access the CAN communication through the Onboard Diagnostic (OBD) port or a compromised node. Remote access attacks, by contrast, are carried out through wireless communication interfaces such as the Tire Pressure Monitoring System (TPMS), Bluetooth, FM radio, cellular network, and passive anti-theft systems. Research studies have shown that cars can be stolen by sending CAN messages through wireless communication interfaces after a reverse engineering process[2]. In response, solutions and techniques for secure vehicular communication are being pursued, fostering further vehicle security research.

Vehicles are becoming more connected and self-driving, and they rely heavily on in-vehicle networks. However, these networks, such as the CAN, were not built with strong security, which leaves them vulnerable to cyberattacks that can steal information (confidentiality) or take control of critical systems (safety) such as brakes and steering. These attacks can be as straightforward as jamming signals (DoS) or more complex, such as sending fake data (spoofing) to trick the vehicle. If successful, these attacks could cause the car to malfunction or even crash, affecting everyone on the road. The problems caused by cyberattacks on car networks extend beyond safety: unsecured communication systems also enable hackers to steal information. Modern cars collect far more data than traditional cars as they become smarter, and safeguarding it from criminals is crucial.

Because cyberattacks on car networks pose such a serious threat, better methods are required to locate and prevent them. The system proposed in this work utilizes machine learning and real-time data analysis to quickly detect unusual traffic patterns that could indicate an attack. This proactive approach enhances the protection of car communication systems and makes them more reliable. We employ several supervised learning techniques to categorize cyberattacks for in-vehicle intrusion detection systems. In particular, the proposed system compares the performance of four separate machine learning approaches: AdaBoost Trees (ABT), Coarse Decision Trees (CDT), Naive Bayes Classifier (NBC), and Support Vector Machine (SVM). The models were evaluated on a recent real-time vehicular communications dataset, the Car-Hacking-2018 dataset[11]. The system considers five balanced classes: one normal traffic class and four classes of message injection attacks over the in-vehicle controller area network, namely fuzzy attack, DoS attack, RPM attack (spoofing), and gear attack (spoofing).

1.1 Summary of Contributions

Our contributions in this work can be summarized as follows:

  • We present a thorough, effective multi-class classification scheme that categorizes the vehicular communication traffic data of the Car-Hacking-2018 dataset into five classes (normal traffic and four in-vehicle cyberattacks).

  • We compare the capabilities of four supervised learning techniques (ABT, CDT, NBC, and SVM) for the in-vehicle IDS.

  • We provide extensive empirical assessment results to better understand system effectiveness and methodologies using conventional evaluation metrics (including accuracy, precision, recall, F1-score, and prediction time).

  • We compare our best assessment outcomes with recent existing models that utilize several computational intelligence algorithms in the same research field.

1.2 Structure of the Paper

The remainder of this article is arranged as follows: Section 2 reviews several recent works developed in the same study area. Section 3 demonstrates the workflow for system development stages, from data collection to target class classification. Section 4 states the model assessment outcomes, findings for all design options, and the comparison with existing models. Lastly, Section 5 closes the proposed work, concludes the findings, and provides future work extensions.

SECTION 2

Related Work

As vehicle connectivity has increased, so has the need for secure communication[12]. Reference [13] proposed Reconfigurable Intelligent Surfaces (RIS) to ensure communication security in autonomous vehicles. The authors analyzed the performance of RIS in two scenarios, Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I). In V2V, two or more vehicles communicate with each other, and the RIS works as a relay. In V2I, however, the RIS works as a receiver. According to the authors, the RIS enhanced the security of both V2V and V2I.

Several technologies, such as blockchain, are essential for improving the integrity and privacy of autonomous vehicles because they bring network threat detection into the domain of securing vehicular communications. Blockchain, best known through Bitcoin, was first intended for use in the financial industry but has also shown versatility in non-financial domains. In Ref. [14], the authors applied the blockchain concept to vehicle communications, favoring it for its distributed network infrastructure, which reduces network overhead, guarantees security, and protects privacy.

Federated Learning (FL) is a machine learning method that can enhance the security of vehicle communications because it safeguards the privacy of traffic data exchanged between vehicle components. Unlike traditional approaches that work in centralized architecture, FL is designed to operate in decentralized architecture. Therefore, FL models are trained collaboratively across a distributed network. In Ref. [15], the authors developed an autonomous controller for Connected Autonomous Vehicles (CAVs) using FL. Every vehicle node actively engages in training, helping to manage system dynamics and uncertainties. These developments strengthen vehicle communication systems' security and resilience against hostile threats by incorporating network attack detection techniques.

Researchers have also used deep learning approaches to implement detection models for autonomous vehicles. Deep Learning (DL) addresses some limitations of traditional machine learning algorithms, such as dealing with huge amounts of data and learning from unstructured datasets. In Ref. [16], the authors utilized a Long Short-Term Memory (LSTM) model to develop a detection model for autonomous vehicles. Initially, cyberattacks were designed and implemented in a simulation-based model for autonomous vehicles. The numerical data captured during the simulation was then used for training. The output of the LSTM model is one of two labels: normal or attacked.

The CAN system is an essential component of autonomous vehicles since the information of the physical devices is shared through its buses. The authors of Ref. [17] proposed a security method based on a Deep Neural Network (DNN) to detect suspicious CAN messages on autonomous vehicles' buses by employing a four-layer multilayer perceptron. The authors use three types of CAN messages: anchor, negative, and positive. An anchor is an artificial CAN message, a negative is an anomalous CAN message, and a positive is a normal CAN message. The DNN is therefore expected to distinguish between these three categories of data. In addition, the authors of Ref. [18] proposed a hybrid intrusion detection system based on LSTM and a Convolutional Neural Network (CNN) to detect cyberattacks in in-vehicle network traffic. Their model reported a detection accuracy reaching 100% with a low false alarm rate.

Furthermore, according to Ref. [19], deep learning shows promise in identifying cyberattacks, particularly replay attacks on remote keyless car systems. The authors use a Transfer Learning (TL) technique to fine-tune a ResNet50, which they then use as the model framework. The model's parameters are initialized during pre-training using the KeFRA 2022 dataset. Notably, one of the most important steps in the process is fine-tuning. The dataset is preprocessed before it is fed into the deep neural network, with image augmentation and scaling performed. The data is shuffled and divided into training and testing sets using five-fold cross-validation. The TL subsystem uses a ResNet-50 CNN for a three-class classification task, pre-trained on ImageNet and refined at the output layer. By freezing the core portion of the pre-trained network and tuning hyperparameters, the method improves the system's capacity to recognize and react to possible cyberattacks on remote keyless car systems.

In Ref. [20], the authors developed a security model designed for self-driving cars. Their model combines two deep learning models: CNNs and LSTMs. The CNN-LSTM model primarily identifies cyberattacks on autonomous cars' CAN communication buses. The CNN analyzes traffic data to find patterns, while the LSTM focuses on predicting potential security threats. Building on this idea, a system called CANintelliIDS, described in a different research paper, combines a similar technique (Gated Recurrent Unit, GRU) with a CNN to better identify cyberattacks on CAN buses (communication systems used in cars)[21].

The researchers in Ref. [22] investigated two machine learning-based classifiers to identify issues in CAVs: one to pinpoint whether communication problems were caused by glitches or slowdowns, and another to determine whether cyberattacks or physical malfunctions were to blame. They compared four machine learning techniques (SVM, Quadratic Discriminator (QD), k-Nearest Neighbors (kNN), and Naive Bayes Classifier (NBC)) for the classification of CAV faults and observed that SVM was the most effective for the first task and QD for the second. The researchers in Ref. [23] proposed a security system for self-driving vehicles that utilizes Integrated Circuit Metrics (ICMetrics) technology. ICMetrics can create a unique ID for the electronic system, which includes biometric technology. Additionally, ICMetrics can detect features from the vehicle's environment through an infrared sensor, which can be used for authentication.

In Ref. [24], the authors presented an intelligent IDS using TL to detect cyber-attacks on Autonomous Vehicle Cyber-Physical Systems (AV-CPSs). They have gathered a new dataset from a simulated AV-CPS and analyzed it using several pre-trained CNNs. GoogLeNet achieved the best performance with an F1-score of 99.47%. In Ref. [25], Kabilan et al. proposed a lightweight, unsupervised IDS method combining autoencoders for feature extraction and Fuzzy C-Means (FCM) for clustering. They evaluated their system on the ML350 in-vehicle dataset, achieved 75.51% accuracy, and performed well on wireless and network intrusion datasets.

In Ref. [26], Khan et al. introduced DivaCAN, an IDS using an ensemble of classifiers. DivaCAN achieves 94.93% precision, 94.98% recall, and a 94.97% F1-score, with a 406-second execution time, offering a robust solution for securing CAN-based vehicular communication. In Ref. [27], the authors presented a detection method to mitigate path modification, velocity drift, and ghost aircraft injection attacks that target aviation system communications and control modules. Their random forest-based model, tested on recent data (the ADS-B-2022 dataset), achieved 99.41% accuracy in multi-class classification, showing superior detection capabilities. Further testing with more attack types and data is suggested for improvement. Table 1 summarizes the related work discussed in this section.

Unlike the models reported in the studies above, our proposed system contributes the following to vehicle cybersecurity. First, we present a complete multi-class classification approach suited to the Car-Hacking-2018 dataset, which allows vehicular communication traffic to be categorized precisely into five classes (normal traffic and four in-vehicle cyberattacks). This method yields a more detailed picture of vehicle threats than binary classification. Second, we examine the performance of four supervised learning techniques (ABT, CDT, NBC, and SVM) for their application in an in-vehicle intrusion detection system. Our empirical assessments use standard evaluation criteria, including accuracy, precision, recall, F1-score, and prediction time, to review each model's capabilities thoroughly. Third, we compare our most successful model to existing techniques based on various computational intelligence algorithms, revealing considerable gains in detection performance. This comparison demonstrates the resilience and effectiveness of the proposed method for improving vehicle network security.

Table 1 Summary of the related work.

SECTION 3

In-Vehicle IDS System

3.1 Dataset

The dataset used in this research is the Car-Hacking-2018 dataset[28], [29]. It consists of twelve features: the timestamp in seconds, the CAN message ID in hexadecimal, the number of data bytes as an integer, the eight data byte values (one feature per byte), and a flag that marks each message as normal or injected, as shown in Table 2. The dataset covers four types of cyberattacks (DoS attack, fuzzy attack, gear spoofing attack, and RPM spoofing attack) in addition to normal traffic. The number of data records for each class is given in Table 3. The following list summarizes the operation of each attack.

  • DoS attack: Floods the system with traffic, overloading vehicular communication and interfering with safety apps and traffic control.

  • Fuzzy attack: Introduces fake data into in-vehicle communications to confuse drivers, compromising information integrity and leading to incorrect decisions.

  • Gear attack: Modifies how gear information is transmitted between vehicles, which could lead to erroneous data and jeopardize the security of the entire vehicle network.

  • RPM attack: Modifies engine RPM information used in vehicular communication, endangering the reliability of performance assessments and affecting safety applications and traffic management.

To unify the class sizes, a data selection process using the Synthetic Minority Over-Sampling Technique (SMOTE) has been applied, balancing the data at 500 000 samples per class (DoS attack, fuzzy attack, gear attack (spoofing), RPM attack (spoofing), and normal).

Table 2 Dataset attributes.
Table 3 Dataset category.

The dataset employed in this research can be accessed from the Hacking and Countermeasure Research Lab (HCRL) Car-Hacking Dataset for intrusion detection: https://ocslab.hksecurity.net/welcome (accessed on Jan 2nd, 2023).

3.2 Machine Learning

In this work, we employ four supervised learning techniques to categorize the cyberattacks for in-vehicle intrusion detection systems: ABT, CDT, NBC, and SVM. The selected approaches were chosen based on their suitability for anomaly detection in vehicular communications. Each approach offers unique advantages and has been widely utilized in intrusion detection systems and cybersecurity applications.

AdaBoost Trees (ABT): AdaBoost is an Ensemble Learning (EL) method that combines multiple weak classifiers to perform prediction. The main goal of EL is to strengthen the learning algorithm, since an ensemble reduces error and is less prone to overfitting[30]. In this research, the base learners of the ensemble are Decision Trees (DT)[31]. For each base learner, several DTs are initiated, and the best-fitting DT is selected for that base learner. The first chunk of the data is passed to base learner #1 for training, and the misclassified data is then passed to base learner #2. This process continues until it reaches the final base learner #N, as shown in Fig. 2. The figure illustrates how multiple weak learners are combined into a strong classifier that improves classification accuracy and reduces overfitting, making the approach well-suited for detecting anomalies in complex and dynamic vehicular communication networks.
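The boosting idea described above can be illustrated with a minimal sketch. It is not the configuration used in this work: it assumes the classic binary AdaBoost formulation, in which misclassified samples are re-weighted rather than physically passed on, uses one-split decision stumps on a single hypothetical feature, and runs on a toy dataset invented for illustration.

```python
import math

def train_stump(xs, ys, weights):
    """Find the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def stump_predict(x, threshold, polarity):
    return polarity if x >= threshold else -polarity

def adaboost_fit(xs, ys, n_rounds=3):
    """Train n_rounds weak learners; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this learner
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, so the next learner focuses on them.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote of all weak learners."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single threshold separates the two labels.
toy_xs = [1, 2, 3, 4, 5, 6, 7, 8]
toy_ys = [1, 1, -1, -1, -1, 1, 1, 1]
toy_model = adaboost_fit(toy_xs, toy_ys, n_rounds=3)
```

On this toy set no single stump is sufficient, yet the weighted vote of three stumps classifies every training point correctly, which is the sense in which weak learners combine into a strong one.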

Support Vector Machine (SVM): SVM was selected for its ability to construct optimal hyperplanes for separating different classes in high-dimensional feature spaces. SVM offers robust performance in binary and multi-class classification tasks, making it well-suited for distinguishing between normal and abnormal traffic patterns in vehicular communication networks. Figure 3 shows how SVM solves classification problems[32]. In the beginning, the hyperplane is created to classify the data into two separate groups (in the case of binary classification). SVM also creates a margin between class 1 and class 2 using two parallel hyperplanes; one is close to class 1, and another is close to class 2. The reason behind creating a margin is to create a generalized SVM model that can obtain better accuracy with different data. Therefore, any data placed above the hyperplane will be classified as class 1. However, any data underneath the hyperplane will be classified as class 2.
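As a small illustration of the decision rule just described, the following sketch evaluates a linear SVM that has already been trained. The weight vector `w` and bias `b` are hypothetical values chosen for the example, and the canonical margin form w·x + b = ±1 is assumed.

```python
import math

def svm_decision(w, b, x):
    """Signed score w.x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_classify(w, b, x):
    """Points on or above the hyperplane go to class 1, the rest to class 2."""
    return "class 1" if svm_decision(w, b, x) >= 0 else "class 2"

def margin_width(w):
    """Distance between the two parallel margin hyperplanes w.x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical trained parameters for a two-feature problem.
w_example, b_example = (1.0, 1.0), -3.0
```

A larger `margin_width` corresponds to a smaller weight norm, which is why maximizing the margin tends to give a model that generalizes better to unseen data.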

Fig. 2 - AdaBoost learning technique.

Coarse Decision Trees (CDT): CDT were chosen for their simplicity and efficiency in handling large datasets with high-dimensional features. By employing a coarse-grained decision-making process, CDT can quickly identify patterns and anomalies in vehicular communication traffic while minimizing computational overhead. Figure 4 illustrates an example of how CDT works[33]. Assume we have a classification problem with three possible categories: class 1, class 2, and class 3. We can use CDT to solve this problem as follows: if the X value is greater than or equal to the Z value, the model's output is class 1. If the answer is NO and the X value is less than Y, the output is class 2. Otherwise (the X value is not less than Y), the output is class 3.
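The three-class example can be written as a two-split tree. This is only a sketch of the rule as described in the text, with the third branch taken to be the remaining case (X smaller than Z but not smaller than Y); the function and argument names are hypothetical.

```python
def coarse_tree_predict(x, y, z):
    """Two-split decision tree for the three-class example."""
    if x >= z:          # first split
        return "class 1"
    if x < y:           # second split, reached only when x < z
        return "class 2"
    return "class 3"    # remaining case: y <= x < z
```

Because a prediction needs at most two comparisons, such a tree is extremely cheap to evaluate, at the cost of a coarser decision boundary.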

Fig. 3 - SVM learning technique.

Fig. 4 - CDT learning technique.

Naive Bayes Classifier (NBC): NBC is included due to its probabilistic nature and ability to handle categorical and continuous data. NBC is known for its simplicity, speed, and scalability, making it a viable option for real-time anomaly detection in vehicular communications. Naive Bayes classifier is a machine learning model based on the probability method to solve a classification problem[34]. Equation (1) shows the Bayes theorem where y is the class variable, i.e., (yes, no), and X refers to the features. X can be represented as shown in Eq. (2) because X can have several values depending on the number of features being used to predict y. Finally, the NB classifier equation can be written as shown in Eq. (3), where we only look for y with max probability[34].

\begin{gather*}
P(y\vert X)= \frac{P(X\vert y)P(y)}{P(X)}\tag{1}\\
X=(x_{1}, x_{2},\ldots, x_{n})\tag{2}\\
y= \arg\max_{y} P(y) \prod_{i=1}^{n}P(x_{i}\vert y)\tag{3}
\end{gather*}
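A minimal sketch of Eq. (3) for categorical features is given below. It is an illustration only: the Laplace (add-one) smoothing, the log-space evaluation, and the toy features and labels are assumptions introduced here, not details taken from the evaluated model.

```python
import math
from collections import Counter, defaultdict

def nb_fit(samples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            feature_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def nb_predict(model, x):
    """Return argmax_y P(y) * prod_i P(x_i | y), evaluated in log space."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log P(y)
        for i, v in enumerate(x):
            # Add-one smoothing keeps unseen values from zeroing the product.
            p = (feature_counts[(i, y)][v] + 1) / (n_y + len(values[i]))
            score += math.log(p)       # log P(x_i | y)
        if score > best_score:
            best_class, best_score = y, score
    return best_class

# Hypothetical toy data: (ID range, payload length) -> label.
toy_samples = [("low", 8), ("low", 8), ("high", 8), ("high", 2), ("low", 2)]
toy_labels = ["attack", "attack", "normal", "normal", "normal"]
toy_nb = nb_fit(toy_samples, toy_labels)
```

The denominator P(X) of Eq. (1) is the same for every class, which is why it can be dropped when only the most probable class is needed.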


3.3 System Architecture

The system architecture used in this research is depicted in Fig. 5. Firstly, the dataset was downloaded from the HCRL repository (https://ocslab.hksecurity.net/Datasets/car-hacking-dataset). Secondly, preprocessing was applied to the dataset by removing duplicated records, converting hexadecimal values to decimal, and converting timestamps to seconds. Thirdly, the data file was imported into the MATLAB project, and the data was split into two parts: 80% for training and 20% for testing. Fourthly, each ML model was trained, validated, and tested separately with the same data sample. We used a 5-fold cross-validation approach to validate the training of the ML models and to prevent overfitting. Fifthly, the output of each ML model is one of the following: normal, DoS attack, fuzzy attack, gear attack (spoofing), or RPM attack (spoofing).
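The preprocessing step can be sketched as follows. The work itself was carried out in MATLAB; this Python sketch only illustrates the idea. The comma-separated record layout, the sample line, the zero-padding of short frames, and the function names are assumptions made for the example.

```python
def parse_record(line):
    """Turn one raw CAN record into the twelve numeric/label features."""
    fields = [f.strip() for f in line.split(",")]
    timestamp = float(fields[0])             # already expressed in seconds
    can_id = int(fields[1], 16)              # hexadecimal ID -> decimal
    dlc = int(fields[2])                     # number of data bytes
    data = [int(b, 16) for b in fields[3:3 + dlc]]
    data += [0] * (8 - len(data))            # assumption: pad short frames with zeros
    flag = fields[-1]                        # class label of the message
    return (timestamp, can_id, dlc, *data, flag)

def preprocess(lines):
    """Parse all records and drop exact duplicates, keeping first occurrences."""
    seen, rows = set(), []
    for line in lines:
        row = parse_record(line)
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows

# Hypothetical sample record, repeated once to show duplicate removal.
sample_line = "1478198376.389427,0316,8,05,21,68,09,21,21,00,6f,R"
clean_rows = preprocess([sample_line, sample_line])
```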

3.4 Data Distribution and Validation

The dataset employed in this model (after balancing) comprises 2 500 000 samples (each class has 500 000 traffic samples). In the data distribution process, we split the data into 80% for training (2 000 000 samples, with each class having 400 000 samples) and 20% for testing (500 000 samples, with each class having 100 000 samples). Also, a five-fold cross-validation (illustrated in Fig. 6) has been employed. It uses several train/validation splits to evaluate model performance robustly, reduces the fluctuation of performance indicators, helps identify both underfitting and overfitting, and provides a fair balance between accurate model evaluation and computational efficiency.
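The split sizes and the fold layout can be checked with a short sketch. The helper names are hypothetical, and the folds below are contiguous index blocks for simplicity; in practice the samples would typically be shuffled or stratified by class first, a detail not specified in the text.

```python
def split_sizes(n_total, n_classes, train_frac):
    """Return (train, test, train per class, test per class) for a balanced dataset."""
    n_train = round(n_total * train_frac)
    n_test = n_total - n_train
    return n_train, n_test, n_train // n_classes, n_test // n_classes

def kfold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

# Sizes reported in the text: 2 500 000 samples, five classes, 80/20 split.
sizes = split_sizes(2_500_000, 5, 0.8)
```

With five folds, every training sample is used for validation exactly once and for fitting four times.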

SECTION 4

Result and Analysis

Fig. 5 - Proposed system architecture.

In this section, we assess the performance of the proposed in-vehicle intrusion detection system using different design options, including ABT-based IDS, SVM-based IDS, NBC-based IDS, and CDT-based IDS. The assessment uses conventional metrics for evaluating supervised machine learning models: classification accuracy (%), precision (%), recall (%), the harmonic mean of precision and recall (F1-score, %), and the average prediction time measured in microseconds (μs). We report our outcomes in Table 4, which compares the performance of the four alternatives in order to select the optimal model to be realized and deployed in the proposed in-vehicle intrusion detection system. Overall, the best evaluation outcomes are exhibited by the ABT-based model, which reaches 99.8% for accuracy, precision, recall, and F1-score.
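For reference, the standard definitions of these metrics for a class k can be written in terms of its true positives (TP), false positives (FP), and false negatives (FN). How the per-class values are averaged into the single figures of Table 4 is not stated here, so the macro-average over the C = 5 classes shown last is an assumption.

```latex
\begin{gather*}
\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}}\\
\text{Precision}_k = \frac{TP_k}{TP_k + FP_k}, \qquad
\text{Recall}_k = \frac{TP_k}{TP_k + FN_k}\\
\text{F1}_k = \frac{2\,\text{Precision}_k\,\text{Recall}_k}{\text{Precision}_k + \text{Recall}_k}, \qquad
\text{F1}_{\text{macro}} = \frac{1}{C}\sum_{k=1}^{C}\text{F1}_k
\end{gather*}
```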

Fig. 6 - Five-fold cross-validation policy.

Though the prediction time of the ABT-based model is not the lowest, it is still very competitive at only 6.67 μs per single prediction, making it well suited to real-time applications such as the Internet of Vehicles (IoV). The shortest prediction time is recorded for CDT, at only 0.4 μs per single prediction. This is due to the use of coarse decision trees that perform only a few splits (e.g., four splits), which negatively affects detection performance while yielding the fastest prediction.
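One way to obtain a per-prediction time of this kind, and to relate it to throughput, is sketched below. The measurement procedure actually used is not described in this section, so the helper `prediction_time_us`, its best-of-several-repeats policy, and the back-to-back throughput estimate are assumptions for illustration.

```python
import time

def prediction_time_us(predict, samples, repeats=5):
    """Average time per prediction in microseconds (best of several repeats)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for s in samples:
            predict(s)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(samples))
    return best * 1e6

def throughput_per_second(latency_us):
    """Predictions per second if predictions run back to back."""
    return 1e6 / latency_us

abt_rate = throughput_per_second(6.67)  # latency reported for ABT
cdt_rate = throughput_per_second(0.4)   # latency reported for CDT
```

Under this back-to-back assumption, 6.67 μs per prediction corresponds to roughly 150 000 classifications per second, and 0.4 μs to about 2.5 million.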

Consequently, based on the above investigational assessment, the ABT-based IDS model is confidently selected for development and operation in the proposed in-vehicle intrusion detection system. To gain more insights into the selected model performance, we also demonstrate the confusion matrix analysis corresponding to the ABT-based IDS model in Fig. 7. According to the figure, five classes are reported in the matrix: the DoS attack class, the Fuzzy attack class, the Gear attack class, the RPM attack class, and the Normal class.

Table 4 Evaluation outcomes for every in-vehicle IDS learning model.

Each class originally has 100 000 test traffic samples (TP+FP+FN+TN), which are classified correctly in almost all classes. The exceptions are the Fuzzy attack class, which reported 114 incorrectly classified samples (FP+FN, 0.114%), and the Normal class, which reported ten incorrectly classified samples (FP+FN, 0.01%). On the right side of the matrix, we show the sensitivity analysis for each class by investigating both the True Positive Rate, TPR (%), and the False Negative Rate, FNR (%). All classes have indicated a full sensitivity rate (TPR=100% and FNR=0%) with a slight variation for the Fuzzy attack, indicating a TPR of 99.9% and FNR of 0.01%. Besides, on the bottom side of the matrix, we show the predictivity analysis for each class by investigating both the Positive Predictive Value, PPV (%), and the False Discovery Rate, FDR (%). All classes have indicated a full predictivity rate (PPV=100% and FDR=0%) with a very slight variation for the Normal class, indicating a PPV of 99.9% and FDR of 0.01%.
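The row-wise and column-wise rates discussed above follow directly from a confusion matrix, as the sketch below shows. The 3-class matrix is a hypothetical example and not the matrix of Fig. 7, and the convention that rows hold true classes and columns hold predicted classes is an assumption.

```python
def per_class_rates(cm):
    """Compute TPR, FNR, PPV, and FDR for every class of a square confusion matrix."""
    n = len(cm)
    rates = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # rest of row k (true class k)
        fp = sum(cm[r][k] for r in range(n)) - tp   # rest of column k (predicted k)
        tpr = tp / (tp + fn)
        ppv = tp / (tp + fp)
        rates.append({"TPR": tpr, "FNR": 1 - tpr, "PPV": ppv, "FDR": 1 - ppv})
    return rates

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
example_cm = [
    [98, 2, 0],
    [0, 100, 0],
    [1, 0, 99],
]
example_rates = per_class_rates(example_cm)

# Share of misclassified samples quoted in the text: 114 out of 100 000.
fuzzy_error_percent = 114 / 100_000 * 100
```

By construction, TPR and FNR of a class always sum to 100%, as do PPV and FDR.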

Fig. 7 - Confusion matrix analysis.

The obtained performance outcomes for the ABT model signify its effectiveness and efficiency in detecting anomalies in vehicular communications. Through high classification accuracy and low classification overhead, the ABT model offers tangible benefits for enhancing the security and reliability of connected and autonomous vehicles in real-world application scenarios. The implications of these results extend beyond individual vehicles to broader ecosystem considerations. As vehicles become increasingly interconnected and autonomous, the security of in-vehicle communication networks becomes a critical concern for the automotive industry, regulatory bodies, and policymakers. The demonstrated effectiveness of the ABT model underscores the feasibility of leveraging machine learning-based intrusion detection systems to address these security challenges proactively.

Lastly, to establish the relevance, significance, and competitiveness of our proposed IDS, we compare our best experimental findings with other models reported in the literature for in-vehicle intrusion detection tasks. The comparison is provided in Table 5 below, considering four comparison indicators: the computational intelligence technique used to build the in-vehicle IDS, the classification accuracy, the classification F1-score, and the prediction time per single prediction.

In addition to our system, seven other systems are compared in Table 5: (a) the Song et al. system[28], which utilizes a deep learning-based CNN, achieving an accuracy of 99.7% with a relatively large prediction time; (b) the Seo et al. system[26], which utilizes a Generative Adversarial Network (GAN), achieving 98% accuracy with a relatively moderate prediction time; (c) the Al-Haija et al. system[19], which utilizes a transfer learning-based residual convolutional neural network with 50 layers (ResNet50), achieving 99.7% accuracy and F1-score with a relatively large prediction time; (d) the Tariq et al. system[35], which utilizes the transfer learning (TL) technique of a deep cascaded model involving several convolutional LSTM components (CANTransfer), achieving an accuracy of 95.2%; (e) the Javed et al. system[21], which utilizes convolutional attention incorporated with a gated recurrent neural network (CANintelliIDS), achieving an accuracy of 93.7% with a relatively large prediction time; (f) the Abd et al. system[36], which applies artificial neural networks for intrusion detection (ANN-IDS), achieving an accuracy of 92.1% with a relatively large inference time; and (g) the Pascale et al. system[37], which utilizes Bayesian Networks for intrusion detection (BN-IDS), achieving an accuracy of 97.1% with a more complex inference scheme.

Table 5 Comparison with other in-vehicle IDS learning models employing the same/similar dataset.

Overall, our proposed system exhibits the best performance statistics among the compared models. It can therefore be deployed confidently to detect and classify cyber-attacks on automated in-vehicle systems.

Moreover, our proposed system addresses several limitations observed in previous systems, including the following key points:

  • The proposed system achieved higher classification accuracy than existing IDS, thus enabling more reliable detection of anomalies in in-vehicle communication networks. Using ABT techniques, our system can effectively distinguish between normal and abnormal traffic patterns, reducing false positives and negatives.

  • The proposed system has improved computational efficiency, as indicated by its low prediction time of 6.67 μs. This faster processing enables real-time detection of and response to cyber threats in connected and autonomous vehicles and addresses the latency and responsiveness concerns observed in previous IDS.

  • The proposed system is designed to be adaptable and scalable, allowing seamless integration into various vehicular communication systems. This flexibility allows our IDS to accommodate evolving cyber threats and technological advancements, addressing the limitations of system compatibility and deployment scalability observed in previous IDS.

4.1 Results Discussion

In this subsection, we clarify how delays were evaluated and justify the comparisons with benchmark methods in more detail.

Evaluation of delays: We measured the classification overhead (delay) using a standardized experimental setup. The experiments were conducted on an Intel i7 processor with 16 GB of RAM, running Windows 11 and MATLAB 2021b. Each model's classification time was recorded over multiple runs to ensure consistency, and the average time was reported. The classification overhead was measured as the time taken to process a single CAN message from input to output classification.
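The averaging procedure described above can be sketched as follows. The measurements in this study were taken in MATLAB 2021b, so this Python fragment only illustrates the idea of timing many classifications over several runs and reporting the mean per-message time. The function name `mean_time_per_sample_us`, the stand-in classifier `toy_predict`, the synthetic frames, and the choice of five runs are all assumptions made for illustration.

```python
import time


def mean_time_per_sample_us(predict, samples, runs=5):
    """Average wall-clock time (in microseconds) to classify one sample.

    Each run times a full pass over `samples`; the per-sample time of
    every run is then averaged across runs.
    """
    per_run = []
    for _ in range(runs):
        start = time.perf_counter()
        for s in samples:
            predict(s)
        elapsed = time.perf_counter() - start
        per_run.append(elapsed / len(samples) * 1e6)
    return sum(per_run) / len(per_run)


def toy_predict(frame):
    """Hypothetical stand-in for a trained classifier (assumption)."""
    return "DoS" if frame[0] == 0 else "Normal"


# Synthetic (id, length, payload byte) tuples standing in for CAN messages.
frames = [(i % 4, 8, 0) for i in range(10_000)]

overhead_us = mean_time_per_sample_us(toy_predict, frames)
print(f"{overhead_us:.3f} microseconds per frame")
```

Timing a whole batch and dividing by its size keeps the timer's own resolution and call overhead from dominating a measurement that is only a few microseconds long.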

Justification of benchmark comparisons: We compared our results against several reported methods, namely the models reported in Ref. [28] (CNN model), Ref. [26] (GAN networks), Ref. [19] (ResNet50 Transfer Learning), Ref. [35] (CANTransfer), Ref. [21] (CANintelliIDS), Ref. [36] (ANN-IDS), and Ref. [37] (Bayesian network). These benchmarks were chosen for their relevance and prominence in vehicular intrusion detection systems. For a fair comparison, we kept the conditions and metrics used in our study as close as possible to those in the benchmark studies.

Experimental outcome: The results showed that our proposed system, particularly the Adaboost trees model, outperformed the state-of-the-art models in terms of classification accuracy and delay. We attribute this improvement to several factors. First, the ensemble approach of the Adaboost trees model enhances robustness and accuracy by combining multiple weak classifiers, effectively handling diverse attack patterns. Second, the comprehensive preprocessing of the Car-Hacking-2018 dataset, including data balancing and feature extraction, ensured effective learning. Third, our thorough feature selection process identified the most relevant features, improving the model's detection capabilities. Finally, optimized hyperparameters and an efficient implementation facilitated real-time processing, maintaining network security without compromising performance. These factors collectively contributed to the superior performance of our proposed system.
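To illustrate how an ensemble of weak classifiers can succeed where a single one cannot, the following minimal Python sketch implements binary AdaBoost with one-feature threshold rules (decision stumps). It is not the MATLAB implementation or the five-class model used in this study. The toy data, the labels in {-1, +1}, and all function names are assumptions chosen only to show the re-weighting and weighted-vote steps.

```python
import math


def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) of the stump with the
    lowest weighted error; the stump predicts `polarity` when
    x[feature] <= threshold and `-polarity` otherwise."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                preds = [pol if x[f] <= thr else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def stump_predict(stump, x):
    _, f, thr, pol = stump
    return pol if x[f] <= thr else -pol


def adaboost_fit(X, y, rounds=3):
    """Train `rounds` stumps, re-weighting the samples after each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    hits = sum(adaboost_predict(ensemble, xi) == yi for xi, yi in zip(X, y))
    return hits / len(X)


# Toy data (assumption): the positive class lies in the middle of the range,
# so no single threshold separates the two classes.
X = [[1], [2], [3], [4], [5], [6]]
y = [-1, -1, 1, 1, -1, -1]

single = adaboost_fit(X, y, rounds=1)
boosted = adaboost_fit(X, y, rounds=3)
print(accuracy(single, X, y), accuracy(boosted, X, y))
```

On this toy set, one stump alone misclassifies two of the six samples, while the weighted vote of three stumps separates the classes on the training data; the same principle underlies boosted tree ensembles on richer feature sets.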

SECTION 5

Conclusion and Future Work

Intelligent vehicles use several communication protocols to exchange information within their internal and external environments. CAN is a well-known communication protocol used heavily in intelligent vehicles. However, using CAN as a communication interface increases the risk of cyberattacks that impose negative consequences on autonomous vehicles. This paper introduced a fast detection technique to secure vehicular communications targeted by cyberattacks through the CAN protocol. The proposed system studies the performance of four independent ML models: ABT, CDT, NBC, and SVM. The four ML models were tested and evaluated using the Car-Hacking-2018 dataset. The output of each ML model is one of five classes: normal, fuzzy attack, DoS attack, RPM attack (spoofing), or gear attack (spoofing). The experimental results showed that ABT performed better than CDT, NBC, and SVM. The ABT model scored 99.8% classification accuracy and 6.67 μs of classification overhead.

In future work, we recommend studying the impacts of cyberattacks other than those used in this study, such as zero-day vulnerabilities and Advanced Persistent Threats (APTs). In addition, a Federated Machine Learning (FML) approach could enhance the security of vehicular communications, since it allows for distributed data analysis rather than centralized data processing. This increases privacy, reduces security risks, and improves the system's sustainability, since centralized data processing is a single point of failure. Also, to address the concern of increasing network traffic volumes, we propose using incremental learning (as an extension of this model at deployment), where models are trained incrementally over time, often on new or additional data, without requiring retraining from scratch. This approach is particularly useful in dynamic environments (such as diverse vehicular communication environments) where the data distribution may change over time or new data becomes available continuously. Finally, this study can be extended to consider adopting the fuzzy decision-making model and the metaverse integration alternatives of CAVs for anomalous traffic detection systems for secure vehicular communications[38], [39].
