Investigating novel machine learning based intrusion detection models for NSL-KDD data sets | IEEE Conference Publication | IEEE Xplore

Abstract:

This study investigates the application of the Mutual Information (MI) feature selection technique to improve the accuracy of Machine Learning (ML) models on the NSL-KDD datasets, building upon prior research. Six ML models are implemented for classification: Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbor (KNN), Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM) with different kernels (1st, 2nd, and 3rd). The DT model proposed in this study achieves higher accuracy than the DT-based Intrusion Detection System (IDS) proposed in the original paper by Ingre et al. Additionally, a multi-class classification model for the NSL-KDD datasets is developed using both normalized and non-normalized features. Interestingly, the models trained on non-normalized features achieve higher accuracy than those trained on normalized features. Moreover, the study improves the classification performance of the DT-based IDS by applying the Correlation-based Feature Selection (CFS) technique. The proposed IDS is evaluated before and after feature selection for both multi-class classification (normal traffic and individual attack types) and binary classification (normal versus abnormal data).
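The abstract names MI-based feature selection but does not give its settings. As a minimal sketch, assuming discrete (or already discretized) features, the snippet below estimates the mutual information I(X; Y) = Σ p(x, y) log2[p(x, y) / (p(x) p(y))] between each feature column and the class label with a simple plug-in estimator, then keeps the top k columns. The toy rows reuse real NSL-KDD column names (`protocol_type`, `flag`, `land`), but the values, the helper names (`mutual_information`, `rank_features`, `select_top_k`), and the choice of k are hypothetical and not the authors' code. For continuous features, scikit-learn's `mutual_info_classif` is a common library alternative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def mutual_information(xs: Sequence, ys: Sequence) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Add p(x,y) * log2(p(x,y) / (p(x) p(y))) for each observed pair.
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(rows, labels, names) -> List[Tuple[str, float]]:
    """Score every feature column against the label, most informative first."""
    scores = [(name, mutual_information([row[j] for row in rows], labels))
              for j, name in enumerate(names)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def select_top_k(rows, labels, names, k):
    """Hypothetical helper: keep only the k highest-MI columns."""
    keep = [name for name, _ in rank_features(rows, labels, names)[:k]]
    idx = [names.index(name) for name in keep]
    return keep, [[row[i] for i in idx] for row in rows]


# Toy records using NSL-KDD column names; the values are made up.
NAMES = ["protocol_type", "flag", "land"]
ROWS = [
    ("tcp", "SF", 0),
    ("tcp", "S0", 0),
    ("udp", "SF", 0),
    ("tcp", "S0", 0),
    ("icmp", "SF", 0),
    ("tcp", "REJ", 0),
]
LABELS = ["normal", "attack", "normal", "attack", "normal", "attack"]

if __name__ == "__main__":
    for name, score in rank_features(ROWS, LABELS, NAMES):
        print(f"{name:15s} {score:.3f}")
    print(select_top_k(ROWS, LABELS, NAMES, k=2)[0])
```

On this toy data, `flag` separates the labels perfectly (1 bit), `protocol_type` only partly (about 0.459 bits), and the constant `land` column carries no information, so `select_top_k(..., k=2)` keeps `flag` and `protocol_type`.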
Date of Conference: 09-10 October 2023
Date Added to IEEE Xplore: 11 December 2023
Conference Location: Chiniot, Pakistan

I. Introduction

In cybersecurity, intrusion detection plays a critical role in identifying and preventing unauthorized access and malicious activity targeting computer systems and networks. Its core objective is to protect the confidentiality, integrity, and availability of sensitive data and resources. To safeguard networks and systems from unauthorized access, security professionals deploy Intrusion Detection Systems (IDS) for continuous monitoring [1]. An IDS detects potential threats and promptly raises alerts, enabling quick and effective responses to security incidents and thereby mitigating the risk of data breaches and other security compromises [2]. Intrusion detection also helps organizations comply with data protection and privacy regulations. In short, it is a vital tool for securing computer systems and networks against the escalating threat of cyberattacks.
The use of machine learning (ML) models in intrusion detection has become increasingly significant, primarily because of their capacity to analyze large and complex datasets in real time, detect previously unknown threats, and adapt to evolving attack patterns [3]. Unlike traditional rule-based and signature-based IDS, which struggle with advanced and unfamiliar attacks, ML models can learn from data, recognize patterns, and detect anomalies that signify malicious behavior. ML models have been applied to intrusion detection tasks such as classification, clustering, and anomaly detection. They draw on network traffic, system logs, user behavior, and other characteristics to identify potential attacks or suspicious activity. They can also categorize attacks into types based on their characteristics, which helps in choosing appropriate response or mitigation measures.
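To make the contrast between signature-based and learning-based detection concrete, here is a deliberately simple sketch; it is not a method from this paper. The signature check fires only on patterns it already knows, while the anomaly check learns a baseline (mean and standard deviation of one feature) from normal traffic and flags large deviations, so it can react to an attack it has never seen. The z-score profile is a statistical stand-in for a trained ML model, and the `src_bytes` values, the signature string, the threshold of 3, and all function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Iterable, Sequence, Tuple


def signature_alert(payload: str, signatures: Iterable[str]) -> bool:
    """Rule-based check: alert only if a known pattern appears verbatim."""
    return any(sig in payload for sig in signatures)


def fit_baseline(normal_values: Sequence[float]) -> Tuple[float, float]:
    """Learn a profile of normal behavior: sample mean and standard deviation."""
    return mean(normal_values), stdev(normal_values)


def is_anomalous(value: float, baseline: Tuple[float, float],
                 threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # Degenerate profile: any deviation from the constant value is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical data: src_bytes of known-normal connections and one known signature.
NORMAL_SRC_BYTES = [200, 220, 210, 190, 205, 215, 195]
SIGNATURES = {"/etc/passwd"}
BASELINE = fit_baseline(NORMAL_SRC_BYTES)

if __name__ == "__main__":
    novel_payload = "GET /cgi-bin/unknown_exploit"  # attack with no matching rule
    print(signature_alert(novel_payload, SIGNATURES))  # the rule set misses it
    print(is_anomalous(5000, BASELINE))  # unusually large transfer is flagged
    print(is_anomalous(212, BASELINE))   # typical value passes
```

The trade-off is the usual one: the signature rule never alerts on the novel payload, while the baseline catches the outlier but would also flag any legitimate traffic that happens to look unusual.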
Techniques such as feature selection, parameter tuning, and ensemble methods can improve the accuracy and performance of these models. Feature selection retains the features most relevant to the prediction task, parameter tuning searches for the model settings that yield the best performance, and ensemble methods combine the predictions of multiple models to improve accuracy and reduce false positives. Blockchain-based systems are one of the innovations that have been used for this purpose [4].
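As a minimal illustration of how an ensemble can suppress false positives, the sketch below combines three base classifiers by hard majority vote. The base classifiers are hypothetical single-feature threshold rules over real NSL-KDD column names (`serror_rate`, `count`, `src_bytes`, `dst_bytes`); in practice they would be trained models such as the DT, KNN, or SVM classifiers named in the abstract, and the thresholds here are illustrative only.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, float]
Classifier = Callable[[Record], str]


# Hypothetical base classifiers; thresholds are illustrative, not tuned.
def by_syn_errors(r: Record) -> str:
    return "attack" if r["serror_rate"] > 0.5 else "normal"


def by_connection_count(r: Record) -> str:
    return "attack" if r["count"] > 100 else "normal"


def by_empty_transfer(r: Record) -> str:
    return "attack" if r["src_bytes"] == 0 and r["dst_bytes"] == 0 else "normal"


ENSEMBLE: List[Classifier] = [by_syn_errors, by_connection_count, by_empty_transfer]


def majority_vote(classifiers: List[Classifier], record: Record) -> str:
    """Hard voting: return the label predicted by most base classifiers.

    An odd number of classifiers avoids ties in the binary case.
    """
    votes = Counter(clf(record) for clf in classifiers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    flood_like = {"serror_rate": 0.9, "count": 150, "src_bytes": 0, "dst_bytes": 0}
    # Assume this record is genuinely normal: one rule misfires, two outvote it.
    busy_but_benign = {"serror_rate": 0.9, "count": 20,
                       "src_bytes": 300, "dst_bytes": 1200}
    print(majority_vote(ENSEMBLE, flood_like))
    print(majority_vote(ENSEMBLE, busy_but_benign))
```

In the second record one base rule raises an alert on its own, but the other two vote normal, so the ensemble does not report it. This is the mechanism by which voting can reduce false positives, at the cost of occasionally outvoting a correct minority.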

