Abstract:
Data imbalance is a prominent and challenging problem in real-world datasets. Although many supervised methods exist to address it, handling data imbalance in an unsupervised manner is still an open research topic. In this study, two autoencoders, a sparse autoencoder (SAE) and a convolutional neural network bi-directional long short-term memory (CNN Bi-LSTM) autoencoder, are proposed to address data imbalance through unsupervised anomaly detection. The Entropy Weight Method (EWM) is used for feature weighting and feature reduction. Four open-source datasets are considered in the study. The SAE network performed well on the entropy-weighted feature vectors of every dataset except the time series dataset, whereas the CNN Bi-LSTM network worked well on the time series dataset. The F1-score of each dataset is studied as the percentage of features retained after EWM-based reduction is varied. The results indicate that retaining 75% or more of the features can yield F1-scores of 90% or higher. This unsupervised approach to the data imbalance problem is therefore effective. The networks could be generalized further so that a single network gives improved performance across all available datasets.
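The abstract does not give the EWM formulation used in the paper, so the following is only a minimal sketch of the commonly described entropy weight method, not the authors' implementation. It assumes min-max normalization per feature, and the function names `entropy_weights` and `select_top_features`, the `keep_fraction` parameter, and the choice to scale the kept columns by their weights are all hypothetical choices made for illustration.

```python
import numpy as np


def entropy_weights(X):
    """Return one entropy-based weight per feature (column) of X.

    Assumed textbook formulation: min-max normalize each column, convert it
    to proportions, compute its normalized Shannon entropy e_j, and weight
    the feature by its divergence d_j = 1 - e_j.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape

    # Min-max normalize each column; guard against zero range.
    col_min = X.min(axis=0)
    span = X.max(axis=0) - col_min
    Z = (X - col_min) / np.where(span > 0, span, 1.0)

    # Proportion of each sample within its column. Constant columns are
    # treated as uniform so that they end up with zero weight.
    col_sums = Z.sum(axis=0)
    P = np.where(col_sums > 0, Z / np.where(col_sums > 0, col_sums, 1.0), 1.0 / n)

    # Normalized entropy per column, using the convention 0 * log(0) = 0.
    safe_P = np.where(P > 0, P, 1.0)
    entropy = -(P * np.log(safe_P)).sum(axis=0) / np.log(n)

    # Divergence; clip tiny negative values caused by rounding.
    d = np.clip(1.0 - entropy, 0.0, None)
    if d.sum() == 0:
        return np.full(m, 1.0 / m)  # no feature is informative: uniform weights
    return d / d.sum()


def select_top_features(X, weights, keep_fraction=0.75):
    """Keep the highest-weighted fraction of features, scaled by their weights."""
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    k = max(1, int(np.ceil(keep_fraction * m)))
    kept = np.sort(np.argsort(weights)[::-1][:k])  # restore original column order
    return X[:, kept] * weights[kept], kept
```

With `keep_fraction=0.75` this mirrors the setting the abstract singles out, where 75% or more of the features are retained. The weighted, reduced matrix would then be the input to an autoencoder whose reconstruction error serves as the anomaly score.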
Published in: 2023 Third International Conference on Secure Cyber Computing and Communication (ICSCCC)
Date of Conference: 26-28 May 2023
Date Added to IEEE Xplore: 14 July 2023