An Optimization of SMOTE for Anomaly Detection Based on High Contribution Sample Screening

Abstract:

Detecting anomalies with high accuracy in data with an imbalanced class distribution is an important goal in many real-world applications. The imbalance between the normal and abnormal classes is the main challenge in anomaly detection, and it is traditionally addressed with the Synthetic Minority Over-sampling Technique (SMOTE). In this paper, a new High-Contribution SMOTE is proposed to overcome a limitation of the original SMOTE: it does not account for the high contribution that boundary data make to anomaly detection. First, a set of high-contribution samples, consisting of minority class samples that are prone to misclassification, is screened with a classifier constructed from the generated Receiver Operating Characteristic (ROC) curves. Then, these high-contribution samples, rather than the randomly selected samples of the original SMOTE, are used to synthesize new minority class samples. This enhances the contribution of the boundary data and yields a data distribution closer to the true one, which is expected to significantly improve anomaly detection performance. Finally, comparison experiments against the original SMOTE and the Borderline SMOTE show that the proposed High-Contribution SMOTE obtains a higher Recall score and $F_1$ score.
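The two steps described in the abstract (screening, then synthesis from the screened samples) can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not specify. It assumes that "prone to misclassification" means a minority sample falls on the wrong side of a threshold chosen from the ROC curve by maximizing TPR minus FPR, that synthesis uses the standard SMOTE interpolation x_new = x + lam * (x_nn - x) with lam drawn from [0, 1), and that the anomaly score is the distance from the origin. The function names, the toy data, and the parameters `k` and `rng_seed` are all hypothetical.

```python
import math
import random

def roc_threshold(scores, labels):
    """Scan ROC operating points; return the threshold maximizing TPR - FPR."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg  # Youden's J at this operating point
        if j > best_j:
            best_j, best_t = j, t
    return best_t

def screen_high_contribution(X, y, score_fn):
    """Assumed screening rule: keep minority samples scored below the threshold."""
    scores = [score_fn(x) for x in X]
    t = roc_threshold(scores, y)
    return [x for x, s, lab in zip(X, scores, y) if lab == 1 and s < t]

def synthesize(seeds, minority, n_new, k=2, rng_seed=0):
    """SMOTE-style interpolation, seeded only from the screened samples."""
    rng = random.Random(rng_seed)
    new_points = []
    for _ in range(n_new):
        x = rng.choice(seeds)
        # k nearest minority neighbors of the chosen seed (excluding itself)
        neighbors = sorted((m for m in minority if m != x),
                           key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbors)
        lam = rng.random()  # interpolation weight in [0, 1)
        new_points.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return new_points

# Toy data (hypothetical): normal points near the origin, with a few
# overlapping the region where the closest anomalies sit.
majority = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.3, -0.2), (-0.3, -0.1),
            (0.1, 0.4), (1.0, 1.0), (1.1, 1.0), (0.9, 1.2), (1.2, 1.0)]
minority = [(3.0, 3.0), (2.8, 3.2), (3.1, 2.7),
            (1.0, 0.8), (0.9, 0.9), (1.5, 1.4)]
X = majority + minority
y = [0] * len(majority) + [1] * len(minority)

def score(p):
    # Assumed anomaly score: distance from the origin
    return math.dist(p, (0.0, 0.0))

seeds = screen_high_contribution(X, y, score)
synthetic = synthesize(seeds, minority, n_new=4)
print("screened seeds:", seeds)
print("synthetic samples:", synthetic)
```

In this toy run the screened seeds are the two minority points that sit among the overlapping normal points, so every synthetic sample is generated near the class boundary rather than around the easily separated anomalies. A real pipeline would need a fallback when the screening step returns no samples.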

Published in: 2023 China Automation Congress (CAC)

Date of Conference: 17-19 November 2023

Date Added to IEEE Xplore: 19 March 2024

DOI: 10.1109/CAC59555.2023.10451412

Conference Location: Chongqing, China

I. Introduction

Among existing anomaly detection methods, data-driven methods have become mainstream. Of these, supervised learning methods, in which every class of data in the training set is labeled, can significantly improve detection accuracy, so supervised classification algorithms are widely used in anomaly detection. However, in many real-world detection problems, anomalous data are far scarcer than normal data. This causes data imbalance: the dataset contains majority and minority classes with significantly different sample sizes [1]. Modeling imbalanced datasets is therefore a frequent challenge when training classifiers based on intelligent algorithms. If imbalanced data are modeled directly, traditional feature selection methods tend to favor features of the majority class over those of the minority class, and the trained model overfits the majority class, which biases its predictions.
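A small numeric illustration of why this bias matters, under assumed numbers (95 normal samples, 5 anomalies, and a degenerate model that always predicts the majority class; none of these figures come from the paper): accuracy looks high while recall and the F1 score on the anomaly class collapse to zero.

```python
def recall_f1(y_true, y_pred):
    """Recall and F1 for the positive (anomaly, label 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1

# Hypothetical imbalanced test set: 95 normal (0), 5 anomalous (1)
y_true = [0] * 95 + [1] * 5
# A model biased toward the majority class predicts "normal" for everything
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall, f1 = recall_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f} recall={recall:.2f} f1={f1:.2f}")
```

This is why recall and F1 on the minority class, rather than accuracy, are the natural metrics for comparing over-sampling methods.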
