AutoKAD: Empowering KPI Anomaly Detection with Label-Free Deployment


Abstract:

Monitoring Key Performance Indicators (KPIs) and detecting anomalies in online service systems is critical. However, choosing the right KPI anomaly detection algorithm and appropriate hyperparameters presents a challenge. Conventional Automated Machine Learning (AutoML) struggles to address this because the hold-out dataset lacks labels and the loss computed on it does not reliably reflect anomaly detection accuracy. To address these challenges, this paper introduces AutoKAD, an AutoML framework designed to solve the combined algorithm selection and hyperparameter optimization problem for unsupervised KPI Anomaly Detection. We propose a label-free universal objective function, inspired by the Local Outlier Factor (LOF), for evaluating AutoML trials. Additionally, we improve the acquisition function and design a cluster-based warm start strategy to enhance exploration effectiveness and efficiency. The experimental results on three real-world datasets show that our approach outperforms the state-of-the-art (SOTA) model selection algorithm by 11% in F1-score and achieves performance comparable to the theoretically optimal results (99%). We believe that AutoKAD can greatly improve the deployment feasibility of existing anomaly detection algorithms in real-world systems. Our code is anonymously released at https://github.com/NetManAIOps/AutoKAD.
Date of Conference: 09-12 October 2023
Date Added to IEEE Xplore: 02 November 2023
Conference Location: Florence, Italy

I. Introduction

In today’s digital world, online service systems such as search engines, e-commerce platforms, and social networks have become an integral part of daily life. To ensure seamless service and maintain user satisfaction, IT operations engineers at these companies closely monitor Key Performance Indicators (KPIs) such as response time and success rate, which together provide a comprehensive overview of system performance. KPI anomaly detection (KAD) plays a crucial role in identifying potential issues by detecting anomalies in KPIs, thereby accelerating failure diagnosis and mitigation.
