CloudDet: Interactive Visual Analysis of Anomalous Performances in Cloud Computing Systems


Abstract:

Detecting and analyzing potentially anomalous performance in cloud computing systems is essential for avoiding losses to customers and ensuring the efficient operation of the systems. To this end, a variety of automated techniques have been developed to identify anomalies in cloud computing. These techniques usually track the performance metrics of the system (e.g., CPU, memory, and disk I/O), represented as a multivariate time series. However, the complex characteristics of cloud computing data limit the effectiveness of these automated methods. Thus, substantial human judgment on the automated analysis results is required for anomaly interpretation. In this paper, we present a unified visual analytics system named CloudDet to interactively detect, inspect, and diagnose anomalies in cloud computing systems. A novel unsupervised anomaly detection algorithm is developed to identify anomalies based on the specific temporal patterns of the given metrics data (e.g., the periodic pattern). Rich visualization and interaction designs are used to help understand the anomalies in their spatial and temporal context. We demonstrate the effectiveness of CloudDet through a quantitative evaluation, two case studies with real-world data, and interviews with domain experts.
Published in: IEEE Transactions on Visualization and Computer Graphics (Volume: 26, Issue: 1, January 2020)
Page(s): 1107 - 1117
Date of Publication: 20 August 2019

PubMed ID: 31442994

1 Introduction

Cloud computing is becoming increasingly pervasive, with the extensive demand for big data analytics and discovery shifting many individuals and organizations towards cloud services. This move is motivated by benefits such as storage and computation services shared among a massive number of users. For users to make the most of the cloud, high availability and reliability are essential to the overall user experience. It is therefore important to monitor the compute nodes' usage and behavior and to gain insight into potentially anomalous operations running in the cloud that might reduce efficiency or even cause downtime of the data center.

CloudDet facilitates the exploration of anomalous cloud computing performances through three levels of analysis: (a) anomaly ranking, (b) anomaly inspection, and (c) anomaly clustering. The figure showcases some exploration results with the Bitbrains datacenter traces. Node (b1) contains both short-term and long-term spikes. Node (b2) shows a 12-hour periodic pattern in its performance metrics, visible in the calendar chart in (a2), but encounters a spike along the way. Node (b3) shows many short-term, near-periodic spikes at the beginning and an abnormal long-term spike near the end. After the long-term spike is collapsed into a visual aggregation glyph in (b5), (b3) is updated and the temporal data in the latter part “pop out”, showing a pattern similar to that at the beginning. Node (b4) shows a general periodic trend, revealed by the PCA in (b6). Most of the nodes are clustered into three groups in (c).
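
The caption describes a node whose metrics follow a 12-hour periodic pattern interrupted by a spike. As a rough illustration of how a deviation from a periodic baseline can be flagged, the sketch below compares each sample with the median of all samples at the same phase of the cycle. It is not CloudDet's detection algorithm, which this excerpt does not describe. The function name `periodic_spike_indices`, the hourly sampling, the synthetic CPU values, and the threshold of 5 are all assumptions made for the example.

```python
from statistics import median


def periodic_spike_indices(series, period, threshold=5.0):
    """Return indices of samples that deviate strongly from a periodic baseline.

    Hypothetical helper for illustration only; not CloudDet's algorithm.
    """
    # Baseline: median of all samples that share the same phase in the cycle.
    baseline = [median(series[phase::period]) for phase in range(period)]
    residuals = [x - baseline[i % period] for i, x in enumerate(series)]

    # Robust spread of the residuals: median absolute deviation (MAD).
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    scale = mad if mad > 0 else 1e-9  # avoid dividing by zero on clean data

    return [
        i for i, r in enumerate(residuals)
        if abs(r - center) / scale > threshold
    ]


# Assumed data: hourly CPU usage (%) repeating every 12 hours for 3 days.
pattern = [10, 12, 15, 20, 30, 40, 45, 40, 30, 20, 15, 12]
cpu = pattern * 6
cpu[40] += 50  # inject one spike on top of the periodic pattern

print(periodic_spike_indices(cpu, period=12))  # [40]
```

Using per-phase medians and the MAD rather than means and standard deviations keeps a single large spike from distorting the baseline it is measured against. The period has to be known or estimated beforehand, which this sketch does not attempt.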
