1 Introduction
Cloud computing is becoming increasingly pervasive, with the extensive demand for big data analytics and discovery shifting many individuals and organizations towards cloud services. This move is motivated by benefits such as storage and computation services shared among a massive number of users. For users to take full advantage of the cloud, high availability and reliability are essential to the overall user experience. It is therefore important to monitor the usage and behavior of compute nodes, and to gain insight into anomalous operations running in the cloud that might reduce efficiency or even cause downtime of the data center.
Figure: CloudDet facilitates the exploration of anomalous cloud computing performance through three levels of analysis: (a) anomaly ranking, (b) anomaly inspection, and (c) anomaly clustering. The figure showcases some exploration results on the Bitbrains datacenter traces data. Node (b1) contains both short-term and long-term spikes. Node (b2) shows a 12-hour periodic pattern in its performance metrics, visible in the calendar chart in (a2), but also encounters a spike. Node (b3) shows many short-term, near-periodic spikes at the beginning and an abnormal long-term spike near the end. After the long-term spike is collapsed into a visual aggregation glyph in (b5), (b3) is updated and the later temporal data “pop out”, showing a pattern similar to the one at the beginning. Node (b4) shows a general periodic trend, revealed by the PCA analysis in (b6). Most of the nodes are clustered into three groups in (c).