Abstract:
Logs record detailed system operations. To ensure system reliability, developers can detect system anomalies through log anomaly detection. Log parsing, which converts semi-structured log messages into structured data, is a crucial step in log anomaly detection and in advanced program analysis and verification. Although various log parsing tools are available, they generally suffer from low parsing accuracy and low efficiency because they ignore variable characteristics and rely on costly pairwise comparisons. In this paper, we propose the TCMS framework for log parsing, which consists of two main techniques. First, by studying 16 public log datasets, we find that most log variable tokens are structured variable tokens. Based on this finding, we propose a token conversion algorithm to improve parsing accuracy. This algorithm converts the varying parts of structured variable tokens into wildcards ('<*>'), preventing these tokens from being directly identified as constant tokens. Second, to improve efficiency, we propose the LogMLCS algorithm, which constructs a graph to extract the common parts of multiple log messages at once instead of using pairwise comparisons. Comprehensive experiments on 16 log datasets show that TCMS outperforms seven other parsing methods, achieving the highest parsing accuracy at the fastest speed. Furthermore, experiments that run a log anomaly detection algorithm in conjunction with different log parsing methods demonstrate that TCMS significantly boosts detection accuracy. For instance, on the OpenStack dataset, the TCMS-facilitated log anomaly detection algorithm achieves an F1-score, precision, and recall of 100% each, surpassing the best peer method by 32.2, 0.8, and 19.5 percentage points, respectively.
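To make the token conversion idea concrete, the following is a minimal illustrative sketch, not the TCMS implementation. The abstract does not give the actual conversion rules, so the sketch assumes that the varying part of a structured variable token is any run of digits. The names `convert_token`, `convert_message`, and `merge_templates` are hypothetical. The merge step is a simple position-wise comparison over several messages with the same token count; it stands in for, and is much simpler than, the graph-based LogMLCS algorithm.

```python
import re

WILDCARD = "<*>"

# Assumption: the "varying part" of a structured variable token is a digit run.
# The paper's real rules are not stated in the abstract.
_DIGITS = re.compile(r"\d+")


def convert_token(token: str) -> str:
    """Replace every digit run inside a token with the wildcard."""
    return _DIGITS.sub(WILDCARD, token)


def convert_message(message: str) -> list[str]:
    """Split a log message on whitespace and convert each token."""
    return [convert_token(tok) for tok in message.split()]


def merge_templates(messages: list[str]) -> list[str]:
    """Derive one template from several messages in a single pass.

    Tokens that agree at a position are kept; positions that differ
    become wildcards. This is a simplified stand-in, not LogMLCS.
    """
    if not messages:
        return []
    token_lists = [convert_message(m) for m in messages]
    length = len(token_lists[0])
    if any(len(toks) != length for toks in token_lists):
        raise ValueError("this sketch only handles messages with equal token counts")
    return [
        column[0] if len(set(column)) == 1 else WILDCARD
        for column in zip(*token_lists)
    ]


if __name__ == "__main__":
    logs = [
        "Deleting block blk_1 file /data/a",
        "Deleting block blk_22 file /data/b",
    ]
    print(" ".join(merge_templates(logs)))
```

Under these assumptions, a token such as `blk_22` keeps its constant prefix and becomes `blk_<*>`, so it is not treated as a constant token, while tokens that differ without a recognizable structure are replaced entirely by the wildcard.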
Published in: IEEE Transactions on Dependable and Secure Computing (Early Access)