FSFP: A Fine-Grained Online Service System Performance Fault Prediction Method Based on Cross-attention

Abstract:

An online service system may experience various performance faults during operation. Detecting and locating these faults after they occur can significantly impact the user experience and lead to significant losses. Therefore, it is necessary to predict faults before they occur. Existing methods for fault prediction typically only predict the possibility of fault, without providing more granular predictions, such as the type of fault. This can make troubleshooting more difficult for developers. In this paper, we propose a fine-grained fault prediction method called FSFP, which not only predicts the possibility of fault but also identifies the type of fault that may occur. The method initially collects performance monitoring metrics from the runtime system, including two types: normal operation and abnormal conditions. It then utilizes cross-attention to capture the interdependencies between these two types of monitoring metrics, followed by the construction of a multi-label classification model. We evaluated FSFP by injecting faults into a benchmark microservice system. In terms of predicting the possibility of fault, FSFP achieved a precision of 0.999, a recall of 0.998, and an F1 score of 0.999. In terms of predicting the type of fault, FSFP achieved an exact match ratio of 0.955 and a Hamming loss of 0.017. In terms of predicting six specific types of faults, FSFP achieved four optimal F1 scores.
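The two multi-label metrics reported above, exact match ratio and Hamming loss, follow standard definitions and can be illustrated with a minimal sketch. The label vectors below are hypothetical examples with three made-up fault types; they are not data from the paper, and the function names are illustrative assumptions rather than the authors' code.

```python
from typing import Sequence


def exact_match_ratio(y_true: Sequence[Sequence[int]],
                      y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose whole predicted label vector equals the true one."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    wrong = 0
    total = 0
    for t, p in zip(y_true, y_pred):
        if len(t) != len(p):
            raise ValueError("each label vector pair must have the same length")
        wrong += sum(1 for a, b in zip(t, p) if a != b)
        total += len(t)
    return wrong / total


# Hypothetical example: 4 samples, 3 fault-type labels per sample
# (1 = this fault type is predicted or present, 0 = it is not).
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 0]]

emr = exact_match_ratio(y_true, y_pred)  # 3 of 4 vectors match exactly
hl = hamming_loss(y_true, y_pred)        # 1 of 12 label slots is wrong
print(f"exact match ratio = {emr:.3f}, Hamming loss = {hl:.3f}")
```

Exact match ratio is the stricter of the two, since a sample counts only when every fault type is predicted correctly, while Hamming loss gives partial credit per label. This is why a high exact match ratio and a low Hamming loss are reported together.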

Published in: 2023 30th Asia-Pacific Software Engineering Conference (APSEC)

Date of Conference: 04-07 December 2023

Date Added to IEEE Xplore: 02 April 2024

DOI: 10.1109/APSEC60848.2023.00018

Conference Location: Seoul, Korea, Republic of

I. Introduction

Software systems support our daily work in many aspects of life. However, during the operation of these systems, performance faults such as slow response times are inevitable. Once these faults occur, they can significantly impact the system's availability and reliability, resulting in financial losses. For example, according to a recent survey [1], the average cost per hour of server downtime is between $301,000 and $400,000. One approach to minimizing the losses caused by performance faults is remediation after a fault occurs [2]–[7]. However, predicting and identifying potential risks before faults happen, and taking preventive measures, can directly prevent service unavailability. Therefore, many engineers have conducted research in this area.
