Do Nothing, But Carefully: Fault Tolerance with Timing Guarantees for Multiprocessor Systems Devoid of Online Adaptation

Abstract:

Many practical real-time systems must be able to withstand several reliability threats induced by their physical environments that cause short-term abnormal system behavior, such as transient faults. To cope with this change of system behavior, online adaptations, which may introduce a high computation overhead, are performed in many cases to ensure the timeliness of the more important tasks while no guarantees are provided for the less important tasks. In this work, we propose a system model which does not require any online adaptation but, according to the concept of dynamic real-time guarantees, provides either full or limited timing guarantees, depending on the system behavior. Under normal system behavior, timeliness is guaranteed for all tasks; otherwise, timeliness is guaranteed only for the more important tasks while bounded tardiness is ensured for the less important tasks. Aiming to provide such dynamic timing guarantees, we propose a suitable system model and discuss how it can be established by means of partitioned as well as semi-partitioned strategies. Moreover, we propose an approach for handling abnormal behavior of longer duration, such as intermittent faults or overheating of processors, by performing task migration in order to compensate for the affected system component and to increase the system's reliability. We show by comprehensive experiments that good acceptance ratios can be achieved under partitioned scheduling, and that these can be further improved under semi-partitioned strategies. In addition, we demonstrate that the proposed migration techniques lead to a reasonable trade-off between the decrease in schedulability and the gain in robustness of the system. The presented approaches can also be applied to mixed-criticality systems with two criticality levels.

Published in: 2018 IEEE 23rd Pacific Rim International Symposium on Dependable Computing (PRDC)

Date of Conference: 04-07 December 2018

Date Added to IEEE Xplore: 14 February 2019

DOI: 10.1109/PRDC.2018.00010

Conference Location: Taipei, Taiwan

I. Introduction

Undeniably, the majority of practical real-time systems must be able to withstand several reliability threats, especially if the system is safety-critical and hard real-time requirements must be satisfied, as is prevalent in the automotive and aerospace sectors. More precisely, proper system functioning must be maintained at any point in time, comprising not only functional but also temporal correctness, i.e., a delivered result must be correct and, moreover, be obtained before a specified deadline. To ensure these properties, numerous hardware and software techniques have been developed to increase the reliability of such systems when so-called soft errors or transient faults occur, e.g., spatial isolation of certain components, hardware redundancy, remapping of logical system functionalities onto a subset of hardware resources, monitoring, and re-execution of erroneous software jobs [16]. However, these strategies are not necessarily sufficient or even applicable in all cases, since i) not every technique is effective against every type of fault, and ii) online adaptation performed in the course of fault recovery may lead to uncertain execution behavior.
