An Architecture for Runtime State Restoration after Transient Hardware-Faults in Redundant Real-Time Systems


Abstract:

Employing programmable electronic systems (PESs) in safety-critical real-time applications that cannot immediately be transferred to safe states requires especially high degrees of fault tolerance. Conventionally, this demand is satisfied not only by configuring multiple PESs redundantly, but also by applying redundant processing structures inside each PES. Alternatively, it is desirable to provide the capability to rehabilitate a PES's faulty state by copying the internal state from its redundant counterparts at runtime. In this way, redundancy attrition due to transient faults is prevented, since failed channels can be brought back on line. Here, the problems associated with state restoration at runtime are stated, the advantages and disadvantages of existing techniques are discussed, and a hardware-supported concept is introduced.
Date of Conference: 20-22 September 2006
Date Added to IEEE Xplore: 07 May 2007
Print ISBN: 0-7803-9758-4

Conference Location: Prague, Czech Republic

1. Introduction

One of the most essential performance features of safety-related Programmable Electronic Systems (PESs) is ‘Availability’, i.e. the probability that a system is, at a predefined point in time, in an error-free state. In applications requiring safety licensing in accordance with the safety standard IEC 61508, increasing availability solely by minimising the failure rate of the built-in components is not sufficient. Since hardware failures are not totally avoidable (only their probability can be minimised), it is also necessary to apply fault-tolerance techniques that ensure continuation of operation in case of component failures. Almost all fault-tolerance techniques are based on the principle of redundancy, i.e. on the multiple existence of functionally or characteristically similar objects.
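
To make the link between redundancy and availability concrete, the following is a minimal sketch and is not taken from the paper. It assumes the common steady-state approximation, availability = MTTF / (MTTF + MTTR), and it assumes that redundant channels fail independently and that one working channel is enough for the system to operate. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one channel (assumed model: MTTF / (MTTF + MTTR))."""
    return mttf_hours / (mttf_hours + mttr_hours)


def parallel_availability(channel_availability: float, channels: int) -> float:
    """Availability of `channels` redundant channels, assuming independent failures
    and that a single working channel keeps the system operational."""
    if channels < 1:
        raise ValueError("at least one channel is required")
    # The system is unavailable only if every channel is unavailable at once.
    return 1.0 - (1.0 - channel_availability) ** channels


if __name__ == "__main__":
    # Hypothetical figures, not measurements from the paper.
    single = steady_state_availability(mttf_hours=10_000.0, mttr_hours=10.0)
    for n in (1, 2, 3):
        print(f"{n} channel(s): availability = {parallel_availability(single, n):.9f}")
```

Under these assumptions, a shorter repair time raises the availability of each channel, which is one way to read the motivation for restoring a failed channel at runtime instead of leaving it out of service.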
