

Received October 3, 2018, accepted October 26, 2018, date of publication November 12, 2018, date of current version December 7, 2018.

Digital Object Identifier 10.1109/ACCESS.2018.2880763

# A Self-Healing Redundancy Scheme for Mission/Safety-Critical Applications

# P. BALASUBRAMANIAN, (Senior Member, IEEE), AND DOUGLAS L. MASKELL, (Senior Member, IEEE)

School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798

Corresponding author: P. Balasubramanian (balasubramanian@ntu.edu.sg)

This work was supported by the Academic Research Fund Tier-2 Research Award of the Ministry of Education, Republic of Singapore, under Grant MOE2017-T2-1-002.

**ABSTRACT** In the nanoelectronics era, multiple faults or failures in circuits and systems deployed in mission- and safety-critical applications, such as space, aerospace, nuclear etc., are known to occur. To withstand these, higher order redundancy is suggested to be used selectively in the sensitive portions of a circuit or system. In this context, the distributed minority and majority voting based redundancy (DMMR) scheme was proposed as an alternative to the N-modular redundancy (NMR) scheme for the efficient implementation of higher order redundancy. However, the DMMR scheme is not self-healing. In this paper, we present a new self-healing redundancy (SHR) scheme that can inherently correct its internal faults or failures without any external intervention, which makes it ideal for mission/safety-critical applications. To achieve the same degree of fault tolerance, the SHR scheme requires fewer function blocks than the NMR and DMMR schemes. We present the architectures of the proposed SHR scheme, discuss the system reliability, and provide the design metrics estimated for example SHR systems alongside the corresponding NMR and DMMR systems using a 32/28-nm CMOS technology. From the perspectives of fault tolerance, self-healing capability, and optimizations in the design metrics, the SHR scheme is preferable to the NMR and DMMR schemes.

**INDEX TERMS** Fault tolerance, redundancy, digital circuits, combinational circuits, CMOS technology.

## I. INTRODUCTION

In the modern electronics era, circuits and systems used in mission- and safety-critical applications such as space, aerospace, nuclear, etc. are known to be increasingly susceptible to multiple faults or failures due to the impact of radiation on small device geometries [1]–[6] and/or other phenomena such as aging [6], [7]. To cope with these, as a potential solution at the architecture level, higher order redundancy such as 5-modular redundancy (5MR), 7-modular redundancy (7MR), 9-modular redundancy (9MR) etc. are suggested to be used selectively [8] in the sensitive or critical portions of a mission- or safety-critical circuit or system to achieve greater fault tolerance. The 5MR, 7MR, and 9MR represent the 5-tuple, 7-tuple, and 9-tuple versions of the N-modular redundancy (NMR) respectively.

In the well-known and established NMR scheme [9], [10], shown in Fig. 1a, a majority, i.e., (N + 1)/2 function blocks out of the N identical function blocks, are required to operate correctly, where N is odd. Here, the term 'function block' is used to refer to a circuit or a system. All the function blocks are supplied with identical inputs. The corresponding outputs of the N function blocks viz. $F_1$ to $F_N$ are given to a voter, which determines the correct output of the NMR system through majority voting.
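The majority voting described above can be sketched for a single output bit as follows. This is only an illustrative behavioral model, not a gate-level voter; the function name `nmr_vote` is an assumption introduced here.

```python
def nmr_vote(outputs):
    """Majority vote over an odd number N of single-bit function block outputs."""
    n = len(outputs)
    if n % 2 == 0:
        raise ValueError("NMR requires an odd number of function blocks")
    # The voter outputs 1 when at least (N + 1) / 2 blocks output 1.
    return 1 if sum(outputs) >= (n + 1) // 2 else 0


# 5MR example where the correct output is 1.
print(nmr_vote([1, 1, 0, 1, 0]))  # 1: two faulty blocks are masked
print(nmr_vote([1, 0, 0, 1, 0]))  # 0: three faulty blocks defeat the majority
```

The second call shows the limit of the scheme: once fewer than (N + 1)/2 blocks are correct, the voter follows the faulty majority.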

In the DMMR scheme [11], which is an alternative to the NMR scheme, depicted by Fig. 1b, M identical function blocks are used which are split into two groups as Group 1 and Group 2, marked in Fig. 1b. The DMMR voter consists of the AO222 gate, an (M–3)-input OR gate, and a 2-input AND gate corresponding to each primary output of the function blocks. Group 1 comprises 3 function blocks, and Group 2 comprises (M–3) function blocks. In this context, Fig. 1b is said to portray the 3-of-M DMMR scheme. The outputs of function blocks 1, 2 and 3, i.e., D<sub>1</sub>, D<sub>2</sub> and D<sub>3</sub>, are given to a majority gate (i.e., AO222 gate) whose output is marked as G<sub>1</sub>. Since majority voting is performed on the outputs of function blocks 1, 2, and 3, it is required that at least 2 out of the 3 function blocks constituting Group 1 should operate correctly.



FIGURE 1. Block diagrams of (a) NMR system architecture, and (b) (3-of-M) DMMR system architecture.

The DMMR scheme inherently accords higher priority to the outputs of Group 1 compared to the outputs of Group 2. The faulty state or the failure of Group 1 due to any common mode faults or failures of its function blocks could affect the DMMR system operation, as the output of Group 1, which is governed by the Boolean majority, serves as the reference output. This is the same case with an NMR system wherein a violation of the Boolean majority condition of its function blocks could affect the NMR system operation.

In Fig. 1b, the outputs of function blocks 4 to M are given to an (M-3)-input OR gate, which may be arbitrarily decomposed, and its output is marked as G<sub>2</sub>. In general, at least 1 out of the (M-3) function blocks comprising Group 2 should operate correctly although there may be an exception depending on the inputs. For example, in Fig. 1b, if $G_1$ is 0, regardless of the value of $G_2$, the output of the DMMR system would evaluate to 0 since the intermediate outputs $G_1$ and $G_2$ are AND-ed together to produce the DMMR system output. Let us consider another scenario. Supposing 2 out of the 3 function blocks in Group 1 output 1, $G_1 = 1$; and if all the function blocks in Group 2 have failed, $G_2 = 0$. Under this condition, the DMMR system would produce an output of 0, which is erroneous. This is because in a DMMR system, the output(s) of Group 1 serve as the reference for the output of the DMMR system. Although this scenario is unwarranted, since at least one of the function blocks in Group 2 is generally expected to operate correctly, this example shows that when none of the function blocks in Group 2 operate correctly, which represents the worst case, an erroneous output may be produced, and the DMMR system cannot self-heal. Moreover, there would be no indication to the outside world that the DMMR system is erroneous under this scenario.
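The DMMR voter behavior described above can be illustrated with a small behavioral sketch for one output bit. It assumes the voter is exactly the AND of the Group 1 majority ($G_1$) and the Group 2 OR ($G_2$) as described in the text; the function name `dmmr_vote` is hypothetical.

```python
def dmmr_vote(outputs):
    """3-of-M DMMR voter for one output bit: (Group 1 majority) AND (Group 2 OR)."""
    if len(outputs) < 4:
        raise ValueError("a 3-of-M DMMR system needs at least 4 function blocks")
    d1, d2, d3 = outputs[:3]
    g1 = (d1 & d2) | (d2 & d3) | (d1 & d3)  # majority (AO222) gate on Group 1
    g2 = int(any(outputs[3:]))              # (M-3)-input OR gate on Group 2
    return g1 & g2


# 3-of-5 DMMR, correct output is 1.
print(dmmr_vote([1, 1, 0, 1, 0]))  # 1: one fault in each group is masked
print(dmmr_vote([1, 1, 0, 0, 0]))  # 0: G1 = 1 but Group 2 has failed completely
print(dmmr_vote([0, 0, 1, 1, 1]))  # 0: G1 = 0 forces the output to 0
```

The second call reproduces the worst-case scenario from the text: the Group 1 majority is correct, yet the output is wrong and nothing flags the error.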

A 3-of-5 DMMR system can mask the faults or failures of a maximum of 2 function blocks. The 3-of-6 and 3-of-7 DMMR systems can mask the faults or failures of a maximum of 3 and 4 function blocks respectively. With respect to the NMR scheme, the 5MR, 7MR, and 9MR systems can mask the faulty or failure states of up to 2, 3, and 4 function blocks respectively. Based on the fault tolerance, the 3-of-5, 3-of-6, and 3-of-7 DMMR systems form the respective redundant counterparts of the 5MR, 7MR, and 9MR systems [11].

In the remainder of this paper, Section 2 presents the new self-healing redundancy (SHR) scheme by describing the system architectures and compares the system reliabilities of NMR, DMMR and SHR schemes. Example implementations corresponding to the NMR, DMMR, and SHR schemes are discussed in Section 3. Finally, the conclusions are drawn in Section 4.

## II. SHR – SYSTEM ARCHITECTURES, OPERATION, AND RELIABILITY

#### A. SHR SYSTEM ARCHITECTURES

Two SHR system architectures are presented. One is the 2-of-4 SHR system architecture shown in Fig. 2a, and the other is the generic 2-of-M SHR system architecture shown in Fig. 2b, where M > 4. In a 2-of-4 SHR system, 4 identical function blocks are used, which are partitioned into two groups as Group 1 and Group 2, as shown in Fig. 2a. Group 1 consists of 3 function blocks and Group 2 consists of 1 function block. If the function blocks have multiple outputs, say K outputs each, there would be K implementations of the SHR voter shown in Figs. 2a and 2b. Also, the self-healing circuit, shown in Figs. 2a and 2b, would feature (K-1) copies. As a minimum, the correct operation of at least 2 out of the 3 function blocks in Group 1, which is governed by the Boolean majority, is essential to guarantee the correct operation of the 2-of-4 SHR system, and this represents the worst-case scenario for the 2-of-4 SHR system.

On the other hand, in a generic 2-of-M SHR system where M > 4, the M identical function blocks are also partitioned into two groups as Group 1 and Group 2, as shown in Fig. 2b. In a generic 2-of-M SHR system, the value of M determines the degree of fault tolerance. Again, the correct operation of at least 2 out of the 3 function blocks in Group 1 is mandatory, which represents the worst-case scenario for a 2-of-M SHR system. The SHR scheme is more relaxed and accommodative than the counterpart NMR and DMMR schemes with respect to the fault tolerance. A 2-of-4 SHR system can tolerate the faults or failures of 2 function blocks. To the best of our knowledge, in the existing literature, there is no redundant system architecture that can mask the faults or failures of 2 function blocks while using only 4 identical function blocks. A 2-of-M SHR system can withstand the faults or failures of a maximum of (M-2) function blocks. This degree of fault tolerance achievable by a 2-of-M SHR system is significant given that an NMR system can tolerate the faults or failures of a maximum of (N-1)/2 function blocks, and a 3-of-M DMMR system can tolerate the faults or failures of up to (M-3) function blocks. Hence, the fault tolerance of the SHR scheme is superior to that of the NMR and DMMR schemes.

The 2-of-4 SHR, 5MR, and 3-of-5 DMMR systems have maximum fault tolerance of 2 function blocks. The 2-of-5 SHR, 7MR, and 3-of-6 DMMR systems feature maximum fault tolerance of 3 function blocks. The 2-of-6 SHR, 9MR, and 3-of-7 DMMR systems have maximum fault tolerance of 4 function blocks. These imply that the SHR architecture requires fewer function blocks than the NMR and DMMR architectures to achieve the same degree of fault tolerance.
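The block counts quoted above follow from inverting the maximum fault tolerance of each scheme: (N-1)/2 for NMR, (M-3) for 3-of-M DMMR, and (M-2) for 2-of-M SHR. A minimal sketch of that arithmetic is given below; the helper name `blocks_required` is an assumption made for illustration.

```python
def blocks_required(f):
    """Function blocks needed by each scheme to tolerate up to f faulty blocks."""
    return {
        "NMR": 2 * f + 1,   # f = (N - 1) / 2
        "DMMR": f + 3,      # f = M - 3 for a 3-of-M DMMR system
        "SHR": f + 2,       # f = M - 2 for a 2-of-M SHR system
    }


for f in (2, 3, 4):
    print(f, blocks_required(f))
# 2 {'NMR': 5, 'DMMR': 5, 'SHR': 4}
# 3 {'NMR': 7, 'DMMR': 6, 'SHR': 5}
# 4 {'NMR': 9, 'DMMR': 7, 'SHR': 6}
```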

#### B. SHR SYSTEMS OPERATION

The 2-of-4 and 2-of-M SHR systems shown in Figs. 2a and 2b have 4 parts which are shown enclosed within the dotted boxes: i) Group 1 comprising the function blocks 1, 2, and 3, ii) Group 2 comprising the function block 4 in Fig. 2a and the function blocks 4 to M in Fig. 2b, iii) the SHR voter that produces the output MAJ in Fig. 2a, and the dual outputs which are labelled MAJ and MIN in Fig. 2b, corresponding to Group 1 and Group 2 respectively, and iv) the self-healing circuit. The self-healing circuit incorporates redundant logic [12], and this helps to improve the fault tolerance of the SHR systems. The outputs of the function blocks are represented by B<sub>1</sub> to B<sub>4</sub> in Fig. 2a, and by B<sub>1</sub> to B<sub>M</sub> in Fig. 2b. I<sub>1</sub> and I<sub>2</sub> are the internal voter outputs in Fig. 2b. The SHR voter of the 2-of-4 SHR system, shown in Fig. 2a, is like the majority voter of the 3MR system. The output of the SHR systems is specified as SSO in Fig. 2.

The logic expressions defining the internal and primary outputs of the 2-of-4 and 2-of-M SHR systems shown in Fig. 2 are given below. Equations (1) and (2) correspond to the 2-of-4 SHR system shown in Fig. 2a, and (1), (3), (4), (5) and (6) correspond to the generic 2-of-M SHR system shown in Fig. 2b. In the equations, the symbol ' represents the logical inversion, the product implies logical conjunction, and the sum implies logical disjunction.

$$MAJ = B_1B_2 + B_2B_3 + B_1B_3 \tag{1}$$

$$SSO_{2\text{-of-}4} = (MAJ)(B_4) + (MAJ)(B'_4) \tag{2}$$

$$I_1 = B_4 B_5 \ldots B_M \tag{3}$$

$$I_2 = B_4 + B_5 + \ldots + B_M \tag{4}$$

$$MIN = (I_1)(MAJ') + (I_2)(MAJ) \tag{5}$$

$$SSO_{2\text{-of-}M} = (MAJ)(MIN) + (MAJ)(MIN') \tag{6}$$
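As a reading aid, equations (1) to (6) can be transcribed directly into a behavioral model for a single output bit. The sketch below is an illustrative software model only, not the gate-level implementation; the function names `shr_2_of_4` and `shr_2_of_m` are assumptions introduced here.

```python
def shr_2_of_4(b1, b2, b3, b4):
    """Return (MAJ, SSO) of the 2-of-4 SHR system for single-bit inputs."""
    maj = (b1 & b2) | (b2 & b3) | (b1 & b3)      # (1)
    sso = (maj & b4) | (maj & (1 - b4))          # (2), redundant logic
    return maj, sso


def shr_2_of_m(b):
    """Return (I1, I2, MAJ, MIN, SSO) for a list b = [B1, ..., BM], M > 4."""
    if len(b) <= 4:
        raise ValueError("the generic 2-of-M architecture assumes M > 4")
    b1, b2, b3 = b[:3]
    group2 = b[3:]
    maj = (b1 & b2) | (b2 & b3) | (b1 & b3)      # (1)
    i1 = int(all(group2))                        # (3), AND of B4..BM
    i2 = int(any(group2))                        # (4), OR of B4..BM
    mn = (i1 & (1 - maj)) | (i2 & maj)           # (5), 2:1 MUX selected by MAJ
    sso = (maj & mn) | (maj & (1 - mn))          # (6), self-healing circuit
    return i1, i2, maj, mn, sso


print(shr_2_of_4(1, 1, 0, 0))            # (1, 1)
print(shr_2_of_m([0, 0, 1, 1, 1, 1]))    # (1, 1, 0, 1, 0)
print(shr_2_of_m([1, 1, 0, 0, 0, 0]))    # (0, 0, 1, 0, 1)
```

In the last two calls Group 2 has failed completely, so MIN contradicts MAJ, yet SSO still follows MAJ.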

Tables 1 and 2 illustrate the operation of the 2-of-4 and 2-of-M SHR systems respectively. These tables capture the distinct input combinations corresponding to Group 1, and a representative set of the distinct input combinations corresponding to Group 2 to describe the normal and self-healing operations of the SHR systems. Tables 1 and 2 illustrate 5 possible scenarios with respect to the SHR systems shown in Figs. 2a and 2b. These scenarios are also applicable to SHR systems which might consist of function blocks with multiple outputs.

![](_page_3_Picture_1.jpeg)

![](_page_3_Figure_2.jpeg)

![](_page_3_Figure_3.jpeg)

**FIGURE 2.** Block diagrams of (a) 2-of-4 SHR system architecture, and (b) 2-of-M SHR system architecture, where M > 4. The AND and OR gates shown in the SHR voter part in Fig. 2b can be arbitrarily decomposed and hence they are synthesizable. In Fig. 2b, the intermediate output MAJ acts as the select input for the 2:1 MUX present in the SHR voter. Based on the value of MAJ, I<sub>1</sub> or I<sub>2</sub> is selected and its value is forwarded to the output MIN.

Tables 1 and 2 consider the non-faulty and faulty (but maskable) states of the function blocks comprising Group 1 and Group 2, and the complete failure of Group 2. The complete failure of Group 1 is neglected. This is because the output of Group 1, which is governed by the Boolean majority, is kept as the reference while considering the output of Group 2 to determine the output of the SHR system (SSO). The SHR architecture, like the DMMR architecture, inherently accords higher priority to the output of Group 1 compared to the output of Group 2. This is because the Boolean majority condition is unambiguous, but the Boolean minority condition may be ambiguous. For example, if at least 2 out of the 3 function blocks in Group 1 of the 2-of-M SHR system (shown in Fig. 2b) agree on the same output, then there is no ambiguity in the production of the correct majority output (MAJ). On the other hand, if we assume that the output of one of the function blocks in Group 2 (say, $B_4$) of the 2-of-M SHR system is 1, and the outputs of the rest of the function blocks are 0, i.e., $B_5$ up to $B_M$ are 0, then either 0 or 1 can be specified as the output of Group 2 according to the Boolean minority condition, since at least $B_4$ is 1 and at least one of $B_5$ up to $B_M$ is 0, which causes an ambiguity. Therefore, the output of a 2-of-M SHR system is primarily governed by the output of Group 1, which is subject to the Boolean majority condition, i.e., in a 2-of-M SHR system, SSO is equal to the output of Group 1, i.e., MAJ, as reflected in (2) and (6). This would also be evident from Tables 1 and 2.

TABLE 1. Illustrating the 2-of-4 SHR system operation.

| Scenario | $B_1$ (Group 1) | $B_2$ (Group 1) | $B_3$ (Group 1) | $B_4$ (Group 2) | MAJ (voter output) | SSO (system output) |
|----------|-----------------|-----------------|-----------------|-----------------|--------------------|---------------------|
| Scenario 1: Group 1 and Group 2 are perfect | 0 | 0 | 0 | 0 | 0 | 0 |
| | 1 | 1 | 1 | 1 | 1 | 1 |
| Scenario 2: Group 1 output is 0; Group 2 output is 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 0 | 1 | 0 | 0 | 0 | 0 |
| | 1 | 0 | 0 | 0 | 0 | 0 |
| Scenario 3: Group 1 output is 0; Group 2 output is 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| | 0 | 1 | 0 | 1 | 0 | 0 |
| | 1 | 0 | 0 | 1 | 0 | 0 |
| Scenario 4: Group 1 output is 1; Group 2 output is 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| | 1 | 0 | 1 | 1 | 1 | 1 |
| | 0 | 1 | 1 | 1 | 1 | 1 |
| Scenario 5: Group 1 output is 1; Group 2 output is 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| | 1 | 0 | 1 | 0 | 1 | 1 |
| | 0 | 1 | 1 | 0 | 1 | 1 |

TABLE 2. Illustrating a 2-of-M SHR system operation. $B_1$ to $B_3$ form Group 1 and $B_4$ to $B_M$ form Group 2; $I_1$, $I_2$, MAJ, and MIN are the I&E voter outputs*, and SSO is the system output.

| Scenario | $B_1$ | $B_2$ | $B_3$ | $B_4$ | … | $B_M$ | $I_1$ | $I_2$ | MAJ | MIN | SSO |
|----------|-------|-------|-------|-------|---|-------|-------|-------|-----|-----|-----|
| Scenario 1: Group 1 and Group 2 are perfect | 0 | 0 | 0 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 |
| | 1 | 1 | 1 | 1 | … | 1 | 1 | 1 | 1 | 1 | 1 |
| Scenario 2: Group 1 output is 0; Group 2 output is 0 | 0 | 0 | 1 | 0 | … | 1 | 0 | 1 | 0 | 0 | 0 |
| | 0 | 1 | 0 | 0 | … | 1 | 0 | 1 | 0 | 0 | 0 |
| | 1 | 0 | 0 | 0 | … | 1 | 0 | 1 | 0 | 0 | 0 |
| Scenario 3: Group 1 output is 0; Group 2 output is 1 | 0 | 0 | 1 | 1 | … | 1 | 1 | 1 | 0 | 1 | 0 |
| | 0 | 1 | 0 | 1 | … | 1 | 1 | 1 | 0 | 1 | 0 |
| | 1 | 0 | 0 | 1 | … | 1 | 1 | 1 | 0 | 1 | 0 |
| Scenario 4: Group 1 output is 1; Group 2 output is 1 | 1 | 1 | 0 | 1 | … | 0 | 0 | 1 | 1 | 1 | 1 |
| | 1 | 0 | 1 | 1 | … | 0 | 0 | 1 | 1 | 1 | 1 |
| | 0 | 1 | 1 | 1 | … | 0 | 0 | 1 | 1 | 1 | 1 |
| Scenario 5: Group 1 output is 1; Group 2 output is 0 | 1 | 1 | 0 | 0 | … | 0 | 0 | 0 | 1 | 0 | 1 |
| | 1 | 0 | 1 | 0 | … | 0 | 0 | 0 | 1 | 0 | 1 |
| | 0 | 1 | 1 | 0 | … | 0 | 0 | 0 | 1 | 0 | 1 |

*I&E Voter Outputs: Internal and External Voter Outputs

In Table 1, Scenario 1 signifies the perfect states of Group 1 and Group 2, and SSO = MAJ under this scenario, which is correct. Scenarios 2 and 4 signify the faulty but the maskable state of Group 1 and the non-faulty state of Group 2. Under either of these scenarios, at least 2 out of the 3 function blocks in Group 1 produce the same output satisfying the Boolean majority, which tallies with the output of Group 2. Hence,  $MAJ = B_4$  and SSO = MAJ, which is correct. Scenarios 3 and 5 represent the faulty but the maskable state of Group 1 and the (worst-case) complete failure of Group 2. As a result, MAJ  $\neq$  B<sub>4</sub>. However, due to the action of the self-healing circuit, and according to (2), SSO = MAJ, which is correct. Thus, the worst-case scenario of the complete failure of Group 2 would be tolerated by the 2-of-4 SHR system provided its voter and self-healing circuit are perfect. In general, compared to the function blocks, the voter and the self-healing circuit may account for just a small proportion of a SHR circuit or system and so the assumptions of a perfect voter and a self-healing circuit may be reasonable.

In Table 2, the representations used for the outputs of the function blocks corresponding to Group 2 imply the following: i) '$B_4 \ldots B_M$' given by '0 … 0' implies $B_4$ up to $B_M$ are 0, ii) '$B_4 \ldots B_M$' given by '0 … 1' implies $B_4$ is 0 and $B_5$ up to $B_M$ may be 1, iii) '$B_4 \ldots B_M$' given by '1 … 0' implies $B_4$ is 1 and $B_5$ up to $B_M$ may assume 0, and iv) '$B_4 \ldots B_M$' given by '1 … 1' implies $B_4$ up to $B_M$ are 1.

In Table 2, Scenario 1 represents the perfect states of Group 1 and Group 2, and in this scenario $I_1$ and $I_2$ are equal, and hence MAJ and MIN are also equal and correct. Hence, SSO is also correct. Scenarios 2 and 4 represent the faulty but maskable states of Group 1 and Group 2. In Scenario 2, because at least 2 of the 3 function block outputs in Group 1 are 0, MAJ evaluates to 0 as per the Boolean majority. Since $B_4$ is 0 under this scenario, $I_1 = 0$. Since MAJ = 0 selects $I_1$ as per (5) and forwards its value to MIN, MAJ = MIN = SSO = 0, which is correct. With respect to Scenario 4, because at least 2 out of the 3 function block outputs in Group 1 are 1, MAJ equates to 1 as per the Boolean majority. Since $B_4$ is 1 under this scenario, $I_2 = 1$. Because MAJ = 1 selects $I_2$ as per (5) and forwards its value to MIN, MAJ = MIN = SSO = 1, which is correct.

![](_page_5_Figure_2.jpeg)

**FIGURE 3.** System reliabilities of simplex, NMR, DMMR, and SHR systems versus the function block reliability.

Scenarios 3 and 5 signify the faulty but maskable state of Group 1 and the (worst-case) complete failure of Group 2. In Scenario 3, since at least 2 out of the 3 function block outputs corresponding to Group 1 are 0, MAJ = 0. Under the assumption of the complete failure of Group 2, $B_4$ up to $B_M$ are all 1. Since $I_1$ and $I_2$ are both 1 under this scenario, MIN = 1. Although MAJ and MIN are contradictory, they are supplied to the self-healing circuit and, as per (6), SSO correctly evaluates to 0. In Scenario 5, at least 2 out of the 3 function block outputs corresponding to Group 1 are 1, and so MAJ = 1. Under the assumption of the complete failure of Group 2, $B_4$ up to $B_M$ are all 0. Since $I_1 = I_2 = 0$, MIN = 0. Again, although MAJ and MIN are contradictory, based on the action of the self-healing circuit and according to (6), SSO correctly evaluates to 1.
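The scenarios discussed in this subsection cover representative input patterns only. As a complementary check, the sketch below enumerates every input pattern of a behavioral model built from (1) and (3) to (6) and reports the largest number of faulty function blocks for which the system output is still correct. It assumes a perfect voter and self-healing circuit, and it treats M = 4 with the same model since $I_1 = I_2 = B_4$ in that case; the names `sso` and `max_masked_faults` are hypothetical.

```python
from itertools import product


def majority3(b1, b2, b3):
    return (b1 & b2) | (b2 & b3) | (b1 & b3)


def sso(bits):
    """System output of a 2-of-M SHR model for a tuple of M single-bit block outputs."""
    maj = majority3(*bits[:3])
    group2 = bits[3:]
    i1, i2 = int(all(group2)), int(any(group2))
    mn = i2 if maj else i1                   # 2:1 MUX of (5)
    return (maj & mn) | (maj & (1 - mn))     # self-healing circuit of (6)


def max_masked_faults(m, correct):
    """Largest number of faulty blocks (outputs != correct) that is still masked."""
    worst = 0
    for bits in product((0, 1), repeat=m):
        faults = sum(bit != correct for bit in bits)
        if sso(bits) == correct:
            worst = max(worst, faults)
    return worst


for m in (4, 5, 6):
    print(m, max_masked_faults(m, 0), max_masked_faults(m, 1))
# 4 2 2
# 5 3 3
# 6 4 4
```

These maxima equal (M-2) and are reached only when at most one of the faulty blocks lies in Group 1; two faults inside Group 1 are not masked by this model, in line with the Boolean majority requirement stated earlier.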

#### C. RELIABILITY OF SHR SYSTEMS

Let R represent the reliability, i.e., the probability of the correct operation of a function block, and let (1-R) specify the probability of its incorrect operation. It is implied that $R = R(t)$ in the system reliability equations, i.e., the reliability is expressed as a function of time t. Let us assume that the reliabilities of multiple function blocks used in various redundant systems such as NMR, DMMR and SHR systems are equivalent since identical function blocks are used. Further, assuming the perfect behavior of the voters (and the self-healing circuits) comprising various redundant systems, the system reliability equations of the 2-of-4 ($R_{2\text{-of-}4}$), 2-of-5 ($R_{2\text{-of-}5}$), and 2-of-6 ($R_{2\text{-of-}6}$) SHR systems are given by (7) to (9). Equations (7) to (9) have been derived based on the notion that the reliability of the 2-of-4 or a 2-of-M SHR system is dependent upon the correct operation of at least 2 or all the 3 function blocks present in Group 1 which is accompanied by the correct operation of one or more or all the function blocks in Group 2, or the minimum correct operation of at least 2 out of the 3 function blocks present in Group 1 (which represents the worst-case scenario).

$$R_{2\text{-of-}4} = R^4 + 4R^3(1-R) + 3R^2(1-R)^2 \tag{7}$$

$$R_{2\text{-of-}5} = R^5 + 5R^4(1-R) + 7R^3(1-R)^2 + 3R^2(1-R)^3 \tag{8}$$

$$R_{2\text{-of-}6} = R^6 + 6R^5(1-R) + 12R^4(1-R)^2 + 10R^3(1-R)^3 + 3R^2(1-R)^4 \tag{9}$$

A plot of the reliabilities of the above-mentioned SHR systems versus the function block reliability is shown in Fig. 3 alongside the reliabilities of the corresponding NMR and DMMR systems, and the simplex system. It was noted that (7) to (9) yield the same system reliability values for the same values of the function block reliability. Hence the system reliabilities of the 2-of-4, 2-of-5, and 2-of-6 SHR systems are shown using a single plot, viz. the SHR, in Fig. 3.
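One way to see why (7) to (9) coincide, under the stated assumption of perfect voters and self-healing circuits, is to note that each expression counts the cases in which at least 2 of the 3 Group 1 function blocks are correct while the (M-3) Group 2 function blocks may be in any state. The following is a sketch of that factorization, not a derivation taken from the source:

$$R_{2\text{-of-}M} = \left[R^3 + 3R^2(1-R)\right]\sum_{k=0}^{M-3}\binom{M-3}{k}R^k(1-R)^{M-3-k} = R^3 + 3R^2(1-R) = 3R^2 - 2R^3$$

The sum equals $[R + (1-R)]^{M-3} = 1$ by the binomial theorem. Expanding the product for M = 4, 5, and 6 reproduces the coefficients of (7), (8), and (9) respectively; for example, for M = 4 it gives $R^4 + (1 + 3)R^3(1-R) + 3R^2(1-R)^2$.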

It can be seen in Fig. 3 that the reliability of a SHR system is slightly less than the reliabilities of higher order NMR systems and is equal to or slightly greater than the reliabilities of DMMR systems for R > 0.5. It was found that the reliability of a SHR system is the same as the reliability of a 3-modular redundant (3MR) system. The reliability of a function block employed in a mission- or safety-critical system is generally high [13] i.e., 0.9 < R < 1. Hence, considering a range of function block reliabilities varying from 0.9 to 0.99 in steps of 0.01, we found that, on average, the reliability of a SHR system is marginally less than the reliabilities of the 5MR, 7MR, and 9MR systems by 0.8%, 1%, and 1.1% respectively. This is the minor trade-off involved in the SHR scheme versus the higher order NMR schemes to attain significant reductions in the number of function blocks used to achieve the same degree of fault tolerance. However, over the same range of function block reliabilities considered, i.e., 0.9 to 0.99, it was noted that the mean reliability of a SHR system is marginally greater than the mean reliabilities of the 3-of-5, 3-of-6, and 3-of-7 DMMR systems.

Although the 2-of-4 or 2-of-M SHR systems could tolerate the worst-case scenario of the complete failure of Group 2 and still produce the correct output, nevertheless the correct operation of Group 2 in conjunction with Group 1 significantly contributes to the reliability of SHR systems, which may be evident from Fig. 4. Figs. 4a, 4b and 4c show the plots of the system reliabilities of the 2-of-4, 2-of-5 and 2-of-6 SHR systems based on the corresponding group(s) reliabilities i.e., by assuming either Group 1 is alone operating correctly or both Group 1 and Group 2 are operating correctly. It may be noted that when Group 1 and Group 2 operate correctly the SHR system reliability becomes high compared to the correct operation of just Group 1. Further, the reliability contribution resulting from the correct operation of Group 1 and Group 2 to the SHR system reliability tends to increase with an increase in the level of redundancy, as seen from Figs. 4a, 4b and 4c.

![](_page_6_Figure_2.jpeg)

**FIGURE 4.** Plots of (a) 2-of-4 SHR system and group(s) reliabilities, (b) 2-of-5 SHR system and group(s) reliabilities, and (c) 2-of-6 SHR system and group(s) reliabilities.

![](_page_7_Figure_2.jpeg)

**FIGURE 5.** Split-up of power dissipated by the function blocks and voters of NMR and DMMR systems, and the power dissipated by the function blocks, voters, and self-healing circuits of SHR systems.
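The reliability comparison between the SHR and higher order NMR systems in this subsection can be explored numerically with a short sketch. It assumes perfect voters and self-healing circuits, uses the standard binomial expression for NMR reliability (at least (N+1)/2 of N blocks correct), and models the SHR reliability as "Group 1 majority correct, Group 2 in any state". The function names are hypothetical, and the DMMR reliability is omitted because its equation is not given in this section.

```python
from math import comb


def nmr_reliability(n, r):
    """NMR with a perfect voter: at least (n + 1) / 2 of the n blocks are correct."""
    return sum(comb(n, k) * r**k * (1 - r)**(n - k)
               for k in range((n + 1) // 2, n + 1))


def shr_reliability(m, r):
    """2-of-M SHR: at least 2 of 3 Group 1 blocks correct, any Group 2 state."""
    group1 = r**3 + 3 * r**2 * (1 - r)
    group2 = sum(comb(m - 3, k) * r**k * (1 - r)**(m - 3 - k)
                 for k in range(m - 2))  # k = 0 .. M-3, sums to 1
    return group1 * group2


# Function block reliabilities from 0.90 to 0.99 in steps of 0.01.
rs = [0.90 + 0.01 * i for i in range(10)]
for n, m in ((5, 4), (7, 5), (9, 6)):
    gap = sum(nmr_reliability(n, r) - shr_reliability(m, r) for r in rs) / len(rs)
    print(f"{n}MR vs 2-of-{m} SHR: mean reliability gap = {gap:.4f}")
```

The printed gaps are absolute differences in mean reliability and can be compared against the averages quoted in the text.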

#### **III. RESULTS AND DISCUSSION**

Example 2-of-4, 2-of-5, and 2-of-6 SHR systems and their corresponding NMR (5MR, 7MR, 9MR) and DMMR (3-of-5, 3-of-6, 3-of-7) systems were implemented in a semi-custom ASIC design style using the gates of a 32/28-nm CMOS standard digital cell library [14]. A $4 \times 4$ array multiplier was used for the function blocks, as in [11], to facilitate a straightforward comparison with the NMR and DMMR systems realized in [11].

The physically implemented NMR, DMMR, and SHR systems were verified by performing functional simulations. The switching activity captured through the functional simulations was used for the average power estimation. The average power was estimated accurately by performing a time-based power analysis. The simulations were performed by supplying all the distinct input vectors identically to all the function blocks at time intervals of 2.5 ns (400 MHz), as in [11]. This paves the way for a direct comparison of the design parameters of the different redundant systems after synthesis. The average power dissipation, critical path delay, and area of the redundant systems, estimated using Synopsys tools, are given in Table 3.
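The exhaustive stimulus described above can be mimicked at the behavioral level. The following Python sketch is only a stand-in under stated assumptions and is not the gate-level simulation flow used in this work: `multiplier_4x4`, `majority`, and `simulate_nmr` are hypothetical helpers, a faulty block is modeled as one that outputs the bitwise complement of the correct product, and only a plain NMR majority voter is modeled (not the DMMR or SHR voters).

```python
from itertools import product

PERIOD_NS = 2.5  # interval between input vectors, taken from the text (400 MHz)


def multiplier_4x4(a: int, b: int) -> int:
    """Behavioral stand-in for a 4x4 multiplier (8-bit product)."""
    return (a & 0xF) * (b & 0xF)


def majority(outputs: list, width: int = 8) -> int:
    """Bitwise majority vote over an odd number of equal-width outputs."""
    result = 0
    for bit in range(width):
        ones = sum((out >> bit) & 1 for out in outputs)
        if ones > len(outputs) // 2:
            result |= 1 << bit
    return result


# All distinct operand pairs of two 4-bit inputs: 16 * 16 = 256 vectors.
vectors = list(product(range(16), repeat=2))


def simulate_nmr(n: int, faulty: int = 0) -> bool:
    """Apply every vector identically to n blocks and check the voted output.

    `faulty` blocks are assumed to output the complemented product.
    Returns True if the voter output is correct for all vectors.
    """
    for a, b in vectors:
        golden = multiplier_4x4(a, b)
        outputs = [golden ^ 0xFF] * faulty + [golden] * (n - faulty)
        if majority(outputs) != golden:
            return False
    return True


print(len(vectors), "vectors spanning", len(vectors) * PERIOD_NS, "ns")
print("5MR with 2 faulty blocks correct:", simulate_nmr(5, faulty=2))
print("5MR with 3 faulty blocks correct:", simulate_nmr(5, faulty=3))
```

Supplying the same vector set to every redundant system, as done here, is what allows the resulting metrics to be compared directly.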

**TABLE 3.** Design metrics of corresponding NMR, DMMR and SHR systems, estimated using a 32/28-nm bulk CMOS process. The 2-of-4, 2-of-5, and 2-of-6 SHR systems are those proposed.

| Type of Redundancy | Power (µW) | Delay (ns) | Area (µm<sup>2</sup>) |
|--------------------|------------|------------|-------------------------|
| 5MR                | 120.7      | 0.98       | 529.64                  |
| 3-of-5 DMMR        | 109.3      | 0.90       | 480.84                  |
| 2-of-4 SHR         | 88.7       | 0.93       | 394.43                  |
| 7MR                | 191.2      | 1.12       | 865.11                  |
| 3-of-6 DMMR        | 129.4      | 0.90       | 567.25                  |
| 2-of-5 SHR         | 118.8      | 1.06       | 537.77                  |
| 9MR                | 278.5      | 1.23       | 1269.7                  |
| 3-of-7 DMMR        | 151.2      | 0.91       | 661.79                  |
| 2-of-6 SHR         | 139.4      | 1.06       | 626.21                  |

It is observed from Table 3 that the SHR systems, in general, dissipate less power and occupy less area than the corresponding NMR and DMMR systems. This is because the SHR systems require fewer function blocks than the counterpart NMR and DMMR systems to achieve the same degree of fault tolerance. The 2-of-4, 2-of-5 and 2-of-6 SHR systems report respective reductions in power dissipation of 26.5%, 37.9% and 50% compared to the 5MR, 7MR and 9MR systems, and of 18.8%, 8.2% and 7.8% compared to the 3-of-5, 3-of-6 and 3-of-7 DMMR systems. In terms of area, the 2-of-4, 2-of-5 and 2-of-6 SHR systems report respective reductions of 25.5%, 37.8% and 50.7% compared to the 5MR, 7MR and 9MR systems, and of 18%, 5.2% and 5.4% compared to the 3-of-5, 3-of-6 and 3-of-7 DMMR systems.

![](_page_8_Figure_2.jpeg)

**FIGURE 6.** Area occupancies of voters corresponding to NMR and DMMR systems, and area occupancies of voters and self-healing circuits corresponding to counterpart SHR systems.
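The percentage reductions quoted from Table 3 follow from a simple relative difference. The Python sketch below recomputes them from the tabulated values; the dictionary layout and the `reduction` helper are illustrative names, not part of the design flow. A negative value indicates that the SHR system is worse than the baseline for that metric, which is the case for the delay relative to the DMMR systems.

```python
# Table 3 values: (power in µW, delay in ns, area in µm^2).
TABLE3 = {
    "5MR": (120.7, 0.98, 529.64),
    "3-of-5 DMMR": (109.3, 0.90, 480.84),
    "2-of-4 SHR": (88.7, 0.93, 394.43),
    "7MR": (191.2, 1.12, 865.11),
    "3-of-6 DMMR": (129.4, 0.90, 567.25),
    "2-of-5 SHR": (118.8, 1.06, 537.77),
    "9MR": (278.5, 1.23, 1269.7),
    "3-of-7 DMMR": (151.2, 0.91, 661.79),
    "2-of-6 SHR": (139.4, 1.06, 626.21),
}

# Systems grouped by equal degree of fault tolerance: (NMR, DMMR, SHR).
TRIPLES = [
    ("5MR", "3-of-5 DMMR", "2-of-4 SHR"),
    ("7MR", "3-of-6 DMMR", "2-of-5 SHR"),
    ("9MR", "3-of-7 DMMR", "2-of-6 SHR"),
]

METRICS = ("power", "delay", "area")


def reduction(baseline: float, value: float) -> float:
    """Percentage reduction of `value` relative to `baseline`."""
    return 100 * (baseline - value) / baseline


for nmr, dmmr, shr in TRIPLES:
    for idx, metric in enumerate(METRICS):
        vs_nmr = reduction(TABLE3[nmr][idx], TABLE3[shr][idx])
        vs_dmmr = reduction(TABLE3[dmmr][idx], TABLE3[shr][idx])
        print(f"{shr} {metric}: {vs_nmr:.1f}% vs {nmr}, {vs_dmmr:.1f}% vs {dmmr}")
```

The recomputed power and area figures agree with those quoted in the text to within rounding.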

The split-up of the power dissipated by the function blocks and the voters of NMR and DMMR systems, and the power dissipation components of the function blocks, voters, and self-healing circuits of the corresponding SHR systems is shown in Fig. 5. The NMR and DMMR systems contain just the voter besides the function blocks, whereas the SHR systems consist of the voter and the self-healing circuit. The voter of a 2-of-M SHR system is more complex than the voter of a counterpart 3-of-M DMMR system. This would be evident upon comparing Fig. 2b with Fig. 1b. Fig. 6 shows the area occupancies of the voters corresponding to NMR and DMMR systems, and the areas of the voters and self-healing circuits of the counterpart SHR systems.

The majority voter of a NMR system substantially increases in size for an increase in the redundancy [11]. Hence, the power dissipation components of the majority voters corresponding to the NMR systems increase considerably with an increase in the redundancy, which can be noticed from Fig. 5. The majority voters of the 5MR, 7MR, and 9MR systems dissipate 15.3 µW, 38.4 µW, and 75 µW, respectively. In comparison to the NMR majority voters, the voter of a DMMR or a SHR system increases in size only gradually for an increase in the redundancy. The average power dissipation components of the voters corresponding to the 3-of-5, 3-of-6, and 3-of-7 DMMR systems are 8 µW, 8.1 µW, and 10 µW, respectively. On the other hand, the power dissipation components of the voters corresponding to the 2-of-4, 2-of-5, and 2-of-6 SHR systems are 3.5 µW, 13.1 µW, and 13.5 µW, respectively; the respective power dissipation components of their self-healing circuits are 3.8 µW, 3.7 µW, and 3.7 µW. Hence, the power dissipation components of the voters belonging to the DMMR systems, and the power dissipation components of the voters and self-healing circuits of the SHR systems, are found to increase only nominally with an increase in the redundancy.
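The component powers quoted above can be related to the totals of Table 3 with a little arithmetic. The Python sketch below assumes that the total power of a system is the sum of its function block, voter, and self-healing circuit components, as the split-up in Fig. 5 suggests; the derived function block power and overhead share are not reported in the text and are shown only as an illustration. The names `SYSTEMS`, `overhead_share`, and `function_block_power` are hypothetical.

```python
# (total power, voter power, self-healing circuit power), all in µW.
# Totals come from Table 3; component powers are those quoted in the text.
# NMR and DMMR systems have no self-healing circuit, hence 0.0.
SYSTEMS = {
    "5MR": (120.7, 15.3, 0.0),
    "7MR": (191.2, 38.4, 0.0),
    "9MR": (278.5, 75.0, 0.0),
    "3-of-5 DMMR": (109.3, 8.0, 0.0),
    "3-of-6 DMMR": (129.4, 8.1, 0.0),
    "3-of-7 DMMR": (151.2, 10.0, 0.0),
    "2-of-4 SHR": (88.7, 3.5, 3.8),
    "2-of-5 SHR": (118.8, 13.1, 3.7),
    "2-of-6 SHR": (139.4, 13.5, 3.7),
}


def overhead_share(name: str) -> float:
    """Voter plus self-healing power as a percentage of the total power."""
    total, voter, healing = SYSTEMS[name]
    return 100 * (voter + healing) / total


def function_block_power(name: str) -> float:
    """Power left for the function blocks, assuming the components add up."""
    total, voter, healing = SYSTEMS[name]
    return total - voter - healing


for name in SYSTEMS:
    print(
        f"{name}: function blocks = {function_block_power(name):.1f} µW, "
        f"voting/self-healing overhead = {overhead_share(name):.1f}%"
    )
```

Viewed this way, the growth of the NMR majority voter with the redundancy shows up as a rising share of the total power, whereas the overhead of the DMMR and SHR systems stays comparatively small.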

The critical path of a NMR system comprises the function block and the majority voter. The voters of NMR systems would incorporate more logic for increases in the redundancy, which would be accompanied by increases in the logic depth [11]. As a result, the critical path delays of the NMR systems are greater than the critical path delays of the DMMR and SHR systems. The critical path of a DMMR system entails a function block and the DMMR voter (i.e., the AO222 gate and a 2-input AND gate), as shown in Fig. 1b. On the other hand, the critical path of a SHR system would traverse a function block, the voter (i.e., the AO222 gate in a 2-of-4 SHR system, and the AO222 gate and the 2:1 MUX in a 2-of-M SHR system), and the self-healing circuit (i.e., the inverter and an AO22 gate). Thus, the critical path delay of a SHR system would be slightly greater than the critical path delay of a DMMR system due to an increase in the logic depth, which is evident from Table 3.

In terms of the critical path delay, the SHR systems are generally better than the NMR systems but not the DMMR systems. The 2-of-4, 2-of-5 and 2-of-6 SHR systems achieve 5.1%, 5.4% and 13.8% reductions in the critical path delay compared to the 5MR, 7MR and 9MR systems, respectively. This is because the voters of the 5MR, 7MR and 9MR systems incorporate more logic, as seen in Fig. 6, and feature greater logic depth than the voters of the corresponding SHR systems. On the other hand, the 3-of-5, 3-of-6 and 3-of-7 DMMR systems report 3.2%, 15.1% and 14.2% reductions in the critical path delay compared to the 2-of-4, 2-of-5 and 2-of-6 SHR systems, respectively. However, if sophisticated function blocks are considered for deployment, the differences between the critical path delays of the DMMR and the corresponding SHR systems may not be significant since the propagation delay of the function block may dominate the delay of the voter or the combined delays of the voter and the self-healing circuit of the SHR systems. Also, when bigger function blocks are used, the proposed SHR scheme will stand to gain more over the NMR and DMMR schemes since it was shown that the SHR scheme requires fewer identical function blocks compared to the NMR and DMMR schemes.

#### **IV. CONCLUSIONS**

This paper presented a novel SHR scheme for the design of circuits, sub-systems, and systems meant for use in mission- and safety-critical applications. The proposed SHR scheme requires fewer function blocks than the NMR and DMMR schemes to achieve the same degree of fault tolerance. As a result, the SHR scheme leads to enhanced optimizations in the design metrics. The system reliability based on the SHR scheme is equal to the system reliability of the 3MR scheme, is slightly greater than the system reliability of the DMMR scheme, and is slightly less than the system reliability of the higher order NMR scheme. Unlike the NMR and DMMR schemes, the SHR scheme is self-healing. In the NMR scheme, the Boolean majority condition should be satisfied by at least (N + 1)/2 out of the N identical function blocks. However, in the SHR scheme, the Boolean majority condition is imposed on only the 3 function blocks comprising Group 1, as in the DMMR scheme. From the perspectives of fault tolerance, self-healing capability, and reductions in the design metrics, we infer that the SHR scheme is preferable to the NMR and DMMR schemes.

#### REFERENCES

- [1] R. C. Baumann, "Radiation-induced soft errors in advanced semiconductor technologies," *IEEE Trans. Device Mater. Rel.*, vol. 5, no. 3, pp. 305–316, Sep. 2005.
- [2] H. Quinn, P. Graham, J. Krone, M. Caffrey, and S. Rezgui, "Radiation-induced multi-bit upsets in SRAM-based FPGAs," *IEEE Trans. Nucl. Sci.*, vol. 52, no. 6, pp. 2455–2461, Dec. 2005.
- [3] N. Seifert *et al.*, "Radiation-induced soft error rates of advanced CMOS bulk devices," in *Proc. IEEE Int. Rel. Phys. Symp.*, Mar. 2006, pp. 217–225.
- [4] N. N. Mahatme *et al.*, "Terrestrial SER characterization for nanoscale technologies: A comparative study," in *Proc. IEEE Int. Rel. Phys. Symp.*, Apr. 2015, pp. 4B.4.1–4B.4.7.
- [5] N. Miskov-Zivanov and D. Marculescu, "Multiple transient faults in combinational and sequential circuits: A systematic approach," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 29, no. 10, pp. 1614–1627, Oct. 2010.
- [6] D. Rossi, M. Omaña, C. Metra, and A. Paccagnella, "Impact of aging phenomena on soft error susceptibility," in *Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Nanotechnol. Syst.*, Oct. 2011, pp. 18–24.

- [7] M. Omaña, D. Rossi, T. Edara, and C. Metra, "Impact of aging phenomena on latches' robustness," *IEEE Trans. Nanotechnol.*, vol. 15, no. 2, pp. 129–136, Mar. 2016.
- [8] T. Ban and L. Naviner, "Progressive module redundancy for fault-tolerant designs in nanoelectronics," *Microelectron. Rel.*, vol. 51, nos. 9–11, pp. 1489–1492, Sep./Nov. 2011.
- [9] B. W. Johnson, *Design and Analysis of Fault-Tolerant Digital Systems*. Reading, MA, USA: Addison-Wesley, 1989.
- [10] E. Dubrova, *Fault-Tolerant Design*. New York, NY, USA: Springer, 2013.
- [11] P. Balasubramanian and D. L. Maskell, "A distributed minority and majority voting based redundancy scheme," *Microelectron. Rel.*, vol. 55, nos. 9–10, pp. 1373–1378, Aug./Sep. 2015.
- [12] P. Balasubramanian and R. T. Naayagi, "Redundant logic insertion and fault tolerance improvement in combinational circuits," in *Proc. Int. Conf. Circuits, Syst. Simulation*, Jul. 2017, pp. 6–13.
- [13] I. Koren and C. M. Krishna, *Fault-Tolerant Systems*. San Mateo, CA, USA: Morgan Kaufmann, 2007.
- [14] SAED\_EDK32/28\_CORE Databook, Synopsys, Mountain View, CA, USA, Jan. 2012.

![](_page_9_Picture_20.jpeg)

**P. BALASUBRAMANIAN** received the B.E. degree in electronics and communication engineering from the University of Madras, India, in 1998, the M.Tech. degree in VLSI system from the National Institute of Technology, Tiruchirappalli, India, in 2005, and the Ph.D. degree in computer science from the University of Manchester, U.K., in 2010. He is currently a Research Fellow with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He has published about 90 research papers in international journals and conferences. His research interests include approximate computing, reliability and fault tolerance, asynchronous circuits, logic synthesis, and computer arithmetic. He served/serves as a reviewer for several international journals and conferences, and also served/serves on the technical program and steering committees of many international conferences. He was recognized as an outstanding reviewer by *Microelectronics Reliability* (Elsevier) in 2016, *Computers and Electrical Engineering* (Elsevier) in 2017, and *Integration, the VLSI Journal* (Elsevier) in 2018.

![](_page_9_Picture_23.jpeg)

**DOUGLAS L. MASKELL** received the B.E. (Hons.), M.Eng.Sc., and Ph.D. degrees in electronic and computer engineering from James Cook University, Douglas, QLD, Australia, in 1980, 1984, and 1996, respectively. He is currently an Associate Professor with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He has authored or co-authored over 160 research papers in international journals and conferences. His research interests include the areas of embedded systems, reconfigurable computing, intelligent systems, and algorithm acceleration for hybrid high-performance computing systems.
