

Received January 25, 2021, accepted February 10, 2021, date of publication February 12, 2021, date of current version February 24, 2021. *Digital Object Identifier* 10.1109/ACCESS.2021.3059171

# A Low-Power and Area-Efficient Design of a Weighted Pseudorandom Test-Pattern Generator for a Test-Per-Scan Built-in Self-Test Architecture

VISHNUPRIYA SHIVAKUMAR, CHINNAIYAN SENTHILPARI (Senior Member, IEEE), AND ZUBAIDA YUSOFF

Faculty of Engineering, Multimedia University, Cyberjaya 63100, Malaysia

Corresponding author: Vishnupriya Shivakumar (vishnu.priya120@gmail.com)

This work was supported in part by the Multimedia University under Grant MMUI/190002.02.

**ABSTRACT** A test-pattern generator (TPG) generates pseudorandom test patterns that can be weighted to improve the fault coverage of a built-in self-test (BIST). The objective of this paper is to propose a new weighted TPG for a scan-based BIST architecture. The motivation of this work is to generate efficient weighted patterns for the scan chains with reduced power consumption and area. Additionally, the set of pseudo-primary seeds of the TPG is maximized to obtain long weighted pseudorandom pattern sequences. These maximum-length weighted patterns are applied by assigning separate weights to specific scan chains using a weight-enabled clock. This approach reduces the hardware overhead and achieves a low power consumption of 26.7 nW. Moreover, the proposed weighted TPG is applied in two different test-per-scan BIST architectures and yields accurate results in both. The weighted patterns are generated with fewer switching transitions and achieve fault coverages of 98.81% and 97.35% in the two BIST architectures, using six circuits under test (CUTs) as the scan chains. The simulations are performed with a SilTerra 0.13-μm process on the Mentor Graphics IC design platform. Furthermore, the proposed weighted TPG is extended to higher bit widths, and the performance of these larger TPGs is compared. The experimental results of the proposed TPG design are compared with those of existing TPG designs and tabulated.

**INDEX TERMS** Built-in self-test (BIST), circuit under test (CUT), test-pattern generator (TPG).

### I. INTRODUCTION

Modern technology has focused on developing low-power systems for very-large-scale integration (VLSI) high-speed designs. As a result, several design strategies have been implemented to mitigate the trade-offs between performance, power, and area. However, several of these approaches have concentrated on reducing power dissipation during normal-mode operation rather than during BIST test-mode operation [1]. In test mode, the switching activity in the scan chains and the test data compression provided by an appropriate TPG are crucial. Moreover, such testing must be achieved with high reliability and sensitivity in semiconductor designs [2]. Figure 1 illustrates an example of a conventional pseudorandom TPG. The TPG consists of a sequence of *n* shift registers with input seed bits $a_0, a_1, a_2, \ldots, a_n$. At each clock, the *n*-th shift register bit at cycle (i + 1) is updated continuously from the (n - 1)-th shift register bit at cycle *i*. TPGs use a high degree of parallelism to achieve high-yield specifications in many practical applications. The linear function of the TPG is determined by the output feedback signal and the input seed bits [3]. This linear functionality is used in many applications, such as aircraft systems, cockpit systems, medical systems, audio and video systems, and power generation and distribution systems [4].

**FIGURE 1.** An example of a conventional pseudorandom TPG; Jinyi *et al.* [5].

The associate editor coordinating the review of this manuscript and approving it for publication was Wu-Shiung Feng.

FIGURE 2. Examples of existing 3-bit weighted pseudorandom TPGs: (a) redundant TPG; Barry et al. [17] and (b) TPG; Xiang et al. [18].

TPG outputs can be deterministic, exhaustive, pseudorandom, weighted pseudorandom, or mixed-mode [6]. The weighted pseudorandom output is used to achieve higher fault coverage in many BIST structures [7]. A weighted pseudorandom TPG produces random-like yet repeatable patterns in all clock cycles. Typically, it requires one seed to produce one test pattern over the *n* cycles of the scan phase in test-per-scan BIST, where *n* is the scan chain length [8]. A recent study [9] decreased the switching activity during scan shift cycles; its TPG also selects the weighting parameters automatically to achieve low power. The weighted pseudorandom TPG methods [10], [11] and their implementations in [12], [13] can effectively reduce switching transitions. However, the methods in [10], [11] include additional XOR gates between the shift registers, which increases power consumption and area. The proposed design eliminates these drawbacks. A BIST should mainly provide high fault coverage and low weighted switching activity with low power and small area overhead [11]. Two approaches can meet these requirements: altering the circuit design of the weighted TPG, or adding hardware to the weighted TPG [14], [15]. In this paper, a new weighted pseudorandom TPG is constructed using additional hardware. Higher fault coverage is also achieved by using test-point insertion to target transition delay faults; one test point is inserted for every two NAND gate structures in the design. The proposed technique swaps weighted test patterns to the scan chains using a phase shifter. During swapping, the scan chains with smaller area are selected before the other scan chains.

The weighted patterns are thus applied to all scan chains of the BIST architecture, which helps detect faults at specified outputs and improves the fault coverage. Owing to the selected weighted patterns, the TPG also reduces rapid switching activity and lowers the average scan-shift and capture power consumption during test-per-scan BIST.
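Weighted pseudorandom patterns are often obtained in the literature by combining bits of a maximum-length LFSR with simple gates: an AND of two bits is biased toward '0', and an OR toward '1'. The Python sketch below illustrates that generic technique only; it is not the weighting scheme of the proposed TPG, and the polynomial $x^4 + x + 1$, the bit-mask encoding, and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction


def galois_lfsr_states(poly_mask: int, width: int, seed: int = 1):
    """Yield every state of a Galois LFSR over one full period.

    poly_mask encodes the feedback polynomial without its x^width term
    (hypothetical encoding for this sketch): x^4 + x + 1 -> 0b0011.
    """
    state = seed
    while True:
        yield state
        msb = (state >> (width - 1)) & 1           # bit shifted out of the register
        state = (state << 1) & ((1 << width) - 1)
        if msb:
            state ^= poly_mask                     # XOR the feedback into the tapped stages
        if state == seed:
            return


def weighted_stream(poly_mask: int, width: int, combine) -> list[int]:
    """Combine bits 0 and 1 of each LFSR state into one weighted bit."""
    return [combine(s & 1, (s >> 1) & 1) for s in galois_lfsr_states(poly_mask, width)]


and_bits = weighted_stream(0b0011, 4, lambda a, b: a & b)  # biased toward '0'
or_bits = weighted_stream(0b0011, 4, lambda a, b: a | b)   # biased toward '1'

print(Fraction(sum(and_bits), len(and_bits)))  # 4/15
print(Fraction(sum(or_bits), len(or_bits)))    # 4/5
```

Over one full period the AND stream has a '1' density of 4/15 and the OR stream 4/5, close to the nominal 0.25 and 0.75; the small offset arises because the all-zero state never occurs in a maximum-length LFSR.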

The proposed TPG is designed using logic-gate techniques and applied in two different test-per-scan BIST architectures. It is implemented and evaluated on a Mentor Graphics IC station using a SilTerra 0.13-μm process. The rest of this paper is organized as follows. Existing weighted pseudorandom TPGs are outlined in Section II. In Section III, a new weighted pseudorandom TPG is proposed, together with a mathematical analysis showing how maximum-length weighted patterns are achieved concurrently for a subset of pseudo-primary seeds. Section IV presents the application of the proposed TPG in two different BIST architectures. Experimental results and a discussion of the proposed TPG along with existing TPGs are presented in Section V.

### **II. EXISTING WEIGHTED PSEUDORANDOM TPGs**

The primitive polynomial chosen for the TPG is determined by the even or odd tap bits of the register. Primitive polynomials are generally used to generate pseudorandom patterns. If the taps of an *n*-bit TPG are at positions $n, m, k, l, \ldots, 0$, then the mirrored tap positions $n - n, n - m, n - k, n - l, \ldots, n - 0$ (the reciprocal polynomial) also generate a pseudorandom TPG output [16]. Using this property, the TPG can generate a considerable number of pseudo-primary seeds. In Figures 2 and 3, the black lines denote the source TPG that yields pseudorandom patterns, and the blue components show the additional hardware used to generate the weighted pseudorandom patterns.
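The mirrored-tap property can be checked numerically. The Python sketch below is an illustration under assumed conventions, not the authors' tool: a polynomial is represented by its set of exponents, the hypothetical helper `reciprocal` maps each exponent e to n - e, and `lfsr_period` counts the cycles a Galois LFSR needs to return to its seed. A period of $2^n - 1$ indicates a maximum-length sequence. The sketch assumes the polynomial contains the constant term, so the state update is invertible and the loop terminates.

```python
def lfsr_period(exponents: set[int]) -> int:
    """Period of a Galois LFSR whose feedback polynomial has the given exponents."""
    n = max(exponents)
    mask = sum(1 << e for e in exponents if e < n)  # drop the leading x^n term
    state = seed = 1
    period = 0
    while True:
        msb = (state >> (n - 1)) & 1
        state = (state << 1) & ((1 << n) - 1)
        if msb:
            state ^= mask
        period += 1
        if state == seed:
            return period


def reciprocal(exponents: set[int]) -> set[int]:
    """Mirror the tap positions: exponent e becomes n - e."""
    n = max(exponents)
    return {n - e for e in exponents}


taps = {3, 1, 0}              # x^3 + x + 1
mirror = reciprocal(taps)     # {3, 2, 0}, i.e., x^3 + x^2 + 1
print(lfsr_period(taps), lfsr_period(mirror))
```

Both $x^3 + x + 1$ and its reciprocal $x^3 + x^2 + 1$ give period 7, and the same check for $x^4 + x + 1$ and $x^4 + x^3 + 1$ gives 15.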



FIGURE 3. Examples of existing weighted 3-bit pseudorandom TPGs: (a) Prasad et al. [16] and (b) Hwasoo Shin et al. [19].

The existing weighted redundant TPG in Figure 2 (a) uses either hardware or time redundancy. Hardware redundancy duplicates the function, as in double modular redundancy, triple modular redundancy, and so on. This is achieved by duplicating additional hardware for the source TPG design, thus addressing random-pattern-resistant faults. Although the hardware-redundant TPG achieves good performance, its hardware overhead should be reduced. Time redundancy uses multiple time intervals instead of extra hardware, with an asynchronous clock whose values alternate between "0" and "1". The same operation is thus executed multiple times for the weighted patterns, which detects numerous resistant faults during different clock cycles.

The TPG of Xiang *et al.* [18] in Figure 2 (b) generates weighted patterns mainly by using control bits for a multiplexer (Mux). The additional hardware also generates the reseeding bits that the test patterns require to detect faults. The Mux in the additional hardware selects between the input seeds and the control bits, and the constant weight input $W_{in}$ passes the weighted patterns to the output. However, this incurs a large area overhead in the scan-forest design of the BIST architecture. The approach also targets reduced critical-path delay and reduced scan and capture power during testing. Additionally, the weighted reseeding technique requires a complicated design.

To overcome the limitations of the existing TPGs [17], [18], more advanced TPG techniques were later introduced, as shown in Figure 3. These techniques target the weight of the test patterns using approaches such as information redundancy and parallel TPGs. Figure 3 (a) shows the TPG of Prasad *et al.* [16]. This TPG inserts a Mux between the D flip-flops, and the control signal is configured to the initial valid state of the weighted pattern. The additional gates introduce delay in the circuit until the valid state of the weighted patterns is generated. Moreover, these gates adversely impact the area and speed of the TPG and introduce high switching transitions in the design. Figure 3 (b) shows the TPG of Hwasoo *et al.* [19]. Unlike the other existing methods, it uses a small number of hardware logic components rather than replication. The weighted patterns are generated using only one register function (blue components). Although this design has less area overhead, its weighted patterns are insufficient for a large number *m* of pseudo-primary seed bits. A summary of the existing and proposed weighted TPGs is given in Table 1. The proposed TPG addresses the limitations of the existing works: it reaches a valid state for generating weighted patterns for a larger number of seed bits with less power and area overhead. In particular, this approach is intended to guarantee low-power operation in all test-per-scan phases of the BIST architecture.

## III. PROPOSED LOW-POWER WEIGHTED PSEUDORANDOM TPG USING THE GALOIS OPERATION WITH A PHASE SHIFTER

Compared with the existing methods, the proposed weighted TPG offers several advantages: fewer switching transitions, achieved using specific weighted patterns, and reduced power, attained using fewer hardware components. This reduces the hardware overhead and improves the fault coverage of the BIST. Figure 4 shows the proposed TPG, which includes the Galois operation and additional hardware for weighted-pattern generation. The Galois operation is shown by the black dashed line and assumes constant pseudo-primary seeds (A, X) for simplicity. However, the constant seed bits can be extended using the same subset of initial primary seeds. The seed subsets are used to achieve maximum-length weighted patterns with less switching activity. The additional hardware, indicated by the blue lines, uses a small number of components to generate the weighted pseudorandom TPG output.

TABLE 1. Summary of the related works.

| Related work | Circuit technique: pseudorandom TPG | Circuit technique: weighted pseudorandom TPG | Objectives for scan chains test | Problem | Power reduction | Area-efficient |
|---|---|---|---|---|---|---|
| Jinyi et al. [5] | Galois adder and Galois multiplier for each tapping bit used | Not applicable for weighted patterns | No scan-in test done | Failed to produce weighted patterns | No | No |
| Barry et al. [17] | Galois adder and Galois multiplier for each tapping bit used | Hardware components of pseudorandom TPG are replicated | Alternative clock values of logic '0' and '1' used | Inefficient low-power operations for all scan chains | No | No |
| Xiang et al. [18] | Galois adder for each tapping bit used | Multiplexer for each tapping bit used | The test enabled clock signal passed the weight $W_{in}$ | More propagation delay and complicated design | Yes | No |
| Prasad et al. [16] | Multiplexer for each tapping bit and one XOR for convolution are used | Additional hardware components used | Network of logic gates used | Higher switching transitions | Yes | Yes |
| Hwasoo et al. [19] | Galois adder for each bit used | Additional hardware components and XOR gates for successive Galois adder are used | Chains of XOR gates are used with a D flip-flop estimator | Insufficient weighted patterns for larger pseudo-primary input | No | Yes |
| Proposed | One Galois adder, one Galois multiplier used for all tapping bits with an asynchronous clock | Additional hardware components including weight generator and weighted multiplexer are used | Weight-enabled clock used to enable the weights $W_E$, $W_A$ using weight generator | Eliminated the problem of previous authors' work by using different valued weights. However, little propagation delay exists in the weight-enabled clock. | Yes | Yes |



FIGURE 4. Proposed 3-bit weighted pseudorandom TPG.

Furthermore, the additional hardware uses a weight-enabled clock, which enables specific weights in successive clock cycles. These weights are applied to the respective scan chains through the weighted Mux. The clock selection of the weight generator effectively improves the fault coverage by addressing random-pattern-resistant faults and redundant faults in the BIST architecture.

A 3-bit pseudorandom TPG is proposed based on the Galois scheme over the field $GF(2^m)$. The test patterns are generated concurrently using the shift registers and the Galois operation. With a synchronous clock, bit sequences can be lost when the TPG is extended to *m* bits.

Hence, the *m*-bit TPG is designed with asynchronous clocks for the shift registers. The input vector bit (X) is continuously multiplied by the pseudo-primary seed bit (A) and added to the test vector (Z). In addition, the register states accommodate multilevel parallelism in the TPG. Consequently, the (i+1)-th state is obtained from the *i*-th state through the feedback loop structure.
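The multiply-and-add step described above can be sketched for the 3-bit case. The Python code below is a minimal model, assuming that $GF(2^3)$ is generated by the primitive polynomial $x^3 + x + 1$ (the text does not specify the polynomial at this point) and that field elements are stored as 3-bit integers; addition is XOR, and multiplication is shift-and-add with modular reduction. The helper names `gf_mul` and `mac_step` are hypothetical.

```python
def gf_mul(a: int, x: int, m: int = 3, poly: int = 0b1011) -> int:
    """Multiply two elements of GF(2^m); poly includes the x^m term (x^3 + x + 1)."""
    result = 0
    while x:
        if x & 1:
            result ^= a          # add (XOR) the current shifted copy of a
        x >>= 1
        a <<= 1
        if a >> m:               # degree reached m: reduce modulo poly
            a ^= poly
    return result


def mac_step(z: int, a: int, x: int) -> int:
    """One Galois multiply-accumulate step: Z <- Z + A*X, where + is XOR."""
    return z ^ gf_mul(a, x)


z = 0
for a, x in [(2, 4), (3, 3), (7, 7)]:
    z = mac_step(z, a, x)
print(z)  # 5
```

Because $GF(2^3)$ is a field, every nonzero seed A has a multiplicative inverse, so the mapping $X \mapsto A \cdot X$ never maps two distinct inputs onto the same product.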

Furthermore, the constant pseudo-primary seeds are extended in the Galois operation of the proposed TPG design. This extension uses the following Galois-field lemma from [20] (*Lemma 1*) to identify subsets of initial pseudo-primary seeds.

*Lemma 1*: Let *A* and *X* be two input elements in $GF(2^m)$, and let *Z* be their product computed by the Galois multiplier. If the multiplication is performed without modular reduction, then their weighted patterns satisfy $W_Z = W_A \cdot W_X$.

*Proof*: Let $A = (a_0, a_1, \ldots, a_{m-1})$ and $X = (x_0, x_1, \ldots, x_{m-1})$ be two elements in $GF(2^m)$, and let *Z* be their output. Then, *Z* can be determined as in [20]:

$$Z = (a_0 \cdot x_0) + (a_1 \cdot x_1)^2 + \dots + (a_{m-1} \cdot x_{m-1})^{2^{m-1}}$$

From the concept of [6], the weight function satisfies the linearity property, which comprises additivity, $W(A_0 + A_1) = W(A_0) + W(A_1)$, and homogeneity, $W(c \cdot A) = c \cdot W(A)$, where $c$ is a constant.

Hence, the weight of *Z* is obtained as

$$\begin{aligned}
W(Z) &= W\left[(a_0 \cdot x_0) + (a_1 \cdot x_1)^2 + \dots + (a_{m-1} \cdot x_{m-1})^{2^{m-1}}\right] \\
&= W(a_0 \cdot x_0) + W(a_1 \cdot x_1) + \dots + W(a_{m-1} \cdot x_{m-1})
\end{aligned}$$

Thus,

$$W(Z) = \sum_{i=0}^{m-1} W(a_i \cdot x_i) = \sum_{i=0}^{m-1} W(a_i) \cdot W(x_i) \quad (1)$$
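Lemma 1 can be checked exhaustively for small *m* if $W(\cdot)$ is read as the parity of an element's coefficient vector (0 for even weight, 1 for odd weight), which matches how the weight function is used in this section; that reading is an assumption of this sketch. Without modular reduction, the product is a carry-less (XOR) polynomial product, and its parity equals the product of the operand parities because parity is the polynomial evaluated at $x = 1$ over GF(2). The Python code below confirms this for $m = 3$ and shows that reducing modulo $x^3 + x + 1$ (an assumed field polynomial with an odd number of terms) can break the relation, consistent with the lemma's no-reduction condition. The helper names are hypothetical.

```python
def clmul(a: int, x: int) -> int:
    """Carry-less product of two bit vectors: GF(2)[x] multiplication, no reduction."""
    result = 0
    while x:
        if x & 1:
            result ^= a
        a <<= 1
        x >>= 1
    return result


def reduce_mod(v: int, poly: int = 0b1011, m: int = 3) -> int:
    """Reduce a GF(2) polynomial modulo poly of degree m (here x^3 + x + 1)."""
    for shift in range(v.bit_length() - 1 - m, -1, -1):
        if (v >> (m + shift)) & 1:
            v ^= poly << shift
    return v


def parity(v: int) -> int:
    """Weight modulo 2: 1 if v has an odd number of 1 bits."""
    return bin(v).count("1") & 1


m = 3
holds = all(
    parity(clmul(a, x)) == (parity(a) & parity(x))
    for a in range(2 ** m)
    for x in range(2 ** m)
)
print(holds)  # True: W_Z = W_A * W_X without reduction

# With reduction the relation can fail: 2 * 4 = x^4 -> x^3 -> x + 1 (value 3).
print(parity(reduce_mod(clmul(2, 4))), parity(2) & parity(4))  # 0 1
```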

The weight function takes the value $W(Z) = 0$ for even weights and $W(Z) = 1$ for odd weights. This allows a large number of seed values for A and X, up to $A = 2^m - 1$ and $X = 2^m - 1$, where *m* is the number of message bits in the field $GF(2^m)$. Hence, the weighted patterns can also be generated with the maximum length, as shown in equation (1). The conventional TPGs [5] update the next (i+1)-th state as

$$Y_n[i+1] = Y_{n-1}[i] + x_n \cdot Y[i], \quad \text{for } 0 \le n \le m-1 \quad (2)$$
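Equation (2) is the Galois-form LFSR update: each stage takes the value of the previous stage and, where the tap coefficient $x_n$ is 1, adds the fed-back output modulo 2. The Python sketch below steps this recurrence directly. It assumes that the output is $Y[i] = Y_{m-1}[i]$, that the term $Y_{-1}[i]$ for the first stage is 0, and that the taps $(x_0, x_1, x_2) = (1, 1, 0)$ realize $1 + x + x^3$; these choices and the function names are illustrative.

```python
def galois_step(stages: list[int], taps: list[int]) -> list[int]:
    """Apply Y_n[i+1] = Y_{n-1}[i] + x_n * Y[i] (mod 2) to every stage n."""
    y = stages[-1]                  # fed-back output Y[i] (last stage)
    prev = [0] + stages[:-1]        # Y_{n-1}[i], with Y_{-1}[i] taken as 0
    return [(p + t * y) % 2 for p, t in zip(prev, taps)]


def run(seed: list[int], taps: list[int], cycles: int) -> list[list[int]]:
    """Return the register contents for `cycles` clock cycles after the seed."""
    states = [seed]
    for _ in range(cycles):
        states.append(galois_step(states[-1], taps))
    return states


trace = run([1, 0, 0], taps=[1, 1, 0], cycles=7)
for i, state in enumerate(trace):
    print(i, state)
```

Starting from the seed [1, 0, 0], the register visits all seven nonzero 3-bit states before returning to the seed at cycle 7, which is the maximum-length behavior expected of a primitive feedback polynomial.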

Hence, the additional hardware is designed for the seed bit polynomial Z[i] instead of the primitive polynomial Y[i] in equation (2). The function Z[i] is taken as $Z[a_i] = Z[a_0, a_1, \ldots, a_{m-1}]$ because the maximum-length patterns are passed to the phase shifter. Substituting Z for Y on the right side of equation (2) and applying the weight function gives

$$W_A[i] = W\left[\sum_{i=0}^{m-1} \left(Z_{n-1}[i] + x_n \cdot Z[i]\right)\right] \quad (3)$$

Using the additivity and homogeneity properties [6], equation (3) simplifies to

$$W_A[i] = W\left[\sum_{i=0}^{m-1} Z_{n-1}[i]\right] + Z[i] \cdot W\left[\sum_{i=0}^{m-1} x_n\right] \quad (4)$$

For a large number of seed bits with vector bits $g(X) = \sum_{i=0}^{m-1} x_n$, equation (4) is applied as the weight generator. The conventional TPGs [5] require $2^m - 1$ binary additions owing to the primitive polynomial Y[i]. For the proposed 3-bit TPG, weight generation is defined as the convolution of the $m-1$ binary additions multiplexed with the pseudo-primary seeds. Here, the proposed TPG requires only $m'$ binary additions and one selection, as shown in equation (4). In equation (5), $W_E$ is the estimated weight to be obtained at the (i+1)-th clock cycle, and *k* denotes the total number of clocks. $W_E$ assigns random weights in the range '0' to 'k' using the clock delay of the D flip-flop. To obtain the estimated weight, the weight function is applied to the left side of equation (2):

$$W_E[i+j] = W\left[Z_n[i+j]\right], \quad j \in \{1, 2, \ldots, k\} \quad (5)$$

The additional hardware introduced in the proposed design is indicated by the blue components. It consists of an XOR gate, a weight generator, and phase-shift selection using the weighted Mux. $W_A$ denotes the actual weight, i.e., the tapped convolution value produced by the XOR gate, and $W_E$ denotes the estimated weight produced by the weight generator. The weight-enabled clock applies one of the weighted patterns to the weight generator. The weighted patterns are calculated mathematically from a probability distribution.

A flowchart summary of the proposed weighted TPG operation is shown in Figure 5. The weighted Mux acts as a phase shifter that shifts the actual and estimated weighted patterns to the scan chains. The weighted Mux also selects the convolution bits $W_E$ and $W_A$ with self-control, so that $W_E$ and $W_A$ are shifted to the output $Y_w$ according to the pseudorandom test pattern (Y). The scan chains are selected by the weighted patterns $W_E$ and $W_A$ according to the following properties.

FIGURE 5. Flowchart of the proposed weighted pseudorandom TPG operation.

1) First, the weighted patterns $W_A$, generated with given probabilities of '0' or '1', are assigned to the scan chains that occupy a smaller area. The output of the weighted Mux depends on the pseudorandomness of the seed inputs. Consider the case in which $Y_0$ is swapped with $Y_1$, $Y_2$, ..., $Y_n$ according to the value of the Galois operation (*Z*) in the proposed 3-bit TPG. Here, $Y_2$ is the output *Y*, which is connected to the select input of the Mux and thus determines the weighted pattern. Hence, the overall switching transitions at the scan chain primary inputs can be reduced by 25%.
2) Second, the weighted patterns $W_E$ are selected by the weighted Mux if the pseudorandom output (*Y*) has equal numbers of '0' and '1' values after swapping through the shift registers. If two scan cell inputs are adjacent to each other at the *i*-th clock, a probability value of 0.55 is assigned. Two interconnected scan chain cells are assigned a probability value of 0.75.
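The switching transitions discussed above can be quantified in several ways, and the metric behind the WSA values in Table 3 is not defined at this point. One common proxy, assumed here for illustration and not necessarily the authors' WSA, is the weighted transition metric for scan-in: a transition between adjacent bits of a length-L scan vector is weighted by how far it ripples through the chain during shifting. The hypothetical function below computes it in Python.

```python
def weighted_transitions(vector: str) -> int:
    """Weighted transition metric of one scan-in vector (assumed proxy for WSA).

    A transition between positions j and j+1 (0-indexed) is weighted by
    (L - 1 - j), so transitions shifted in early cost more shift power.
    """
    bits = [int(b) for b in vector]
    length = len(bits)
    return sum((length - 1 - j) * (bits[j] ^ bits[j + 1]) for j in range(length - 1))


for v in ["0000", "0011", "0101"]:
    print(v, weighted_transitions(v))  # 0, 2, and 6 respectively
```

Under this model, patterns or chain assignments that avoid high-weight transitions lower the shift power, which is the effect the weighted scan-chain selection aims for.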

| Iteration | Mux input $a_0$ | Mux input $x_0$ | Shift reg. o/p D1 | Shift reg. o/p D2 | Shift reg. o/p D3 | TPG o/p $Y$ | Actual weight $W_A(i)$ | Estimated weight $W_E(i+1)$ | Estimated weight $W_E(i+2)$ | Estimated weight $W_E(i+3)$ | Estimated weight $W_E(i+4)$ | Scan chains selection $Y_w$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0             | 0             | 0             | 0  | 0         | 0   | 0          | 0                  | 0.55       | 0.65             | 0.75 | 0.85 | $S_0, S_1, S_2, S_3, S_4$ |
| 1             | 0             | 1             | 0  | 0         | 1   | 1          | 0                  | 0.75       | 0.85             | 0.65 | 0.55 | $S_0, S_3, S_4, S_2, S_1$ |
| 2             | 1             | 0             | 1  | 1         | 0   | 0          | 0                  | 0.65       | 0.55             | 0.75 | 0.85 | $S_0, S_2, S_1, S_3, S_4$ |
| 3             | 1             | 1             | 0  | 1         | 1   | 0          | 1                  | 0.55       | 0.75             | 0.85 | 0.65 | $S_5, S_1, S_3, S_4, S_2$ |
| 4             | 0             | 0             | 1  | 0         | 1   | 0          | 1                  | 0.75       | 0.55             | 0.65 | 0.85 | $S_5, S_3, S_1, S_2, S_4$ |
| 5             | 0             | 1             | 0  | 1         | 0   | 1          | 1                  | 0.85       | 0.65             | 0.55 | 0.75 | $S_5, S_4, S_2, S_1, S_3$ |
| 6             | 1             | 0             | 1  | 1         | 1   | 1          | 0                  | 0.65       | 0.75             | 0.85 | 0.55 | $S_0, S_2, S_3, S_4, S_1$ |
| 7             | 1             | 1             | 1  | 0         | 0   | 1          | 1                  | 0.55       | 0.85             | 0.75 | 0.65 | $S_5, S_1, S_4, S_3, S_2$ |

TABLE 2. Operation of the proposed weighted TPG for the scan chains selection.

A probability value of 0.65 is allocated to scan chains with similar input vectors that have the same transition probability. A probability value of 0.85 is applied to the remaining scan chain cells.

The weights generated by the TPG in different clock cycles are assumed to be $w_0, w_1, w_2, \ldots, w_n \in \{0.55, 0.65, 0.75, 0.85\}$, which form the probability distribution. The respective weights are assigned to the scan chains $S_0, S_1, S_2, \ldots, S_n$, where *n* denotes the total number of scan chains to be tested. The full-length test patterns are weighted based on this probability distribution. For scan chain selection, the weighted patterns are calculated at the estimated clock cycles i+1, i+2, i+3, and i+4. The weight probabilities are chosen so that more scan-shift cycles than capture cycles are inserted. Hence, the proposed TPG compares the estimated and actual weights at successive clock cycles to detect faults. The conventional TPG updates the next stages linearly without a weight function, so its patterns are not consistently weighted. Comparing the weights to select the scan chains allows the shift-register output to be specified at each clock cycle.
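A common way to realize a weight such as 0.55 in hardware is to compare an *m*-bit pseudorandom value with a fixed threshold and output '1' whenever the value lies below it. The Python sketch below applies this generic comparator technique to the four weights listed above. The 8-bit width, the primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$, and the comparator structure are assumptions for illustration; the paper's weight generator is built from asynchronous D flip-flops and may work differently.

```python
WEIGHTS = [0.55, 0.65, 0.75, 0.85]
WIDTH = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a primitive polynomial (assumed choice)


def lfsr_values(width: int = WIDTH, poly: int = POLY) -> list[int]:
    """All states of a maximum-length Galois LFSR, starting from 1."""
    values, state = [], 1
    while True:
        values.append(state)
        state <<= 1
        if state >> width:       # x^width term appeared: reduce modulo poly
            state ^= poly
        if state == 1:
            return values


def weighted_bits(weight: float, values: list[int]) -> list[int]:
    """Emit 1 when the pseudorandom value is below the threshold weight * 2^WIDTH."""
    threshold = round(weight * 2 ** WIDTH)
    return [1 if v < threshold else 0 for v in values]


values = lfsr_values()
for w in WEIGHTS:
    bits = weighted_bits(w, values)
    print(w, round(sum(bits) / len(bits), 3))  # measured '1' density per weight
```

Because a maximum-length LFSR visits every nonzero value exactly once per period, the measured densities differ from the target weights only by quantization (at most a few parts in 256 here).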

The operation of the proposed 3-bit TPG with the weight functions is shown in Table 2. The seed bit polynomial Y[i] is defined as $1 + x^3$ for even weights and $1 + x + x^3$ for odd weights. The successive weights are generated concurrently as an even parity of '0' and an odd parity of '1' by the Galois operation using equation (1). Additionally, the last term $W\left[\sum_{i=0}^{m-1} x_n\right]$ in equation (4) is taken as $W[x_0] = W_A$ in the TPG. Initially, the weight $W_A$ accumulates in the weighted Mux, and later, at the (i + 1)-th iteration, the weight $W_E$ is obtained. Although $W_A$ generates both even and odd parities, the additional $W_E$ bits are continuously defined to obtain accurate weights at the TPG output. This is represented as $W_E[i + 1]$ in equation (5) and is accomplished in the weight generator using the weight-enable signal and the tapped convolution values of the cascaded register function. Using the weights $W_A$ and $W_E$ produced by the additional hardware, eight iterations can occur during the (i + j)-th clock cycle. The weighted patterns $W_E$ of the (i + j)-th clock cycle are listed as *i*, *i*+1, *i*+2, *i*+3, and *i*+4. The overall repetitive weighted clock cycle selects the scan chains using weights from '0' to '1', namely 0, 0.55, 0.65, 0.75, 0.85, and 1. The weighted clock operates as an asynchronous clock signal, and its different operating frequencies are determined by the decimal values of the asynchronous D flip-flops that serve as the weight generator. The scan chains are partitioned into six groups based on the weighted pattern selection; consequently, more than one scan chain can be selected in any clock cycle owing to the designated weight function.

## IV. APPLICATION OF PROPOSED WEIGHTED TPG IN TEST-PER-SCAN BIST ARCHITECTURE

The proposed weighted TPG is implemented with the scan chains to achieve adequate statistical properties for the BIST architecture. A BIST architecture is tested using one of two methods: test-per-clock or test-per-scan. Test-per-clock tests the CUTs individually using test-point insertion, whereas test-per-scan tests multiple scan chains of the BIST in parallel. In general, fault coverage in test-per-scan BIST can be improved by inserting test points between scan chains. The test-per-scan BIST architecture consists of a TPG, a response analyzer, and a signature register. The architecture uses a multiple-input signature register (MISR) as the response analyzer to determine whether the CUT is faulty or fault-free [7].
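The MISR compacts the parallel scan-out responses into a short signature. The minimal Python model below assumes a 4-bit register with feedback polynomial $x^4 + x + 1$ and four scan chains that each feed one bit per shift cycle; the register width, polynomial, response data, and function name are all hypothetical choices for illustration. It shows that a single flipped response bit changes the signature.

```python
def misr_signature(responses: list[list[int]], width: int = 4, poly_mask: int = 0b0011) -> int:
    """Compact per-cycle response words into a signature.

    Each cycle: shift the register as a Galois LFSR (feedback x^4 + x + 1),
    then XOR in the parallel inputs (one bit per scan chain).
    """
    state = 0
    for word in responses:
        msb = (state >> (width - 1)) & 1
        state = (state << 1) & ((1 << width) - 1)
        if msb:
            state ^= poly_mask
        for k, bit in enumerate(word):
            state ^= bit << k
    return state


golden = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
faulty = [row[:] for row in golden]
faulty[1][2] ^= 1                      # single-bit error in chain 2, cycle 1

good_sig = misr_signature(golden)
bad_sig = misr_signature(faulty)
print(f"{good_sig:04b}", f"{bad_sig:04b}",
      "fault detected" if good_sig != bad_sig else "aliased")
```

Because the register update is linear and invertible, a single-bit error can never cancel out; some multi-bit errors can alias to the fault-free signature, with a probability that falls as the register is widened.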

The pseudorandom testing phase is evaluated with six scan chains, and the results are shown in Table 3. The scan chains $S_0$, $S_1$, $S_2$, $S_3$, $S_4$, and $S_5$ correspond to the respective CUTs: the Baugh-Wooley multiplier, CSA multiplier, restoring square rooter, non-restoring square rooter, restoring array divider, and 6T SRAM memory cell. The scan chain test is carried out at a supply voltage of 2 V, which provided better performance in the analysis. The CUTs acting as scan chains are evaluated with respect to their weighted switching activity (WSA), fault coverage, test power consumption, and area overhead for a more explicit comparison. The best- and worst-case gate counts for the respective scan chains are listed; these are used to compute the required test pattern length. The length of the test sequences can be increased to achieve better fault

TABLE 3. Statistical analysis of the scan chains.

| Scan chain | CUT | Power consumption (mW) | Gate count (best case) | Gate count (worst case) | Weighted switching activity (WSA) | Fault coverage (%) | Area overhead |
|---|---|---|---|---|---|---|---|
| $S_0$ | Baugh-Wooley Multiplier | 2.5 | 223 | 404 | 407 | 99.4 | 4571.2 |
| $S_1$ | CSA Multiplier | 1.65 | 346 | 462 | 357 | 99 | 2708.5 |
| $S_2$ | Restoring Square Rooter | 0.007 | 452 | 761 | 885 | 98.7 | 763.24 |
| $S_3$ | Non-restoring Square Rooter | 0.011 | 455 | 785 | 754 | 99.81 | 561.3 |
| $S_4$ | Restoring Array Divider | 1.59 | 623 | 851 | 987 | 97 | 2963.6 |
| $S_5$ | 6T SRAM Memory cell | 0.002 | 29 | 63 | 83 | 100 | 136.5 |

| TABLE 4. | Various p | performance com | parisons of the | proposed TPG | and the existing | 7 TPGs [16]- | [19] in the BIST | architecture [ | 211, [7 | 221. |
|----------|-----------|-----------------|-----------------|--------------|------------------|--------------|------------------|----------------|---------|------|
|          |           |                 |                 | P            |                  |              | []               |                |         |      |

| Architecture     |                       | Appro<br>x. Gate | Weighted<br>Switchin | %<br>WSA    | Fault<br>Coverage | Power co<br>(m   | nsumption<br>IW)  | Through<br>put (G | Frequ<br>ency | Area<br>Overhe |
|------------------|-----------------------|------------------|----------------------|-------------|-------------------|------------------|-------------------|-------------------|---------------|----------------|
|                  |                       | counts           | g activity<br>(WSA)  | Savin<br>gs | (%)               | Capture<br>power | Scanning<br>power | bits/s)           | (GHz)         | ad<br>(µm²)    |
| P.Wohl<br>et al. | Barry et al.<br>[17]  | 16025            | 48715                | 42.5        | 70.61             | 350.51           | 3475.9            | 46.12             | 1.25          | 47112.2        |
| [21]<br>BIST     | Xiang et al.<br>[18]  | 6410             | 23957                | 36.5        | 86.87             | 175.07           | 1773.57           | 76.03             | 1.9           | 39260.7        |
|                  | Prasad et<br>al. [16] | 10625            | 28659                | 38.5        | 83.72             | 210.09           | 2085.56           | 56.8              | 2.0           | 78520.1        |
|                  | Hwasoo et<br>al. [19] | 8012             | 19162                | 39          | 79.33             | 140.01           | 1309.7            | 80.12             | 1.9           | 31408          |
|                  | Proposed<br>TPG       | 3205             | 9563                 | 23.5        | 98.81             | 70.03            | 695.1             | 89.9              | 2.3           | 15704.3        |
| Elham<br>et al.  | Barry et al.<br>[17]  | 14360            | 45023                | 56.5        | 70.49             | 364.5            | 2267.01           | 42.3              | 1.25          | 37869.2        |
| [22]<br>BIST     | Xiang et al.<br>[18]  | 7180             | 26201                | 42          | 84.16             | 173.52           | 1033.28           | 73.26             | 1.9           | 31557.5        |
|                  | Prasad et<br>al. [16] | 8616             | 23592                | 46.5        | 82.88             | 270.36           | 1370.63           | 49.9              | 2.0           | 63115.9        |
|                  | Hwasoo et<br>al. [19] | 5744             | 17822                | 32.5        | 78.16             | 138.7            | 970.412           | 79.87             | 1.9           | 25246          |
|                  | Proposed<br>TPG       | 3072             | 8641                 | 25.5        | 97.35             | 68.21            | 453.52            | 84.69             | 2.3           | 12623.7        |

coverages in the overall system. The percentage of the fault coverage includes the random-pattern delay fault. The experimental results show that significantly better fault coverages are achieved for all the scan chains. The memory cell [23] achieves 100% fault coverage with a power consumption of 0.002 mW since it uses a limited number of transistors in its design. The CUT of the multiplier design achieves nearly 99%. The results show that the proposed weighted TPG achieves better fault coverages and reduced area overhead in the respective scan chains.

The proposed TPG operates in the test-per-scan method as follows. At the $i$-th clock cycle, if the scan chains are set to the capture phase, the first set of generated weighted patterns is applied to them. The first set of weights $w_0, w_1, w_2, w_3, w_4, w_5 \in \{0, 0.55, 0.65, 0.75, 0.85, 1\}$ is assigned to the scan chains $S_5, S_0, S_1, S_2, S_4$, and $S_3$ according to the design strategy. Otherwise, the scan chains are in the scanning phase, in which they are tested for faults. At the next clock, the $(i + j)$-th, the second set of weighted patterns is applied to the scan chains, and the process continues until all the clock cycles have been run. In each iteration of the weighted pseudorandom patterns in the BIST architecture, the testing method applies 200 to 2,000 clock cycles to test the scan chains. The testing cycle of each scan chain consists of a capture phase and a scanning phase. During the scanning phase, the phase shifter shifts the weighted patterns into the scan chain CUT under test.
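To make the per-chain weighting concrete, the sketch below biases pseudorandom bits toward each chain's weight using a generic 16-bit Fibonacci LFSR (polynomial $x^{16} + x^{14} + x^{13} + x^{11} + 1$) and a threshold comparison. This is a standard software illustration of weighted pseudorandom generation, not the proposed Galois-based circuit. The seed, the 8-bit comparison, the function names, and the pairing of weights with chains (taken in the order listed above) are assumptions, and the capture/scanning phase sequencing is omitted.

```python
from typing import Dict, List


def lfsr16_step(state: int) -> int:
    """Shift a 16-bit Fibonacci LFSR once (x^16 + x^14 + x^13 + x^11 + 1)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def weighted_patterns(
    weights: Dict[str, float], seed: int = 0xACE1, cycles: int = 2000
) -> Dict[str, List[int]]:
    """Return one weighted bit per scan chain per clock cycle.

    For every bit, the LFSR is shifted 8 times and its low byte is compared
    with round(w * 256), so a chain with weight w receives a 1 with a
    probability close to w (exactly 0 for w = 0 and exactly 1 for w = 1).
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    thresholds = {chain: round(w * 256) for chain, w in weights.items()}
    patterns: Dict[str, List[int]] = {chain: [] for chain in weights}
    for _cycle in range(cycles):
        for chain, threshold in thresholds.items():
            for _shift in range(8):  # draw 8 fresh bits for this chain
                state = lfsr16_step(state)
            patterns[chain].append(1 if (state & 0xFF) < threshold else 0)
    return patterns


# Hypothetical pairing of the weight set with the chains, in the listed order.
WEIGHTS = {"S5": 0.0, "S0": 0.55, "S1": 0.65, "S2": 0.75, "S4": 0.85, "S3": 1.0}

if __name__ == "__main__":
    pats = weighted_patterns(WEIGHTS)
    for chain, bits in pats.items():
        print(chain, sum(bits) / len(bits))  # observed fraction of 1s
```

A threshold on a multi-bit LFSR word is used here because it gives any weight in steps of 1/256 with a single comparator; hardware weighted TPGs often use AND/OR combinations of LFSR bits instead.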

Table 4 lists the analysis of the BIST testability architectures with the various TPG designs. The BIST in [21] consists of a TPG, an XOR compressor, and a MISR signature analyzer with auxiliary blocks such as the eliminator of unknown (X) sources, dynamic X tolerance, and low-power sequencer logic. The BIST in [22] comprises a TPG, a test response compactor, and a MISR signature analyzer in conjunction with embedded test points, and it typically enables a 10%–15% reduction in hardware. For a fair comparison with the proposed TPG, the existing TPGs are also simulated using the gating techniques and implemented in the two BIST designs. Because the typical primary input gates are clustered using simple logic gates in the proposed design, the gate counts are reduced. The XOR gate compressor and the X-masking technique in [21] compact the scan chain output without degradation and eliminate the unknown source bits included during shift cycles; the MISR is then used to identify the faults in the scan chain output. In [22], the XOR compressor is designed using a Mux, and its output is analyzed in the MISR. The TPG uses the phase shifter to shift the weighted patterns into the scan chains synchronously.

FIGURE 6. Fault coverage across various clock cycles in the (a) P. Wohl et al. [21] BIST and (b) Elham et al. [22] BIST.

The proposed TPG assigns separate weights to the respective scan chains using the weight-enable signal. The WSA of the circuit design can be approximated as

$$\mathrm{WSA} = \sum_{out=1}^{\text{all gates}} f_{out}\left(g_{out}(V)\right) \tag{6}$$

In (6), $f_{out}$ is the fanout of gate output $out$ for the applied voltage $V$, $g_{out}(V)$ is the corresponding gate output, and the sum runs over all gate outputs in the circuit design. The WSA savings are expressed as percentages of WSA occurrence in the designs. The proposed TPG attains WSA savings of 23.5% in the BIST of [21] and 25.5% in the BIST of [22]. The switching activity savings are reduced by more than 10% compared with TPGs [16], [18], and [19], and by 13% more compared with TPG [17]. Overall, the percentage of WSA savings is reduced by 10%–15% compared with the existing TPGs.
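The following sketch evaluates (6) on a toy netlist under a common interpretation, which is an assumption here: $g_{out}(V)$ is 1 when gate output $out$ toggles between two consecutive input vectors and 0 otherwise, so each toggle is weighted by its fanout $f_{out}$. Some definitions weight each toggle by $1 + f_{out}$ instead. The netlist, gate names, and the helpers `evaluate` and `wsa` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical netlist: gate -> (function, input names, fanout f_out).
# Gates are listed in topological order, so one pass evaluates the circuit.
NETLIST: Dict[str, Tuple[str, List[str], int]] = {
    "g1": ("AND", ["a", "b"], 2),
    "g2": ("XOR", ["a", "c"], 1),
    "g3": ("OR", ["g1", "g2"], 3),
}


def evaluate(netlist: Dict[str, Tuple[str, List[str], int]],
             inputs: Dict[str, int]) -> Dict[str, int]:
    """Return the logic value of every primary input and gate output."""
    values = dict(inputs)
    for name, (func, ins, _fanout) in netlist.items():
        bits = [values[i] for i in ins]
        if func == "AND":
            values[name] = int(all(bits))
        elif func == "OR":
            values[name] = int(any(bits))
        elif func == "XOR":
            values[name] = sum(bits) % 2
        elif func == "NOT":
            values[name] = 1 - bits[0]
        else:
            raise ValueError(f"unsupported gate type: {func}")
    return values


def wsa(netlist: Dict[str, Tuple[str, List[str], int]],
        v_prev: Dict[str, int], v_next: Dict[str, int]) -> int:
    """Sum f_out over the gate outputs that toggle from v_prev to v_next."""
    before = evaluate(netlist, v_prev)
    after = evaluate(netlist, v_next)
    return sum(
        fanout
        for name, (_func, _ins, fanout) in netlist.items()
        if before[name] != after[name]
    )


if __name__ == "__main__":
    v0 = {"a": 0, "b": 1, "c": 0}
    v1 = {"a": 1, "b": 1, "c": 0}
    print(wsa(NETLIST, v0, v1))  # all three gates toggle: 2 + 1 + 3 = 6
```

Summing this quantity over every consecutive pair of applied patterns gives the total WSA of a test session, which is the kind of figure compared across TPGs in Table 4.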

Figures 6(a) and 6(b) compare the fault coverage of the proposed TPG with those of the existing TPGs applied in BISTs [21] and [22], respectively. Based on the probabilistic scan chain testability, the fault coverage is estimated for transition delay faults. It can be calculated by inserting test points in the TTL circuit design [24]. The test requires one launch pattern in addition to the initialization pattern to analyze the response of the CUT, which is captured at the scan-in phase. For example, during the first scanning phase, scan chains $S_5$, $S_0$, $S_1$, $S_2$, $S_4$, and $S_3$ are selected. Initially, at the first clock, logic '0' is loaded into scan cell $S_0$.

In contrast, logic '1' is loaded into scan cell $S_5$. If scan cell $S_5$ holds logic '1', the first functional clock pulse causes scan chain $S_0$ to capture logic '1' at its input. This '0'-to-'1' transition propagates towards the next scan chain, $S_1$. The captured value is then unloaded and shifted out for fault verification. If logic '1' is captured at $S_1$, the transition completed within the desired time between the scanning and capture phases, so no transition delay fault is present. If logic '0' is captured, a random-pattern delay fault has occurred in the circuit. The faults detected in the design are expressed as fault coverage, a percentage of the total faults.
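The two-pattern (initialization, launch) test described above can be sketched in software. This is a simplified illustration, not the authors' fault simulator: each scan cell is assumed to observe one node directly, with no combinational logic between launch and capture, and every node carries a hypothetical slow-to-rise (STR) and slow-to-fall (STF) fault. The helper names `captured` and `transition_fault_coverage` and the example patterns are illustrative; the 0-to-1 case mirrors the $S_0$ example in the text.

```python
from typing import List, Tuple

Pattern = Tuple[int, ...]


def captured(v1: int, v2: int, fault: str) -> int:
    """Value captured at the functional clock for a node driven from v1 to v2.

    A slow-to-rise ('STR') node misses a 0->1 launch and still holds 0;
    a slow-to-fall ('STF') node misses a 1->0 launch and still holds 1.
    """
    if fault == "STR" and (v1, v2) == (0, 1):
        return 0
    if fault == "STF" and (v1, v2) == (1, 0):
        return 1
    return v2


def transition_fault_coverage(pairs: List[Tuple[Pattern, Pattern]]) -> float:
    """Percentage of STR/STF faults detected by (initialization, launch) pairs."""
    n_nodes = len(pairs[0][0])
    faults = [(node, kind) for node in range(n_nodes) for kind in ("STR", "STF")]
    detected = set()
    for init, launch in pairs:
        for node, kind in faults:
            good = launch[node]  # fault-free value captured in the scan cell
            bad = captured(init[node], launch[node], kind)
            if bad != good:
                detected.add((node, kind))
    return 100.0 * len(detected) / len(faults)


# Hypothetical (initialization, launch) pairs for three observed nodes.
EXAMPLE_PAIRS = [((0, 0, 1), (1, 0, 1)), ((1, 1, 0), (0, 1, 1))]

if __name__ == "__main__":
    print(transition_fault_coverage(EXAMPLE_PAIRS))
```

In the example, node 1 never switches and node 2 never falls, so only three of the six modeled faults are detected; adding pattern pairs that launch the missing transitions raises the coverage, which is why coverage grows with the number of applied clock cycles.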

The proposed TPG uses fewer components than the other TPG methods, generating its weighted patterns with a Galois adder and a Galois multiplier. Hence, it is subject to fewer transition delay faults. Its fault coverage exceeds those of the existing TPGs [16]–[19] by roughly 12 to 28 percentage points (Table 4). TPG [18] achieves a better fault coverage of 86.87% in the P. Wohl *et al.* BIST [21] than the 84.16% it achieves in the E. Moghaddam *et al.* BIST [22]. The fault coverages of Prasad *et al.* [16] and Xiang *et al.* [18] differ by less than 5 percentage points, whereas the proposed TPG improves on both by more than 10 percentage points.

The throughput of the proposed method reaches 89.9 Gbits/s in the P. Wohl *et al.* [21] BIST and 84.69 Gbits/s in the Elham *et al.* [22] BIST. Effective test-per-scan testability in BIST [21] is made possible by its auxiliary blocks. For the proposed method, the capture power during the capture phase is reduced to 70.03 mW in [21] and 68.21 mW in [22], and the scanning power during the scanning phase is reduced to 695.1 mW in [21] and 453.52 mW in [22], the lowest values among the compared TPGs. The area overhead of the proposed TPG is also the lowest in both architectures, at a high operating frequency of 2.3 GHz and a temperature of 27 °C. These results indicate that the proposed TPG achieves better performance than the existing TPGs.

The proposed TPG obtains fault coverage above 95% in both BIST designs, which is higher than those of the existing weighted TPGs. When the number of clock cycles exceeds 1,800, its coverage is 10% greater than that of TPG [17] and 5% greater than those of TPGs [16] and [18]. The proposed TPG achieves better fault coverage at all clock cycles from 200 to 2,000 shifts because only 30% of the scan chains are activated during the pseudorandom testing phase. The TPG for the scan chains is sized to produce a 32-bit weighted pattern. For clock cycles from 300 to 2,000, the 30% activated scan chains at 200 cycles.

Because [5] is a standard TPG with higher delay and lower throughput, it cannot be implemented in the most recent BIST models. Additionally, [5] cannot generate weighted patterns, so it is not well suited to the BIST designs of [21] and [22]. Nevertheless, the conventional TPG is included, for fairness, in the comparison of physical factors with the proposed weighted TPG in Section V.

## V. OPERATION OF PROPOSED WEIGHTED PSEUDORANDOM TPG WITH SIMULATION TEST BENCH SETUP

The proposed weighted TPG is simulated using the SilTerra 0.13 $\mu$m process on the Mentor Graphics IC design platform. Cumulative degradation of the bit patterns may cause a faulty circuit output. Hence, the input seed bits are fed directly into the shift registers, taking the input capacitance effect into account; the capacitances included at the input vectors prevent degradation of the pseudo-primary seeds. Additionally, the outputs of the TPG transistor design are loaded with buffers so that they satisfy the proper load conditions. The shift register is a cascaded configuration loaded with buffers that are weak enough to maintain signal restoration. The swings from $V_{LH}$ to $V_{HL}$ and from $V_{HL}$ to $V_{LH}$ are retained using weak buffer designs at the output drain. The transistors are also chosen to operate in the linear or saturation region to sustain high operating frequencies, so the circuit operates faster than the existing designs. In addition, to obtain better output voltage performance, the dynamic current flows only through the ON branches, keeping the resistance small. The simulation results of the proposed design are compared with those of the other potential TPG designs reported in [5], [16]–[19] with respect to accuracy.

Figure 7 shows the timing diagrams of the existing TPGs. The output voltage $Y_w$ of the existing TPGs suffers from voltage degradation, which is larger in TPGs [17] and [18] than in TPGs [16] and [19]. In [17], degradation occurs because of redundant test vector detection. Although TPG [18] eliminates the redundancy of TPG [17], it requires more circuit components for weighted pattern generation. In [19], only one clock input is given to activate the weighted pattern, so it exhibits a larger critical path delay from input to output; hence, its output voltage degrades in the low-to-high and high-to-low swings compared with TPG [16]. Consequently, the output ($Y_w$) delay of TPG [16] is better than those of TPGs [17]–[19].

Figure 8 shows the timing diagram of the proposed TPG and its sweep distribution of voltage versus temperature. The chip voltage is swept from 1.2 V to 2.8 V for the temperature analysis, which shows that the chip operates without discrepancies at higher temperatures, as intended. In the practical timing analysis of the proposed TPG, a supply voltage of 1.5 V is applied at a high frequency of 2.3 GHz. The cascaded connection of the registers in the proposed design provides the proper input signal for weight generation. Because the proposed design is asynchronously clocked, the individual clock of each D flip-flop shifts the seed pattern bit by bit to the output. The delay caused by internal switching activity in the existing TPGs [16]–[19] is reduced in the proposed TPG. Additionally, the redundancy of input test vectors being computed twice during weighted pattern generation is eliminated. As a result, the proposed weighted pseudorandom output voltage $V(Y_w)$ generates weighted patterns continuously with less degradation.

The performance factors of the proposed TPG, such as gate count, power consumption, delay, throughput, latency, and area, are compared with those of the existing TPGs in Table 5. The gate count is calculated from the number of transistors used in the logic gate implementation. The gate count is reduced to 144 in the proposed design, compared with 179, 258, 250, 236, and 156 for the existing TPGs [5], [17], [18], [16], and [19], respectively. The most common metrics of a good chip are its power consumption and circuit efficiency. The throughput, in Gbits/s, is calculated as the reciprocal of the clock period or of the internal critical path delay. The power-delay product (PDP) and the area-delay product (ADP) are minimized to jointly improve power, area, and delay. The PDP is the energy consumed by the TPG, whereas the ADP denotes the hardware consumption per test cycle in ASIC designs; together, they determine the energy efficiency and area-overhead efficiency of the designed circuit per chip. As the results show, the proposed TPG attains the lowest PDP and ADP in Table 5. Additionally, a power efficiency of 98.2% is calculated from successive power values over the sweep voltage distribution of the TPG. The TPG power consumption of 26.7 nW is lower than those of the existing designs [5], [16]–[19] owing to the reduced capacitance values, the timing criterion, and the low transition activity. Power degradation across the circuit is mitigated by the asynchronous clock design and the buffered scan chain, which is used to force the pseudo-primary inputs to the pseudorandom outputs. Reducing the supply voltage below 1.2 V increases the delay, whereas the energy consumption decreases quadratically [25]. Hence, a supply voltage of 1.5 V is used when evaluating the critical paths of all the TPGs. The proposed TPG operates at a frequency of 2.3 GHz with a throughput of 87.3 Gbits/s without degrading its power efficiency. Digital circuit speed is one of the essential factors in deep-submicron technology and multigigahertz IC design, and it directly depends on the critical path delay [26]. Hence, the proposed TPG achieves a shorter delay of 11.45 ns by reducing the transistor sizes and the path length from input to output. The transistor sizes in the design are 130 nm $\times$ 200 nm for the n-type and 130 nm $\times$ 400 nm for the p-type.

FIGURE 7. A timing diagram of the existing weighted pseudorandom TPG: (a) W. Johnson *et al.* [16], (b) D. Xiang *et al.* [15], (c) Dhar *et al.* [14], and (d) Hwasoo Shin *et al.* [17].

#### TABLE 5. Physical factors of different TPG designs with the bit of length '3

| Methods | Gate counts | Voltage (V) | Frequency (GHz) | Power consumption (nW) | Critical path delay (ns) | Throughput (Gbits/s) | Latency (ns) | PDP (aJ) | ADP (µm²·ns) | Power efficiency (%) | Area (µm²) |
|---------|-------------|-------------|-----------------|------------------------|--------------------------|----------------------|--------------|----------|--------------|----------------------|------------|
| Jinyi et al. [5] | 179 | 1.5 | 1.9 | 32.18 | 83.2 | 19.6 | 0.12 | 2677.32 | 53833.7 | 97.9 | 647.04 |
| Barry et al. [17] | 258 | 1.5 | 1.9 | 70.37 | 68.5 | 46.59 | 0.17 | 3450.3 | 63711.8 | 90.23 | 930.1 |
| Xiang et al. [18] | 250 | 1.5 | 1.9 | 52.14 | 72.1 | 23.86 | 0.09 | 3759.29 | 66297.3 | 90.4 | 919.52 |
| Prasad et al. [16] | 236 | 1.5 | 1.9 | 44.22 | 25.15 | 39.76 | 0.07 | 1112.13 | 25590.1 | 92.5 | 1017.5 |
| Hwasoo et al. [19] | 156 | 1.5 | 1.9 | 38.63 | 53.2 | 28.79 | 0.08 | 2055.11 | 38697.6 | 96.2 | 727.4 |
| Proposed TPG | 144 | 1.5 | 2.3 | 26.7 | 11.45 | 87.3 | 0.05 | 305.71 | 7247.9 | 98.2 | 633.01 |

FIGURE 8. (a) A timing diagram of the proposed weighted pseudorandom TPG and (b) the sweep voltage vs. temperature.

#### TABLE 6. Summary of the bit-wise performances of the proposed weighted TPG along with the existing TPGs [5], [16]–[19].

| Bitwise performance factors | Bits | Jinyi et al. [5] | Improvement (%) | Barry et al. [17] | Improvement (%) | Xiang et al. [18] | Improvement (%) | Prasad et al. [16] | Improvement (%) | Hwasoo et al. [19] | Improvement (%) | Proposed TPG |
|-----------------------------|------|------------------|-----------------|-------------------|-----------------|-------------------|-----------------|--------------------|-----------------|--------------------|-----------------|--------------|
| Power consumption | 4-bit | 53.54 nW | 4.04 | 82.61 nW | 20.5 | 70.84 nW | 17.6 | 59.5 nW | 15.6 | 56.12 nW | 9.05 | 51.46 nW |
| | 8-bit | 77.41 nW | 7.06 | 131.9 nW | 42.4 | 122.6 nW | 39.5 | 183.1 nW | 28.1 | 103.8 nW | 15.6 | 72.3 nW |
| | 16-bit | 128.9 nW | 13.5 | 321.8 nW | 55.1 | 466.1 nW | 41.2 | 291.7 nW | 31.6 | 245.0 nW | 23 | 113.5 nW |
| | 32-bit | 1.1 µW | 44.2 | 2.88 µW | 63.2 | 2.03 µW | 57.4 | 0.435 µW | 43.8 | 0.318 µW | 32.4 | 0.215 µW |
| Delay | 4-bit | 85.25 ns | 36.8 | 79.36 ns | 26.9 | 72.5 ns | 23.1 | 56.32 ns | 20.3 | 51.03 ns | 19.6 | 18.2 ns |
| | 8-bit | 121.2 ns | 40.9 | 103.36 ns | 36.8 | 111.32 ns | 39.4 | 94.52 ns | 37.6 | 72 ns | 27.2 | 23.8 ns |
| | 16-bit | 232.23 ns | 48.9 | 215.3 ns | 39.2 | 186.8 ns | 36.3 | 159.02 ns | 33.9 | 116 ns | 31.5 | 59.6 ns |
| | 32-bit | 0.836 µs | 59.1 | 0.788 µs | 46.8 | 0.751 µs | 42.5 | 0.689 µs | 40.6 | 0.531 µs | 38.2 | 0.170 µs |
| Area (µm²) | 4-bit | 763.6 | 12 | 1061.4 | 16.2 | 1497.4 | 43.6 | 1290.2 | 39.4 | 1285.7 | 33.6 | 748.4 |
| | 8-bit | 1558.5 | 18.5 | 4921.1 | 43.2 | 2969.2 | 49.1 | 1914.8 | 41.6 | 2315.7 | 39.1 | 924.6 |
| | 16-bit | 3104.6 | 15.8 | 9867.2 | 62.3 | 7752.8 | 58.1 | 3776.04 | 46.3 | 4538.35 | 40.7 | 2143.69 |
| | 32-bit | 6129.2 | 19.9 | 13470.25 | 73.1 | 11469.92 | 61.2 | 7521.36 | 51 | 9214.2 | 42.2 | 4263 |
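The PDP and ADP values in Table 5 follow from the power, delay, and area columns: PDP = power × critical path delay and ADP = area × critical path delay. The sketch below (function names hypothetical) recomputes both for several rows; the results agree with the tabulated values to within 0.1. Because 1 nW × 1 ns = $10^{-18}$ J, the PDP comes out in attojoules. It also evaluates the quadratic energy scaling cited above, assuming the usual dynamic-energy model $E = C V^2$ with a fixed switched capacitance $C$.

```python
from typing import Dict, Tuple

# (power in nW, critical path delay in ns, area in um^2), copied from Table 5.
TABLE5: Dict[str, Tuple[float, float, float]] = {
    "Jinyi et al. [5]": (32.18, 83.2, 647.04),
    "Xiang et al. [18]": (52.14, 72.1, 919.52),
    "Prasad et al. [16]": (44.22, 25.15, 1017.5),
    "Hwasoo et al. [19]": (38.63, 53.2, 727.4),
    "Proposed TPG": (26.7, 11.45, 633.01),
}


def pdp_aj(power_nw: float, delay_ns: float) -> float:
    """Power-delay product; 1 nW x 1 ns = 1e-18 J = 1 aJ."""
    return power_nw * delay_ns


def adp_um2_ns(area_um2: float, delay_ns: float) -> float:
    """Area-delay product in um^2 x ns."""
    return area_um2 * delay_ns


def dynamic_energy_ratio(v_new: float, v_ref: float) -> float:
    """Energy ratio under E = C * V^2 with the same switched capacitance C."""
    return (v_new / v_ref) ** 2


if __name__ == "__main__":
    for name, (power, delay, area) in TABLE5.items():
        print(f"{name:20s} PDP = {pdp_aj(power, delay):9.2f} aJ  "
              f"ADP = {adp_um2_ns(area, delay):9.1f} um^2*ns")
    # Lowering the supply from 1.5 V to 1.2 V scales dynamic energy by (0.8)^2.
    print(dynamic_energy_ratio(1.2, 1.5))
```

Because both products share the critical path delay as a factor, the 11.45 ns delay of the proposed TPG dominates its advantage in both metrics.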

The cascaded bitwise circuits of the proposed TPG are integrated with the weak buffers, path ON resistances, and parasitic capacitances in sequential order to reduce the propagation delay. Table 6 compares the power, delay, and area of higher-bit versions of the proposed TPG with those of TPGs [5], [16]–[19]. The delay of the proposed TPG is roughly half or less of those of the other TPGs because the design contains fewer combinational circuits. Power savings exceed 10% in most comparisons: the proposed TPG consumes 51.46 nW for 4 bits and 0.215 $\mu$W for 32 bits. Although the conventional TPG [5] achieves good area overhead and power consumption compared with the other TPGs, it has the longest delay at every bit width. The delay in the existing TPGs stems from high transition activity. At 4 bits, the area occupied by [16] is relatively high compared with those of TPGs [17] and [19]. Although the current TPG methods [16]–[19] have greater hardware complexity than the conventional TPG [5], TPG [5] fails to produce efficient weighted patterns. The proposed weighted TPG therefore reduces the gate count by applying the linear property of the Galois operation, according to (4) and (5). The 32-bit proposed TPG occupies an area of 4263 $\mu$m<sup>2</sup>, an area improvement of approximately 51% over TPG [16] as reported in Table 6. It also achieves area improvements of 42.2% and 19.9% over [19] and [5], respectively.

FIGURE 9. Monte Carlo simulation results for the voltage distribution of the proposed weighted TPG.

FIGURE 10. (a) Power consumption and (b) current consumption versus voltage sweep for the TPGs.

A Monte Carlo analysis of the proposed TPG is performed to identify the random distribution of the weighted pseudorandom patterns. This confirms that uniform pseudorandom patterns are generated continuously for testing the scan chains. The method also assures the repeatability of pseudorandom pattern generation in all iterations [27]. Here, repeatable random patterns are achieved over 100 iterations, as shown in Figure 9, and the proposed weighted TPG runs repeatedly across the various voltage sweeps. The histogram and Gaussian distribution are plotted with iterations of 50 to 700 cycles per run denoted as $Y_1$ and a frequency range of 1 to 2.0 GHz denoted as $Y_2$, using 4,230 samples of pseudo-primary input. The gate transition delays are represented by the Monte Carlo analysis as Gaussian distributions and histograms with probabilistic values. For example, at a probability value of 51.72%, the delay values range from 2.0108 to 2.44213 in the Gaussian distribution.
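A minimal Monte Carlo sketch of the probability statement above: it draws gate delays from a Gaussian with a hypothetical mean and standard deviation (not fitted to Figure 9, and in the same unspecified units as the quoted range) and estimates the probability that a delay falls between 2.0108 and 2.44213, checking the estimate against the closed-form Gaussian probability. The function names are illustrative.

```python
import math
import random


def mc_probability(mu: float, sigma: float, lo: float, hi: float,
                   n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(lo <= delay <= hi) for Gaussian delays."""
    rng = random.Random(seed)  # a fixed seed makes every run repeatable
    hits = sum(1 for _ in range(n) if lo <= rng.gauss(mu, sigma) <= hi)
    return hits / n


def gaussian_probability(mu: float, sigma: float, lo: float, hi: float) -> float:
    """Closed-form P(lo <= X <= hi) for X ~ N(mu, sigma^2)."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)


if __name__ == "__main__":
    MU, SIGMA = 2.2, 0.3  # hypothetical delay statistics
    LO, HI = 2.0108, 2.44213  # delay range quoted in the text
    print(mc_probability(MU, SIGMA, LO, HI), gaussian_probability(MU, SIGMA, LO, HI))
```

With 100,000 samples the standard error of the estimate is below 0.002, so the Monte Carlo and closed-form probabilities agree closely; the fixed seed is what makes the run reproducible from iteration to iteration.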

The simulated power and current consumption versus the voltage distribution are shown in Figures 10(a) and 10(b), respectively. The power consumption increases gradually as the chip voltage rises from 1.2 V to 2.2 V. Because the circuit uses traditional CMOS transistors, it benefits from low power consumption. The existing methods, which use a Galois adder and a Galois multiplier for each tapping bit of the TPG, consume more scan-in test power and current than the proposed method, and the proposed design is also more area efficient than these methods. In large-scale designs, power consumption increases because of extensive switching activity [23]. The proposed TPG is designed using the CMOS gate technique with low-leakage D flip-flops and low-transition XOR gates, chosen for their lower power consumption. Leakage power is reduced in each stage by combining the transistor gates with the dynamic current-mode logic principle. The dynamic current-mode leakage depends on the simultaneous ON times of the p-type and n-type transistors and on their sizes, and the source-to-drain current amplitude is related to the drain diffusion area and the leakage current factor [19]. Hence, a weak buffer design is used for swing restoration between the D flip-flops and the XOR gates, which also suppresses the dynamic leakage current.

The proposed TPG consumes power and current in the nanowatt and nanoampere ranges. Although the conventional TPG [5] consumes less power and current than TPGs [16]–[19], it cannot generate weighted patterns. The proposed TPG dissipates 195.5 nW and draws 32.69 nA at a chip voltage of 2.2 V. In contrast, Johnson *et al.* [17] dissipates 600 nW and draws 83.4 nA, which is higher than the other TPGs.

#### **VI. CONCLUSION**

A new low-power weighted TPG is proposed for generating effective weighted patterns. A subset of the initial pseudo-primary seed bits is calculated mathematically using the Galois operation, which achieves maximum-length weighted patterns with low transition activity. The weighted Mux acts as a phase shifter in the design and limits the switching transitions during the test-per-scan method. The proposed weighted TPG is extended to a 32-bit TPG, maintaining low power consumption over all clock cycles with lower area overhead. Correspondingly, the PDP of the proposed design is reduced by approximately 43.3% compared with the existing TPGs. Experimental results show that the proposed TPG performs better than the various existing TPGs on these criteria. The work is implemented in two different BIST architectures to demonstrate WSA savings and higher fault coverage. The WSA savings achieved are 23.5% in the P. Wohl *et al.* [21] BIST and 25.5% in the Elham *et al.* [22] BIST. The fault coverage mainly addresses transition delay faults. Furthermore, this work can be extended to cover sequential faults, redundant faults, scanning- and capture-phase transition faults, and stuck-open faults. This approach can also be implemented in multiple scan-forest BIST architectures.

#### REFERENCES

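- [1] A. S. Abu-Issa and S. F. Quigley, "Bit-swapping LFSR and scan-chain ordering: A novel technique for peak- and average-power reduction in scan-based BIST," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 28, no. 5, pp. 755–759, May 2009.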
- [2] I. Pomeranz, "Computing seeds for LFSR-based test generation from nontest cubes," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 6, pp. 2392–2396, Jun. 2016, doi: 10.1109/TVLSI.2015.2496190.
- [3] A. Sasi, A. Amirsoleimani, A. Ahmadi, and M. Ahmadi, "Hybrid memristor-CMOS based linear feedback shift register design," in *Proc.* 24th IEEE Int. Conf. Electron., Circuits Syst. (ICECS), Dec. 2017, pp. 62–65, doi: 10.1109/ICECS.2017.8292094.
- [4] G. N. Balaji and S. C. Pandian, "Design of test pattern generator (TPG) by an optimized low power design for testability (DFT) for scan BIST circuits using transmission gates," *Cluster Comput.*, vol. 22, no. S6, pp. 15231–15244, Nov. 2019, doi: 10.1007/s10586-018-2552-x.
- [5] J. Zhang, Q. Zhang, and J. Li, "A novel TPG method for reducing BIST test-vector size," in *Proc. Int. Symp. High Density Design Packag. Microsyst. Integr.*, no. 149, Jun. 2007, pp. 6–9, doi: 10.1109/HDP.2007.4283639.
- [6] S. Hellebrand, S. Tarnick, J. Rajski, and B. Courtois, "Multiple-polynomial linear feedback shift registers," 1992, pp. 120–129.
- [7] X. Lin and J. Rajski, "Adaptive low shift power test pattern generator for logic BIST," in *Proc. Asian Test Symp.*, 2010, pp. 355–360, doi: 10.1109/ATS.2010.67.
- [8] A. S. Abu-Issa, "Energy-efficient scheme for multiple scan-chains BIST using weight-based segmentation," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 65, no. 3, pp. 361–365, Mar. 2018, doi: 10.1109/TCSII.2016.2617160.
- [9] G. S. Sankari and M. Maheswari, "Energy efficient weighted test pattern generator based BIST architecture," in *Proc. Int. Conf. I-SMAC (IoT Soc. Mobile, Anal. Cloud), I-SMAC*, 2019, pp. 448–453, doi: 10.1109/I-SMAC.2018.8653768.
- [10] R. Kapur, S. Patil, T. J. Snethen, and T. W. Williams, "A weighted random pattern test generation system," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 15, no. 8, pp. 1020–1025, Aug. 1996, doi: 10.1109/43.511581.
- [11] A. Jas, C. V. Krishna, and N. A. Touba, "Weighted pseudorandom hybrid BIST," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 12, pp. 1277–1283, Dec. 2004, doi: 10.1109/TVLSI.2004.837985.
- [12] D. Xiang, M. Chen, and H. Fujiwara, "Using weighted scan enable signals to improve test effectiveness of scan-based BIST," *IEEE Trans. Comput.*, vol. 56, no. 12, pp. 1619–1628, Dec. 2007.
- [13] H.-C. Tsai, K.-T. Cheng, and S. Bhawmik, "On improving test quality of scan-based BIST," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 19, no. 8, pp. 928–938, Aug. 2000, doi: 10.1109/43.856978.
- [14] N. A. Touba and E. J. McCluskey, "Altering a pseudo-random bit sequence for scan-based BIST," in *Proc. IEEE Int. Test Conf.*, Oct. 1996, pp. 167–175, doi: 10.1109/test.1996.556959.
- [15] G. Kiefer, H. Vranken, E. J. Marinissen, and H. J. Wunderlich, "Application of deterministic logic BIST on industrial circuits," *J. Electron. Test. Theory Appl.*, vol. 17, nos. 3–4, pp. 351–362, 2001, doi: 10.1023/A:1012283800306.
- [16] A. Banerjee, U.S. Patent, 2019, vol. 2.
- [17] B. W. Johnson, J. H. Aylor, and H. H. Hana, "Efficient use of time and hardware redundancy for concurrent error detection in a 32-bit VLSI adder," *Comput. Arith.*, vol. 23, no. 1, pp. 171–178, 2015, doi: 10.1142/9789814641470.

- [18] D. Xiang, X. Wen, and L.-T. Wang, "Low-power scan-based built-in self-test based on weighted pseudorandom test pattern generation and reseeding," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 25, no. 3, pp. 942–953, Mar. 2017, doi: 10.1109/TVLSI.2016.2606248.
- [19] H. Shin, S. Choi, J. Park, B. Y. Kong, and H. Yoo, "Area-efficient error detection structure for linear feedback shift register," *Electronics*, vol. 9, no. 1, pp. 1–10, 2020.
- [20] H. Dau, I. M. Duursma, H. M. Kiah, and O. Milenkovic, "Repairing Reed–Solomon codes with multiple erasures," *IEEE Trans. Inf. Theory*, vol. 64, no. 10, pp. 6567–6582, Oct. 2018, doi: 10.1109/TIT.2018.2827942.
- [21] P. Wohl, J. A. Waicukauski, G. A. Maston, and J. E. Colburn, "XLBIST: X-tolerant logic BIST," in *Proc. IEEE Int. Test Conf. (ITC)*, Oct. 2018, pp. 1–9, doi: 10.1109/TEST.2018.8624738.
- [22] E. Moghaddam, N. Mukherjee, J. Rajski, J. Solecki, J. Tyszer, and J. Zawada, "Logic BIST with capture-per-clock hybrid test points," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 38, no. 6, pp. 1028–1041, Jun. 2019, doi: 10.1109/TCAD.2018.2834441.
- [23] H. Tran, "Demonstration of 5T SRAM and 6T dual-port RAM cell arrays," in *Proc. IEEE Symp. VLSI Circuits, Dig. Tech. Papers*, Jun. 1996, pp. 68–69, doi: 10.1109/vlsic.1996.507719.
- [24] S. Roy, B. Stiene, S. K. Millican, and V. D. Agrawal, "Improved random pattern delay fault coverage using inversion test points," in *Proc. IEEE 28th North Atlantic Test Workshop (NATW)*, May 2019, pp. 1–6, doi: 10.1109/NATW.2019.8758727.
- [25] C. Senthilpari, K. Diwakar, and A. Singh, "Low energy, low latency and high speed array divider circuit using a Shannon theorem based adder cell," *Recent Patents Nanotechnol.*, vol. 3, no. 1, pp. 61–72, Jan. 2009, doi: 10.2174/187221009787003311.
- [26] C. Senthilpari, A. K. Singh, and K. Diwakar, "Design of a low-power, high performance, 8×8 bit multiplier using a Shannon-based adder cell," *Microelectron. J.*, vol. 39, no. 5, pp. 812–821, May 2008, doi: 10.1016/j.mejo.2007.12.016.
- [27] R. F. W. Coates, G. J. Janacek, and K. V. Lever, "Monte Carlo simulation and random number generation," *IEEE J. Sel. Areas Commun.*, vol. SAC-6, no. 1, pp. 58–66, Jan. 1988, doi: 10.1109/49.192730.



**VISHNUPRIYA SHIVAKUMAR** received the bachelor's degree in electronics and communication engineering (ECE) and the master's degree in VLSI design from Anna University, Chennai, India, in 2009 and 2014, respectively. She is currently pursuing the Ph.D. degree in VLSI with Multimedia University, Cyberjaya, Malaysia. She has been working as an Assistant Professor in ECE with the Prince Dr. K. Vasudevan College of Engineering and Technology, Chennai. Her research interests include very large-scale integration, microelectronics, analog/mixed-signal processing, and semiconductor technology.



**CHINNAIYAN SENTHILPARI** (Senior Member, IEEE) received the M.Sc. degree in applied electronics and the M.E. degree in material science from the National Institute of Technology, Tiruchirappalli, and the Ph.D. degree from Multimedia University, in 2009. He joined Multimedia University, Malaysia, as a Lecturer, in 2005. His research work focuses on the VLSI design and simulation analysis of new hardware circuits. He has been invited as a committee member of various international conferences. He currently supervises master's and Ph.D. students and teaches at the undergraduate and postgraduate levels. He has supervised VLSI design research by 150 bachelor's students, one master's student, and four Ph.D. students. He is a member of IET, U.K., and is recognized by the IET as a Chartered Engineer.



**ZUBAIDA YUSOFF** received the B.Sc. degree in electrical and computer engineering (cum laude) and the M.Sc. degree in electrical engineering from The Ohio State University, USA, in 2000 and 2002, respectively, and the Ph.D. degree from Cardiff University, Wales, U.K., in 2012. She worked with Telekom Malaysia International Network Operation in 2002. She joined Multimedia University, Malaysia, as a Lecturer, in 2004, and continued her studies at Cardiff University in 2008. She currently works as a Senior Lecturer with Multimedia University. Her current research interests include analog/mixed-signal circuit design and microwave/mm-wave power amplifier systems. She is active in professional societies such as IEEE. In 2019, she served as the Secretary of the IEEE Electron Device Society and of the IEEE Regional Symposium of Microelectronics and Nanoelectronics (IEEE-RSM 2019).
