Received August 22, 2019, accepted September 11, 2019, date of publication September 18, 2019, date of current version October 3, 2019. Digital Object Identifier 10.1109/ACCESS.2019.2942150 # **Combination-Encoding Content-Addressable Memory With High Content Density** GUHYUN KIM<sup>1,2</sup>, VLADIMIR KORNIJCUK<sup>3</sup>, JEESON KIM<sup>4</sup>, DOHUN KIM<sup>1,2</sup>, CHEOL SEONG HWANG<sup>1,2</sup>, AND DOO SEOK JEONG<sup>10</sup>, (Member, IEEE) Department of Material Science and Engineering, Seoul National University, Seoul 08826, South Korea Corresponding author: Doo Seok Jeong (dooseokj@hanyang.ac.kr) This work was supported by the research grant of the National Research Foundation of Korea under Grant NRF-2019R1C1C1009810. **ABSTRACT** Recently, resistance switch-based content-addressable memory (RCAM) has been proposed as an alternative to the mainstream static random-access memory-based CAM because of its high integration potential and low static energy consumption. However, RCAM has a lower data density due to the use of a pair of resistance switches for a single bit of contents (0.5 bit/switch) than resistive random access memory (1 bit/switch). In this paper, we propose a new type of RCAM referred to as combination-encoding CAM (CECAM). In the N-CECAM, a single unit consists of N high and N low resistance state switches whose combination collectively represents binary contents, yielding a data density of approximately 0.85 bit/switch when N = 10, for instance. The key to the CECAM is the encoding of an n-bit search key as a 2N-digit key and its decoding. To this end, we propose a simple algorithm for encoding and decoding and its implementation in digital circuitry. **INDEX TERMS** Content-addressable memory, combination-encoding content-addressable memory, resistance switch, crossbar array, content density. # I. INTRODUCTION Content-addressable memory (CAM) is a type of memory accessed based on contents instead of memory addresses as opposed to random access memory (RAM) [1]. Upon receiving an input data word to search (search key), CAM simultaneously searches all memory entries for search-keyrelevant contents in one clock cycle and returns the addresses of the contents. Therefore, CAM has a significant advantage over RAM in searching speed. Its main application domains include lookup tables (LUTs) in network routers [2]-[7]. The network router decides the forwarding direction of a data packet between networks. The LUT in the network router including the hierarchical addresses is searched for the best route or port for the data packet to be forwarded. Additionally, the CAM storing the LUT is a critical component for digital communications among neurons in neuromorphic hardware [8]. The LUT including the topology of a neural network is searched for the postsynaptic neuron addresses The associate editor coordinating the review of this manuscript and approving it for publication was Yue Zhang. upon the occurrence of an event from a presynaptic neuron. A fast search of the CAM significantly accelerates event routing processes, enabling real-time inference and learning. It also applies to vector-quantization [9], decomposing an input image to a set of vectors, and information retrieval [10], finding the information relevant to the desired information from big data. CAM is categorized as binary CAM (BCAM) and ternary CAM (TCAM). As the names indicate, each unit cell in BCAM represents either '0' or '1' whereas that in TCAM has an additional 'don't care' (or 'X') state [1]. For instance, '1X1' in TCAM is matched to search keys '101' and '111' because 'X' matches both '0' and '1'. This high flexibility of TCAM is the key to packet forwarding tasks [2]–[7]. Static RAM (SRAM)-based CAM is the most popular form of CAM [1], [11]. The SRAM-based CAM leverages fast searching speed and high compatibility with well-established complementary metal-oxide-semiconductor (CMOS) technologies. Nevertheless, significant disadvantages are its low areal density due to the use of many transistors ( $\geq 8$ ) to represent a single bit and high static power consumption <sup>&</sup>lt;sup>2</sup>Inter-University Semiconductor Research Center, Seoul National University, Seoul 08826, South Korea <sup>&</sup>lt;sup>3</sup>Center for Electronic Materials, Korea Institute of Science and Technology, Seoul 02792, South Korea <sup>&</sup>lt;sup>4</sup>Division of Materials Science and Engineering, Hanyang University, Seoul 04763, South Korea due to the leakage current of SRAM [12]–[14]. As alternatives to the SRAM-based CAM, CAMs based on emerging non-volatile memories (NVMs) such as phase-change memory [14]–[16], magnetic tunnel junction [14], [17]–[19], ferroelectric memory [20], and resistance switch [21]–[25] have been proposed to date. Such NVM-based CAMs highlight their high data density and zero-static energy consumption due to the non-volatility. They also offer solutions to TCAM by appropriately configuring the non-volatile memory elements [14]–[25]. Among candidates, switch-based the resistance CAM (RCAM) is a front runner; two-transistor two-resistor (2T2R)-based RCAM has been prototyped using a 4 kb resistive RAM (RRAM) [25]. Additionally, RCAM may be realized in a passive crossbar array that highlights its ideal $8F^2$ cell size [21], [23]. Nevertheless, RCAM has a lower content density than RRAM because it uses a pair of resistance switches as a single bit of content, i.e., 0.5 bit/switch. A further increase in data density needs a new content-encoding framework. To this end, we propose a new type of resistance switch-based CAM, named combination-encoding CAM (CECAM). Section II outlines the working principle of the CECAM including a search key-encoding algorithm (Section II.A) and its implementation in a digital circuit (Section II.B). Section III explains parallel searches of multiple CECAM domains to realize TCAM with coarse granularity. Reading contents from the CECAM needs to decode them using an appropriate decoding algorithm because the contents in the CECAM are encoded, which is addressed in Section IV. Finally, Section V highlights the general application of the CECAM scheme to various CAM designs. # II. COMBINATION-ENCODING CONTENT-ADDRESSABLE MEMORY Fig. 1 illustrates a schematic of unit cells of active and passive RCAMs (voltage- and current-reading schemes, respectively). RCAM takes a pair of resistance switches as a single unit of single-bit capacity. Each switch is set to one of the binary states: high resistance state (HRS) and low resistance state (LRS). Specifically, the two switches are complementary; they are in different resistance states. This yields two distinguishable configurations, HRS-LRS (HL) and LRS-HRS (LH), representing one bit of content. For both schemes in Fig. 1, LH corresponds to '0' and HL to '1'. Each bit of a search key is represented by voltage signals on complementary search lines (SL and $\overline{SL}$ ) such that '0' pulls SL low and SL high while '1' pulls SL high and SL low. When a search bit of '0' is applied to a stored content of '0' (LH), the RCAM units in Figs. 1a and 1b notify matching signals on the match lines (black $V_{\rm ML}$ and $I_{\rm ML}$ in Figs. 1c and 1d, respectively). These signals are contrasted with a mismatching case, where a search bit of '1' is applied to the same content '0', resulting in mismatch signals on the match lines indicated by the red $V_{\rm ML}$ and $I_{\rm ML}$ in Figs. 1c and 1d, FIGURE 1. Schematic of the conventional RCAM in (a) active and (b) passive crossbar arrays. ML, SL, SL, and PL denote a match line, search line, complementary search line, and plate line, respectively. A timing diagram for active and passive arrays is illustrated in (c) and (d), respectively. CLK, $V_{\rm SL}$ , $V_{\rm SL}$ , and $V_{\rm ML}$ denote a clock cycle, search line voltage, complementary search line voltage, and match line voltage, respectively. $I_{\rm ML}$ in (d) means the current through the match line. The red lines in (c) and (d) indicate $V_{\rm SL}$ , $V_{\rm SL}$ , and the CAM responses when mismatching. respectively. Thus, searching *N*-bit keys commonly needs 2*N* switches per match line, i.e., 0.5 bit/switch per match line. In contrast, N-CECAM (N = 1, 2, 3, ...) uses a chunk of 2N switches per match line as a single unit of multi-bit capacity, boosting the memory capacity per switch far beyond a content density of 0.5 bit/switch. Its key difference from RCAM is that the *N*-CECAM harnesses the large number of possible combinations of 2N switches to boost the content density in contrast to RCAM using complementary pairs of switches and search lines to store and search a single bit of contents. We regard a passive array of nonvolatile resistance switches with current reading as a model system of the CECAM. This model system leverages its high memory density and fast content reading. Nevertheless, the sneak current disturbing current read-out processes is a critical downside. However, the CECAM concept is fully compatible with other types of CAM including active arrays of resistance switches freer from the sneak current issue, which will be addressed in Section V. The N-CECAM consists of a resistance switch array and a search key encoder as illustrated in Fig. 2(a). In the array, 2N resistance switches are placed at the crossing points between each match (horizontal) line and 2N search (vertical) lines. The search key encoder converts an n-bit search key to a 2N-digit binary key, and each digit is applied to each of the 2N search lines such that '1' and '0' pull the search line high and low, respectively. Assuming m switches are in the HRS and the other (2N-m) switches in the LRS, the minimum current response to a single 2N-digit binary key exists only if the key includes m 1's and (2N-m) 0's, and each of the m 1's in the key is matched to each of the m HRS switches. VOLUME 7, 2019 137621 FIGURE 2. (a) Schematic of 3-CECAM (N=3). A single unit consists of N HRS and N LRS switches. SA and PE mean a sense amplifier and priority encoder, respectively. (b) Current responses to a given encoded key upon a match and mismatches. Matching allows the minimal current response (first row). To maximize the number of possible configurations of 2N switches $\binom{2N}{m}$ , m is set to N. Thus, the search key encoder maps an n-bit search key to a 2N-digit key with N 1's and N 0's in a bijective manner. When matching N 1's in the encoded key to the N HRS switches, the current response through the match line is minimal as shown in Fig. 2(b). Otherwise, some 1's in the encoded key are inevitably associated with LRS switches, and thus the current response through the match line becomes high, indicating a mismatch. Note that the current response scales with the degree of mismatch, i.e., the number of mismatched bits in the encoded key. The worst mismatch regarding a sensing margin is due to two mismatched bits as depicted in Fig. 2(b). The total number of 2N-digit encoded keys is $\binom{2N}{N}$ , and so is the number of 2N switch configurations per match line. Given the use of n-bit search keys, $2^n$ of $\binom{2N}{N}$ configurations are associated with the total n-bit keys, satisfying $2^n \le \binom{2N}{N} < 2^{n+1}$ . This inequality yields $$n = \left\lfloor \log_2 \binom{2N}{N} \right\rfloor \tag{1}$$ FIGURE 3. Content density of *N*-CECAM with *N* in comparison with the conventional RCAM and RRAM. The kinks arise from the floor function in (1). where $\lfloor \cdot \rfloor$ denotes a floor function. Therefore, the content density (content bit per switch) is given as $\log_2 \binom{2N}{N} / 2N$ , which is plotted in Fig. 3. Notably, for N's (>2), the content density exceeds the density of the conventional RCAM designs (0.5 bit/switch) and approaches the density of RRAM (1 bit/switch) asymptotically. ### A. ALGORITHM FOR COMBINATION ENCODING The key to the *N*-CECAM is the bijective mapping of the total *n*-bit search keys to $2^n$ 2N-digit keys $(2^n \le \binom{2N}{N}) \le 2^{n+1}$ ) using an appropriate encoding function. We propose the encoding function $E_N$ for an *n*-bit search key a as follows: **function** $E_N(a)$ **set** b **to** 2N-digit binary number 0 **for** i=0 **to** N-1 **do if** there is c satisfying $\binom{c}{N-i} \leq a < \binom{c+1}{N-i}$ **then set** the (c+1)th digit of b **to** 1 **set** a **to** $a-\binom{c}{N-i}$ **end if end for return** b Note that $E_N$ is bijective when $A_t = \left\{0, 1, \dots, \binom{2N}{N} - 1\right\}$ and $B_t = \{b \mid b: 2N$ -digit binary numbers with of N 1's and N 0's are taken as the domain and codomain of $E_N$ , respectively (**Theorem 1** in Appendix). Therefore, $E_N$ is also a bijective function for domain $A = \{0, 1, \dots, 2^n - 1\}$ ( $\subset A_t$ ) and codomain $B = \{E_N(0), E_N(1), \dots, E_N(2^n - 1)\}$ ( $\subset B_t$ ). Table 1 shows the encodings of 4-bit search keys as 16 distinguishable 6-digit binary numbers with three 1's and three 0's using the encoding function $E_3$ (N=3). The encoded 137622 VOLUME 7, 2019 end function. FIGURE 4. (a) Block diagram of an encoding circuit for 3-CECAM. (b) Timing diagram for encoding a search key of 15 as a 6-digit key of 101100. data are subsequently programmed in 2N switches such that a '1' and '0' in the encoded data are written as 'H' and 'L', respectively. 'H' and 'L' denote HRS and LRS, respectively. The last configuration 'HHHHHHH' indicates 'don't care' for TCAM. A search key is encoded as a 2N-digit key iteratively, which ### B. IMPLEMENTATION OF ENCODING CIRCUIT needs to be implemented in circuitry in a way to reduce a delay in decoding at the cost of memory usage. To this end, for $0 \le c \le 2N$ and $0 \le i < N$ is stored in a memory, which is referred to as a combination table. The LUT P is an $N \times (2N+1)$ matrix whose element A main advantage of employing the combination table is that the comparison $\binom{c}{N-i} \leq a < c$ $\binom{c+1}{N-i}$ in the encoding function $E_{\rm N}$ can be accelerated considerably by retrieving $\binom{c}{N-i}$ and $\binom{c+1}{N-i}$ from the LUT P rather than evaluating them for every comparison. A block diagram of the encoding circuit including the LUT P is shown in Fig. 4(a). When RESET is 1, the encoding circuit receives search key a that is encoded $(a_0)$ . Simultaneously, a 2N-long array b is initialized as $b_0[k] = 0$ for $0 \le k <$ 2N, where $b_0[k]$ denotes the (k+1)th digit of $b_0$ . We note that RESET is synchronized with clock signals (CLK) to avoid a metastability problem as shown in Fig. 4(b). The circuit first addresses the first row of the LUT P, i.e., P[0,:]. The NEXT block finds c in P[0,:], satisfying $P[0,c] \le a_0 <$ P[0,c+1], using parallel comparators, resulting in $c_0$ . $a_1$ is consequently evaluated as $a_1 = a_0 - P[0, c_0]$ . $b_1$ is identical to $b_0$ except its $(c_0+1)$ th digit that is set to one, $b_1[c_0] = 1$ . $c_1$ is subsequently evaluated for the next row of the LUT P, **FIGURE 5.** Encoding delay and number of entries in the LUT P with the bit number of a search key (n). i.e., P[1, :], as for the first row. This evaluation is repeated for all remaining rows, eventually resulting in a 2N-digit encoded key $b = b_N$ . Therefore, the delay in encoding is caused by iteratively addressing each row of the LUT P, which scales with N(Fig. 4(b)). Given the relationship in (1), the encoding delay is associated with the bit number of a search key (n) as plotted in Fig. 5 (blue line); the delay tends to increase with the bit number. To address the memory overhead for the LUT P, the number of its entries was also evaluated with the bit number of a search key and co-plotted in Fig. 5 (red line). #### III. PARALLEL SEARCH OF N-CECAM DOMAINS A search of a single N-CECAM domain can be extended to parallel searches of multiple N-CECAM domains (partitions). Such parallel searches are useful, particularly, when a search key is so lengthy that N becomes large according to (1). To this end, $n_{\rm p}$ partitions are given to each match line, where each partition is a single $N_{\rm p}$ -CECAM domain loaded VOLUME 7, 2019 137623 | Integer | Configuration | Search key | Integer | Configuration | Search key | |---------|---------------|------------|---------|---------------|------------| | 0 | LLLHHH | 000111 | 9 | LHHHLL | 011100 | | 1 | LLHLHH | 001011 | 10 | HLLLHH | 100011 | | 2 | LLHHLH | 001101 | 11 | HLLHLH | 100101 | | 3 | LLHHHL | 001110 | 12 | HLLHHL | 100110 | | 4 | LHLLHH | 010011 | 13 | HLHLLH | 101001 | | 5 | LHLHLH | 010101 | 14 | HLHLHL | 101010 | | 6 | LHLHHL | 010110 | 15 | HLHHLL | 101100 | | 7 | LHHLLH | 011001 | 0 – 15 | НННННН | 000000 | | 8 | LHHLHL | 011010 | | | | **TABLE 1.** Truth table of encodings of 4-bit integers as resistor configurations (N=3). with $2N_{\rm p}$ resistance switches in total. All partitions can share a single LUT P whose component P[i, c] is $\binom{c}{N_{\rm p}-i}$ for $0 \le c \le 2N_{\rm p}$ and $0 \le i < N_{\rm p}$ as in Fig. 6. Therefore, each match line holds n bits expressed as $$n = \left\lfloor \log_2 \left( \frac{2N_{\rm p}}{N_{\rm p}} \right) \right\rfloor \cdot n_{\rm p}. \tag{2}$$ Each of $n_p$ partitions is responsible for each $n/n_p$ bit chunk of the total n-bit search key. Notably, this method reduces content-memory density. For instance, for a 60-bit search key, $n_p = 1, 4, 10, 15,$ and 60 (respectively corresponding to $N_p = 32, 9, 4, 3, \text{ and } 1)$ yields approximately 0.94, 0.83, 0.75, 0.67, and 0.5 bit/switch, respectively. $N_p = 32$ and 1 indicate the single domain CECAM and the conventional 2R-based RCAM, respectively. Despite the reduction in content density, the advantage of the partitioning is threefold: reductions in the encoding delay and memory usage for the LUT P, and granularity of 'don't care' bits. Regarding the first advantage, the encoding delay is proportional to $N_p$ as considered in Section II.B. Therefore, reducing $N_{\rm p}$ results in a reduction in the encoding delay. Regarding the second advantage, the LUT P is an $N_p \times (2N_p+1)$ matrix where the largest component is $\binom{2N_p}{N_p}$ which reaches $1.83 \times 10^{18}$ for $N_p = 1$ , requiring 61-bit memory. Thus, reducing $N_{\rm p}$ (i.e., introducing partitions) reduces memory usage for the LUT P considerably. For TCAM application, a configuration of All $2N_p$ resistance switches in the HRS represents 'don't care' bits in an $N_p$ -CECAM domain. Therefore, the granularity of 'don't care' bits in the $N_p$ -CECAM equals $n/n_p$ bits. As shown in Table 1, 3-CECAM offers a 'don't care' granularity of 4 bits; when all six resistance switches in a domain are set to the HRS, the switch configuration is matched to any of 4-bit search keys between 0 and 15. Setting $N_p = 32$ for 60-bit search keys yields the coarsest granularity (60 bits), unsuitable for TCAM applications whereas $N_p = 1$ , equivalent to the conventional RCAM, yields the finest granularity (1 bit). Therefore, introducing partitions is a viable method to decrease data granularity. Nevertheless, because this comes at the cost of a reduction in content density, the granularity should be reconciled with content density. # IV. ALGORITHM FOR CONTENT DECODING AND CIRCUIT IMPLEMENTATION Contents in the state-of-the-art 2T2R-RCAM illustrated in Fig. 1(a) are read bitwise such that each bit (a pair of resistance switches) is iteratively examined for matching with the same key applied to the complementary search lines (SL and $\overline{\text{SL}}$ ) [14]. Therefore, a delay in reading is proportional to the bit number of contents. An advantage of a passive array of resistance switches shown in Fig. 1(b) over the active array is that the total contents per match line can simultaneously be read by pulling the match line high and simultaneously measuring the current response on all search lines. Either design employing the CECAM should be able to decode the 2N-digit contents as the original n-bit contents by an appropriate decoding function. The decoding function $D_N$ is the reverse of the encoding function $E_N$ , which is implemented as follows: ``` function D_N(b) set a, i, c to 0 while i < N do if [the (c+1)th ext{ digit of } b] = 1 then set i to i+1 set a to a + {c \choose i} end if set c to c+1 end while return a end function. ``` A 2N-digit content in the CECAM is decoded as an *n*-bit key iteratively. The decoding function $D_N$ is implemented in a digital circuit as shown in Fig. 7(a). The circuit first initializes an n-long array $a_0$ to 0. Upon receiving a 2N-digit content b that is decoded ( $b = b_0$ ), the ADR block in Fig. 7(a) searches for the address of the right-most '1' in $b_0$ and returns it, **FIGURE 6.** Schematic of parallel searches of $N_p$ -CECAM partitions. NEXT in the figure means NEXT block in the encoding circuit. The n-bit search key is divided into $n_p$ chunks, and each chunk applies to the NEXT block of each partition. All partitions share a single LUT P. FIGURE 7. (a) Block diagram of a decoding circuit for 3-CECAM. (b) Timing diagram for decoding an encoded search key of 101100 as its original search key (15). which corresponds to $c_0(b_0[c_0] = 1)$ . $P[N-1,c_0] = \binom{c_0}{1}$ is subsequently retrieved from the LUT P, and $a_1$ is evaluated as $a_1 = a_0 + P[N-1, c_0]$ . $b_1$ is identical to $b_0$ other than its right-most '1' switched to '0'. Subsequently, $c_1$ is evaluated as the address of the right-most 1 in $b_1$ , and then $a_2$ and $b_2$ are evaluated as $a_2 = a_1 + P[N-2, c_1]$ and $b_2 = b_1$ except that $b_2[c_1] = 0$ , respectively. This evaluation is repeated N times, resulting in an n-bit decoded content a. Similar to encoding, a delay in decoding is caused by iteratively addressing each row of the LUT P. Therefore, the delay also scales with N(Fig. 7(b)). ### **V. DISCUSSION** The CECAM scheme was applied to a passive array of resistance switches as a model system. This model system allows the ultimate integration density, i.e., when used as RAM, 4F<sup>2</sup> cell size per bit. In this case, a current-sensing scheme is suitable for bitwise reading. However, the reading process is significantly prone to error because of the notorious sneak current issue due to the lack of bit-selection devices [26]. Employing transistors as active selectors significantly keeps the sneak current sufficiently low for reliable reading as for the 2T2R-based RCAM design [27]. The CECAM scheme applies to an active array of resistance switches with a voltage-sensing scheme as shown in Fig. 8. In the array, a single unit consists of 2N transistors and 2N resistance switches. A one-transistor and one-resistor (1T1R) unit is placed at a crossing point between each match (horizontal) line and each of the 2N search (vertical) lines. Specifically, the gates of 2N transistors are wired to the 2N search lines, and thus the encoded 2N-digit key determines the channel conductance of the 2N transistors during a searching period. For all transistors, the source is connected to a common plate line which is grounded during searching. VOLUME 7, 2019 137625 | TABLE 2. | Comparision | with | previous | cam o | lesigns. | |----------|-------------|------|----------|-------|----------| |----------|-------------|------|----------|-------|----------| | | 16T-TCAM<br>[11] | 6T2R-TCAM<br>[18] | 2T2R-TCAM<br>[16] | 2T2R-TCAM<br>[25] | 2R-TCAM<br>[23] | 4-CECAM<br>with<br>reference to<br>[25] | 4-CECAM with reference to [23] | |---------------------|------------------|-------------------|-------------------|-------------------|-----------------|-----------------------------------------|--------------------------------| | Memory type | SRAM | MTJ* | PCM* | RS* | RS* | RS* | RS* | | Reading scheme | Voltage | Voltage | Voltage | Voltage | Current | Voltage | Current | | Technology (nm) | 65 | 90 | 90 | 130 | 90 | 130 | 90 | | Word width (bit) | 72 | 32 | 64 | 128 | 64 | 128 | 64 | | Cell area (µm²/bit) | 1.69 | 10.35 | 0.41 | NA (1×) | NA (1×) | NA (0.67×) | NA (0.67×) | | Supply voltage (V) | 1 | 1.2 | 1.2 | 0.9 | 0.2 | 0.9 | 0.2 | | Search delay | 1.9 ns | 0.29 ns | 1.9 ns | 2 ns | 0.5 ps | $2 \text{ ns} + 4/f_{\text{clk}}$ | $0.5 \text{ ps} + 4/f_{clk}$ | <sup>\*</sup>MTJ, PCM, and RS denote magnetic tunnel junction, phase-change memory, and resistance switch, respectively. FIGURE 8. Schematic of CECAM with a voltage-reading scheme. The blue arrow in the second row illustrates activated pull-down path. When searching, the match lines are pre-charged simultaneously. Then, a 2*N*-digit encoded key is applied to the search lines such that 1's and 0's pull the search lines up and down, respectively. When matching *N* 1's in the encoded key to the transistors paired with the *N* HRS switches, the voltage on the match line remains high because none of the pull-down paths are activated. Otherwise, some 1's in the encoded key are inevitably associated with transistors paired with LRS switches, indicating the activation of pull-down paths (Fig. 8). Therefore, the voltage on the pre-charged match line decays rapidly, which is noticed by a sense amplifier as a mismatch. This search process is identical to the 2T2R-based conventional RCAM [25]. Regarding the sense amplifier design for the CECAM, a current- and a voltage-sensing amplifier are suitable for a passive and active array of switches, respectively, as for RCAM. Compared with RCAM, the CECAM does not impose additional requirements on its sense amplifiers, so that previously developed sensing technologies [16], [28], [29] are compatible with the CECAM. In this regard, the CECAM can make full use of previous RCAM technologies given a subtle difference between the CECAM and RCAM. The subtle difference lies in content- and search key-encoding, which is the key of our present study. Nevertheless, the subtle difference remarkably enhances the content density. The application domain of the proposed CECAM fully covers other resistance-based NVMs with two-terminal switches, e.g., phase-change memory [14], [15] and magnetic tunnel junction [14], [17], in both active and passive arrays. Moreover, the CECAM concept is compatible with three-terminal NVMs, for instance, ferroelectric transistors [20] and NOR Flash memory. Table 2 compares the CECAM with previous CAM designs. The 4-CECAM was considered with reference to the 4 kb 1T1R RRAM prototype [25] and simulation results of a 2R-TCAM [23]. The 128 bit word width in [25] allows 16 × (4-CECAM domain), i.e., $N_p = 4$ and $n_p = 16$ . According to (2), each match line holds 96 bit contents; instead, the conventional RCAM allows 64 bit contents per match line only. Therefore, the cell area per content bit is approximately 0.67 times that of the conventional RCAM as shown in Table 2. The same holds for the 2R-TCAM with 64 bit word width in [23]. Regretfully, the actual size of the 1T1R RRAM prototype [25] is unavailable so that the cell area per bit for the CECAM is expressed as its area relative to that of the RCAM (0.67). The additional delay in searching due to the encoding of a search key is $N_p/f_{clk}$ , where is $f_{clk}$ clock speed. The cell area per bit and search delay aside, the CECAM is identical to the RCAM. # VI. CONCULSION A new type of CAM, referred to as CECAM, was proposed to improve the content density in a memory array. The N-CECAM employs a group of 2N resistance switches as a single memory unit with multi-bit (n-bit) capacity, which enhances its content density far beyond that of the conventional RCAM (0.5 bit/switch). For instance, 10-CECAM (N=10; 20 resistance switches) has 17-bit content capacity (n=17) in contrast to the conventional RCAM that needs 34 resistance switches for 17-bit content capacity. The key to the CECAM is an algorithm for n-to-2N encoding and its decoding. The proposed encoding and decoding algorithms were proven to match n-bit search keys to 2N-digit keys bijectively. Additionally, the algorithms are readily **IEEE** Access implemented in digital circuits with a combination table, which results in an encoding (decoding) delay of N clock cycles for the N-CECAM. The proposed CECAM concept is compatible with various NVM-based CAM designs including active and passive RCAM, other two-terminal resistance switch-based CAM, e.g., phase-change memory and magnetic memory, and NVM transistors, e.g., ferroelectric transistor and NOR Flash memory. ### **APPENDIX** Theorem 1: $E_N: A_t \to B_t$ is a bijective function for $A_t = \left\{0, 1, \dots, \binom{2N}{N} - 1\right\}$ and $B_t = \{b \mid b: 2N\text{-bit binary numbers, each with } N \text{ 1's and } N \text{ 0's}\}.$ *Proof:* Nonnegative integer $a_i$ is defined as $$a_{i+1} = a_i - {c_i \choose N-i}$$ for $0 \le i < N-1$ , (3) and $a_0 = a \ge 0$ ). $c_i$ satisfies the following inequality: $$\binom{c_i}{N-i} \le a_i < \binom{c_i+1}{N-i} \quad \text{for } 0 \le i < N.$$ (4) Define the number of elements in set $$X$$ as $n(X)$ . Lemma 1: $n(A_t) = n(B_t) = \binom{2N}{N}$ . Lemma 2: $c_i > c_{i+1}$ for $0 \le i < N-1$ . *Proof:* For $a_i = 0$ , the only $c_i$ satisfying the inequality in (4) is $c_i = N - i - 1$ that yields $a_{i+1} = 0$ according to (1). From (4), $c_{i+1} = N - i - 2$ . Therefore, $c_i > c_{i+1}$ . For $a_i > 0$ , using (3) and the fact that $a_i < {c_i + 1 \choose N - i}$ in (4), the following inequality is acquired: $$0 \le a_{i+1} < \begin{pmatrix} c_i + 1 \\ N - i \end{pmatrix} - \begin{pmatrix} c_i \\ N - i \end{pmatrix} = \begin{pmatrix} c_i \\ N - (i+1) \end{pmatrix}. \tag{5}$$ Equation (4) for $$a_{i+1}$$ is $\begin{pmatrix} c_{i+1} \\ N - (i+1) \end{pmatrix} \leq a_{i+1} < a_{i+1}$ $\binom{c_{i+1}+1}{N-(i+1)}$ . $a_{i+1}$ should simultaneously satisfy this equation and (3), which is true if $$\binom{c_i}{N-(i+1)}$$ > $$\binom{c_{i+1}}{N-(i+1)}$$ . Therefore, $c_i > c_{i+1}$ . Consequently, $c_i > c_{i+1}$ holds for nonnegative $a_i$ . For given a, vector c(a) is defined as c(a) $[c_0, c_1, \ldots, c_{N-1}]$ where the components are sorted in descending order according to Lemma 2. Associating b with c(a) such that '1' is placed on each $(c_i+1)$ th digit of b, and '0's on the other digits, it is proven that c(a) is bijectively mapped to b. If a is also bijectively mapped to c(a), a is eventually proven to be mapped to b in a bijective manner. Also, using Lemma 1 and bijective mapping of c(a) to b, the following equation is acquired: $$n(\mathbf{A}_t) = n(\mathbf{B}_t) = n(\mathbf{C}) = \begin{pmatrix} 2N \\ N \end{pmatrix}, \tag{6}$$ where $C = \{c \mid c = c(a)\}.$ Lemma 3: if $x \neq y$ , then $c(x) \neq c(y)$ . *Proof:* If Lemma 3 is true, its contraposition (if c(x)) c(y), then x = y is also true. Given (1), $a(= a_0)$ is expressed as $$a = a_{N-1} + \sum_{i=0}^{N-2} {c_i \choose N-i}.$$ (7) Equation (4) for $$i = N$$ -1 yields $\binom{c_{N-1}}{1} \le a_{N-1} < \binom{c_{N-1}+1}{1}$ , i.e., $c_{N-1} \le a_{N-1} < c_{N-1}+1$ . Thus, $a_{N-1} = c_{N-1} = \binom{c_{N-1}}{1}$ . Therefore, (7) is rewritten by $a = \sum_{i=0}^{N-1} \binom{c_i}{N-i}$ . This equation indicates a unique $a$ for a given vector $c(a)$ , so that if $c(x) = c(y)$ , then $x = y$ . Lemma 3 therefore holds true, identifying injective mapping of $a$ to $c$ . Equation (6) and Lemma 3 identify bijective mapping of a to c. Given the bijective mapping of a to c and c to b, the encoding function $E_N$ is a bijective function. #### **REFERENCES** - [1] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712-727, Mar. 2006. - [2] H. J. Chao, "Next generation routers," Proc. IEEE, vol. 90, no. 9, pp. 1518-1558, Sep. 2002. - T.-B. Pei and C. Zukowski, "VLSI implementation of routing tables: Tries and CAMs," in Proc. INFOCOM, Apr. 1991, pp. 515-524. - T.-B. Pei and C. Zukowski, "Putting routing tables in silicon," *IEEE Netw.*, vol. 6, no. 1, pp. 42-50, Jan. 1992. - N.-F. Huang, W.-E. Chen, J.-Y. Luo, and J.-M. Chen, "Design of multifield IPv6 packet classifiers using ternary CAMs," in Proc. GLOBECOM, vol. 3, Nov. 2001, pp. 1877-1881. - [6] G. Qin, S. Ata, I. Oka, and C. Fujiwara, "Effective bit selection methods for improving performance of packet classifications on IP routers," in Proc. GLOBECOM, vol. 3, Nov. 2002, pp. 2350-2354. - A. J. McAuley and P. Francis, "Fast routing table lookup using CAMs," in Proc. INFOCOM, May 1993, pp. 1382-1391. - V. Kornijcuk, J. Park, G. Kim, D. Kim, I. Kim, J. Kim, J. Y. Kwak, and D. S. Jeong, "Reconfigurable spike routing architectures for on-chip local learning in neuromorphic systems," Adv. Mater. Technol., vol. 4, no. 1, Jan. 2019, Art. no. 1800345. - [9] S. Panchanathan and M. Goldberg, "A content-addressable memory architecture for image coding using vector quantization," IEEE Trans. Signal Process., vol. 39, no. 9, pp. 2066-2078, Sep. 1991. - [10] C. Y. Lee and M. C. Paull, "A content addressable distributed logic memory with applications to information retrieval," Proc. IEEE, vol. 51, no. 6, pp. 924-932, Jun. 1963. - [11] I. Hayashi, T. Amano, N. Watanabe, Y. Yano, Y. Kuroda, M. Shirata, K. Dosaka, K. Nii, H. Noda, and H. Kawai, "A 250-MHz 18-Mb full ternary CAM with low-voltage matchline sensing scheme in 65-nm CMOS," IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2671-2680, Nov. 2013. - [12] O. Tyshchenko and A. Sheikholeslami, "Match sensing using match-line stability in content-addressable memories (CAM)," IEEE J. Solid-State Circuits, vol. 43, no. 9, pp. 1972–1981, Sep. 2008. - [13] Y. Yang, J. Mathew, R. S. Chakraborty, M. Ottavi, and D. K. Pradhan, "Low cost memristor associative memory design for full and partial matching applications," IEEE Trans. Nanotechnol., vol. 15, no. 3, pp. 527-538, May 2016. - O. Guo, X. Guo, Y. Bai, and E. Ipek, "A resistive TCAM accelerator for data-intensive computing," in Proc. 44th Annu. IEEE/ACM Int. Symp. Microarchitecture, Dec. 2011, pp. 339-350. - [15] B. Rajendran, R. W. Cheek, L. A. Lastras, M. M. Franceschini, M. J. Breitwisch, A. G. Schrott, J. Li, R. K. Montoye, L. Chang, and C. Lam, "Demonstration of CAM and TCAM using phase change devices," in *Proc. 3rd IEEE Int. Memory Workshop*, May 2011, pp. 1–4. - [16] J. Li, R. K. Montoye, M. Ishii, and L. Chang, "1 Mb 0.41 μm<sup>2</sup> 2T-2R cell nonvolatile TCAM with two-bit encoding and clocked self-referenced sensing," *IEEE J. Solid-State Circuits*, vol. 49, no. 4, pp. 896–907, Apr. 2013. - [17] S. Matsunaga, M. Natsui, K. Hiyama, T. Endoh, H. Ohno, and T. Hanyu, "Fine-grained power-gating scheme of a metal-oxide-semiconductor and magnetic-tunnel-junction-hybrid bit-serial ternary content-addressable memory," *Jpn. J. Appl. Phys.*, vol. 49, no. 4S, 2010, Art. no. 04DM05. - [18] S. Matsunaga, A. Katsumata, M. Natsui, S. Fukami, T. Endoh, H. Ohno, and T. Hanyu, "Fully parallel 6T-2 MTJ nonvolatile TCAM with single-transistor-based self match-line discharge control," in *Symp. VLSI Circuits-Dig. Tech. Papers*, Jun. 2011, pp. 298–299. - [19] W. Xu, T. Zhang, and Y. Chen, "Design of spin-torque transfer magnetoresistive RAM and CAM/TCAM with high sensing and search speed," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 18, no. 1, pp. 66–74, Jan. 2010. - [20] I. Bayram and Y. Chen, "NV-TCAM: Alternative interests and practices in NVM designs," in *Proc. IEEE Non-Volatile Memory Syst. Appl. Symp.*, Aug. 2014, pp. 1–6. - [21] B. Chen, Y. Zhang, W. Liu, S. Xu, R. Cheng, R. Zhang, and Y. Zhao, "Ge-based asymmetric RRAM enable 8F<sup>2</sup> content addressable memory," *IEEE Electron Device Lett.*, vol. 39, no. 9, pp. 1294–1297, Sep. 2018. - [22] L.-Y. Huang, M.-F. Chang, C.-H. Chuang, C.-C. Kuo, C.-F. Chen, G.-H. Yang, H.-J. Tsai, T.-F. Chen, S.-S. Sheu, K.-L. Su, F. T. Chen, T.-K. Ku, M.-J. Tsai, and M.-J. Kao, "ReRAM-based 4T2R non-volatile TCAM with 7x NVM-stress reduction, and 4x improvement in speed-wordlength-capacity for normally-off instant-on filter-based search engines used in big-data processing," in Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2014, pp. 1–2. - [23] R. Han, W. Shen, P. Huang, Z. Zhou, L. Liu, X. Liu, and J. Kang, "A novel ternary content addressable memory design based on resistive random access memory with high intensity and low search energy," *Jpn. J. Appl. Phys.*, vol. 57, no. 4S, 2018, Art. no. 04FE02. - [24] D. R. B. Ly, B. Giraud, J.-P. Noel, A. Grossi, N. Castellani, G. Sassine, J.-F. Nodin, G. Molas, C. Fenouillet-Beranger, G. Indiveri, E. Nowak, and E. Vianello, "In-depth characterization of resistive memory-based ternary content addressable memories," in *IEDM Tech. Dig.*, Dec. 2018, pp. 20.3.1–20.3.4. - [25] A. Grossi, E. Vianello, C. Zambelli, P. Royer, J. P. Noel, B. Giraud, L. Perniola, P. Olivo, and E. Nowak, "Experimental investigation of 4-kb RRAM arrays programming conditions suitable for TCAM," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 26, no. 12, pp. 2599–2607, Mar. 2018. - [26] A. Chen, "A comprehensive crossbar array model with solutions for line resistance and nonlinear device characteristics," *IEEE Trans. Electron Devices*, vol. 60, no. 4, pp. 1318–1326, Apr. 2013. - [27] P.-Y. Chen and S. Yu, "Compact modeling of RRAM devices and its applications in 1T1R and 1S1R array design," *IEEE Trans. Electron Devices*, vol. 62, no. 12, pp. 4022–4028, Dec. 2015. - [28] I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1958–1966, Nov. 2003. - [29] N. Mohan, W. Fung, D. Wright, and M. Sachdev, "A low-power ternary CAM with positive-feedback match-line sense amplifiers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 3, pp. 566–573, Mar. 2009. VLADIMIR KORNIJCUK received the B.S. degree in telecommunication physics and electronics from Vilnius University, Vilnius, Lithuania, the M.S. degree in materials science and engineering from the Seoul National University of Science and Technology, Seoul, South Korea, and the Ph.D. degree in nano and information technology with the University of Science and Technology, South Korea. He is currently a Post-doctoral Scholar with Hanyang university, South Korea. His current research interests include analog neuromorphic VLSI models of neurons and synapses, and digital spike routing architectures aimed for large-scale mixed neuromorphic systems. JEESON KIM received the B.Eng. degree in semiconductor systems engineering from the Catholic University of Korea, South Korea, in 2011, and the M.S. and Ph.D. degrees from RMIT University, Australia, in 2014 and 2019, respectively. She is currently a Postdoctoral Scholar with Hanyang University, South Korea. Her research interests include nano-enabled hardware security, low-power solutions for cyber-physical systems, and analog and digital designs for neuromorphic systems. **DOHUN KIM** received the B.S. degree in materials science and engineering from Seoul National University, Seoul, South Korea, in 2015, where he is currently pursuing the Ph.D. degree in materials science and engineering. Since 2016, he has been focusing on learning algorithm for neuromorphic hardware implementation, especially for temporal learning. **CHEOL SEONG HWANG** received the Ph.D. degree from Seoul National University, Seoul, South Korea, in 1993, where he has been a Professor with the Department of Materials Science and Engineering, since 1998. His current research interests include high-*k* gate oxides, dynamic random access memory capacitors, new memory devices, including resistive RAM devices and ferroelectric materials and devices, energy storage capacitors, and neuromorphic computing. **GUHYUN KIM** received the B.S. degree in material science and engineering from Seoul National University, Seoul, South Korea, in 2015, where he is currently pursuing the integrated Ph.D. degree with the Dielectric Thin Film Laboratory. His research interests include neuromorphic engineering and crossbar arrays (RRAM). **DOO SEOK JEONG** received the B.E. and M.E. degrees in materials science from Seoul National University, in 2002 and 2005, respectively, and the Ph.D. degree in materials science from RWTH Aachen, Germany, in 2008. He was with the Korea Institute of Science and Technology, from 2008 to 2018. He is currently an Associate Professor with Hanyang University, South Korea. His research interest includes spiking neural networks for temporal learning, ranging from building block to learning algorithm. • • •