I. Introduction
Edge-computing devices have limited resources, including memory and battery capacity, which makes energy efficiency crucial. Deep neural networks (DNNs) have achieved remarkable progress in many applications, but their computational cost makes training them on edge devices nearly impossible. Even inference requires fetching a large number of weights and activations, which is energy-inefficient and hard to sustain on edge devices. Quantization is a promising remedy: reducing bit precision shrinks both the DRAM footprint and energy consumption. Binary neural networks (BNNs) [5] have been explored, but they suffer from a considerable accuracy drop. Recent work shows that ternary neural networks (TNNs) [1], [9] achieve significantly better accuracy than BNNs. Moreover, TNNs can be implemented on SRAM-based in-memory computing (IMC) circuits, which nearly eliminates the data-traffic problem. However, storing each ternary weight in two 6T SRAM cells reduces the efficiency of TNNs on SRAM-based IMC circuits. Dedicated ternary SRAM cells have been proposed to address this issue [1], [3], but their large area remains problematic, which motivates this work.
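To make the two quantization notions above concrete, the following minimal Python sketch shows (a) a common threshold-based ternarization rule and (b) a two-cell binary encoding of a ternary weight. The threshold factor 0.7, the scaling by the mean surviving magnitude, and the particular cell assignment are illustrative assumptions, not necessarily the exact schemes of [1], [3], or [9].

```python
import numpy as np

# Map each ternary value to two binary SRAM cells (w_pos, w_neg):
#   +1 -> (1, 0),  0 -> (0, 0),  -1 -> (0, 1)
# This two-cells-per-weight encoding is what costs area on 6T-SRAM
# IMC arrays; the exact cell assignment is an illustrative assumption.
TWO_CELL = {1: (1, 0), 0: (0, 0), -1: (0, 1)}

def ternarize(w, delta_scale=0.7):
    """Threshold-based ternarization of a float tensor to codes in
    {-1, 0, +1} with a shared scale alpha.  delta_scale = 0.7 is a
    common heuristic threshold choice, used here for illustration."""
    delta = delta_scale * np.abs(w).mean()              # ternarization threshold
    codes = np.where(w > delta, 1, np.where(w < -delta, -1, 0))
    nonzero = codes != 0
    # alpha: mean magnitude of the weights that survive thresholding
    alpha = np.abs(w[nonzero]).mean() if nonzero.any() else 0.0
    return alpha, codes

w = np.random.randn(4, 4).astype(np.float32)            # toy weight matrix
alpha, codes = ternarize(w)
cells = np.array([TWO_CELL[int(c)] for c in codes.ravel()])
print(alpha)            # shared scale factor
print(codes)            # entries in {-1, 0, +1}
print(cells.shape)      # (16, 2): two 6T cells per ternary weight
```

Each ternary weight carries only about 1.58 bits of information (log2 3), yet the two-cell encoding spends two full SRAM cells on it, which is the inefficiency the dedicated ternary cells of [1], [3] aim to remove.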