## Session 29 Overview: Digital Circuits for Computing, Clocking and Power Management

**DIGITAL CIRCUITS SUBCOMMITTEE**







For computing, two papers demonstrate: 1) RRAM-based in-memory computing for deep convolutional neural networks, and 2) a dynamic-precision bit-serial spatial accelerator for solving differential equations. For clocking, three papers demonstrate: 1) a fast-lock wide-range clock generator with asynchronous adaptive-droop mitigation, 2) a fractional-N MDLL with a background DTC calibration, and 3) a fractional output divider with replica-DTC-free background calibration. The last three papers focus on digital power management, demonstrating: 1) a distributed digital low-dropout (DLDO) voltage regulator with a high power density and a wide load current range in 5nm FinFET technology; 2) a single-inductor 4-output power



#### 29.1 A 40nm 64Kb 56.67TOPS/W Read-Disturb-Tolerant Compute-in-Memory/Digital RRAM Macro with Active-Feedback-Based Read and In-Situ Write Verification

Jong-Hyeok Yoon, Georgia Institute of Technology, Atlanta, GA

In Paper 29.1, Georgia Institute of Technology and TSMC present a hybrid compute-in-memory/digital, 64Kb 0.437mm<sup>2</sup> RRAM macro in 40nm CMOS with active-feedback-based read and in-situ write verification. The macro design highlights how digital circuit techniques can alleviate technology challenges. The in-situ write verification reduces the resistance variation of the RRAM array to 1/3, and in compute-in-memory applications the macro achieves an average (peak) energy efficiency of 4.15 (56.67)TOPS/W at 100MHz.

#### 29.2 A 21×21 Dynamic-Precision Bit-Serial Computing Graph Accelerator for Solving Partial Differential Equations Using Finite Difference Method

Junjie Mu, Nanyang Technological University, Singapore, Singapore

In Paper 29.2, Nanyang Technological University presents a graph accelerator in 65nm CMOS for solving partial differential equations using the finite difference method (FDM). Energy efficiency is improved using bit-serial communication and a residue-based FDM, while performance is improved using a checkerboard update method to maximize parallelism. The graph accelerator integrates 21×21 PEs in 0.462mm<sup>2</sup> and consumes 1.59nJ per iteration at 16b precision, 1V, and 25.6MHz.
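The checkerboard update mentioned above can be illustrated in software. The following is a minimal sketch, assuming a Laplace problem with fixed boundary values on a 21×21 grid and a plain red-black Gauss-Seidel sweep. The function names, the boundary condition, and the tolerance are illustrative assumptions; the sketch does not model the paper's bit-serial communication or its residue-based FDM.

```python
def checkerboard_sweep(u):
    """Run one red-black (checkerboard) iteration in place.

    Points with (i + j) even are updated first, then points with (i + j) odd.
    Within one color every update reads only neighbors of the other color,
    so all points of a color could be updated in parallel.
    Returns the largest change made to any grid point.
    """
    n = len(u)
    max_delta = 0.0
    for color in (0, 1):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 != color:
                    continue
                # 5-point finite-difference stencil for the Laplace equation.
                new = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
                max_delta = max(max_delta, abs(new - u[i][j]))
                u[i][j] = new
    return max_delta


def solve_laplace(n=21, top=1.0, tol=1e-6, max_iters=10_000):
    """Solve the Laplace equation on an n x n grid (hypothetical test problem).

    The top edge is held at `top`, the other edges at 0.
    Returns the grid and the number of iterations performed.
    """
    u = [[0.0] * n for _ in range(n)]
    for j in range(n):
        u[0][j] = top
    for iteration in range(1, max_iters + 1):
        if checkerboard_sweep(u) < tol:
            return u, iteration
    return u, max_iters


if __name__ == "__main__":
    grid, iters = solve_laplace()
    print(f"converged after {iters} iterations, center value {grid[10][10]:.4f}")
```

By symmetry, the center of this test problem should settle near 0.25, which gives a simple sanity check on the sweep.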

#### 7:16 AM

#### 29.3 80ns Fast-Lock 0.4-to-6.5GHz Clock Generator with Self-Referenced Asynchronous Adaptive Droop Mitigation

Praveen Mosalikanti, Intel, Portland, OR

In Paper 29.3, Intel presents an 80ns fast-lock 0.4-to-6.5GHz 0.01mm<sup>2</sup> clock generator in 10nm CMOS with self-referenced, asynchronous adaptive droop mitigation for uninterrupted, overshoot-free clocks for DVFS. When di/dt constraints prevail, measurements show gradual frequency transitions of up to 650MHz per 100ns; when unconstrained, transitions complete within 80ns with <1% exit frequency error.

#### 7:24 AM

#### 29.4 A Fractional-N Digital MDLL with Background Two-Point DTC Calibration Achieving -60dBc Fractional Spur

Qiaochu Zhang, University of Southern California, Los Angeles, CA

In Paper 29.4, the University of Southern California presents a 0.18mm<sup>2</sup> fractional-N MDLL in 65nm CMOS with background DTC calibration. The DTC gain and offset errors are estimated and corrected in the analog and digital domains, respectively. TDC dithering and adaptive comb-filter-assisted dither cancellation further enhance calibration accuracy. Measurements show a >25dB improvement, resulting in a -60dBc fractional spur and 1.67ps RMS jitter.

#### 7:32 AM

#### 29.5 A 0.008mm<sup>2</sup> 1.5mW 0.625-to-200MHz Fractional Output Divider with 120fs<sub>rms</sub> Jitter Based on Replica-DTC-Free Background Calibration

Chun-Yu Lin, National Taiwan University, Taipei, Taiwan

In Paper 29.5, National Taiwan University presents a fractional output divider in 90nm CMOS with replica-DTC-free background calibration. The divider covers a frequency range of 0.625 to 200MHz, occupies 0.008mm<sup>2</sup>, and consumes 1.5mW. The results show instantaneous frequency-switching capability. At 192MHz output, spurs are below -65dBc and the RMS jitter is 120fs.
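For readers unfamiliar with the general idea behind a DTC-assisted fractional divider, here is a minimal behavioral sketch. It assumes a textbook first-order accumulator driving a dual-modulus divider, with an ideal DTC that delays each edge by the accumulated phase residue. The function name and parameters are hypothetical, and the sketch does not represent the paper's replica-DTC-free calibration, which addresses the non-ideal DTC gain that this model ignores.

```python
from fractions import Fraction


def fractional_divider_edges(n_int, alpha, t_in, num_edges):
    """Model output edges of a divide-by-(n_int + alpha) fractional divider.

    n_int : integer part of the division ratio
    alpha : fractional part, 0 <= alpha < 1 (use Fraction for exact results)
    t_in  : input clock period
    Returns (raw_edges, corrected_edges): edge times before and after an
    ideal DTC delays each edge by the accumulator residue times t_in.
    """
    acc = Fraction(0)      # phase accumulator, always kept in [0, 1)
    t_raw = Fraction(0)    # time of the current divider output edge
    raw_edges, corrected_edges = [], []
    for _ in range(num_edges):
        acc += alpha
        if acc >= 1:
            # Accumulator overflow: swallow one extra input cycle.
            acc -= 1
            modulus = n_int + 1
        else:
            modulus = n_int
        t_raw += modulus * t_in
        raw_edges.append(t_raw)
        # The raw edge is early by acc * t_in, so the ideal DTC adds that delay.
        corrected_edges.append(t_raw + acc * t_in)
    return raw_edges, corrected_edges


if __name__ == "__main__":
    raw, corrected = fractional_divider_edges(4, Fraction(1, 4), Fraction(1), 8)
    print("raw periods:      ", [b - a for a, b in zip(raw, raw[1:])])
    print("corrected periods:", [b - a for a, b in zip(corrected, corrected[1:])])
```

In this idealized model the raw divider periods alternate between `n_int` and `n_int + 1` input cycles, while the corrected periods are uniform. In hardware, any mismatch between the DTC gain and the input period leaves a residual periodic error that appears as fractional spurs, which is why background calibration matters.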

#### 7:40 AM

#### 29.6 A Distributed Digital LDO with Time-Multiplexing Calibration Loop Achieving 40A/mm<sup>2</sup> Current Density and 1mA-to-6.4A Ultra-Wide Load Range in 5nm FinFET CMOS

Dong-Hoon Jung, Samsung Electronics, Hwaseong, Korea

In Paper 29.6, Samsung Electronics introduces a distributed digital LDO with 16 local LDOs and a global controller in 5nm FinFET CMOS. The proposed time-multiplexing calibration loop achieves a 46% reduction in output mismatch and maintains 20mV droop under a 1A load step with peak current efficiency of 99.89%. The maximum load current is 6.4A in 0.16mm<sup>2</sup>, resulting in a current density of 40A/mm<sup>2</sup>.
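As a quick consistency check on the figures above (a worked calculation, with the symbols $J$, $I_{\max}$, and $A$ introduced here for illustration), the current density follows from the maximum load current and the area:

$$
J = \frac{I_{\max}}{A} = \frac{6.4\,\mathrm{A}}{0.16\,\mathrm{mm}^2} = 40\,\mathrm{A/mm^2}
$$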

#### 7:48 AM

#### 29.7 A Single-Inductor 4-Output SoC with Dynamic Droop Allocation and Adaptive Clocking for Enhanced Performance and Energy Efficiency in 65nm CMOS

Chi-Hsiang Huang, University of Washington, Seattle, WA

In Paper 29.7, the University of Washington introduces dynamic droop allocation and adaptive clocking for supply-voltage margin reduction in single-inductor multiple-output (SIMO) regulators. Measurements on a digitally controlled, SIMO-regulated 4-domain SoC in 65nm CMOS demonstrate a 98% average margin reduction over a baseline implementation. The dynamic-droop-allocation technique reduces adaptive-clocking cycle loss by 3.5x.

#### 7:56 AM

#### 29.8 115nA@3V ULPMark-CP Score 1205 SCVR-Less Dynamic Voltage-Stacking Scheme for IoT MCU

Xiaomin Li, Nanjing Low Power IC Technology Institute, Nanjing, China

In Paper 29.8, Nanjing Low Power IC Technology Institute presents a dynamic voltage-stacking scheme to reduce the sleep current of a 2.38mm<sup>2</sup> IoT microcontroller in 40nm CMOS. Measurements demonstrate a sleep-state leakage of 115nA at 3V, a 32% reduction compared to a conventional flat architecture, and a ULPMark-CP score of 1205.









