

Received 8 December 2024, accepted 27 December 2024, date of publication 30 December 2024, date of current version 6 January 2025. *Digital Object Identifier 10.1109/ACCESS.2024.3524500* 

# **RESEARCH ARTICLE**

# Analog Computing for Nonlinear Shock Tube PDE Models: Test and Measurement of CMOS Chip

HASANTHA MALAVIPATHIRANA<sup>1</sup>, (Member, IEEE),

SOUMYAJIT MANDAL<sup>2</sup>, (Senior Member, IEEE), NILAN UDAYANGA<sup>1</sup>, (Member, IEEE), YINGYING WANG<sup>3</sup>, (Member, IEEE), S. I. HARIHARAN<sup>3</sup>, (Life Senior Member, IEEE), AND ARJUNA MADANAYAKE<sup>1</sup>, (Member, IEEE)

<sup>1</sup>Department of Electrical and Computer Engineering, Florida International University, Miami, FL 33174, USA
<sup>2</sup>Instrumentation Division, Brookhaven National Laboratory, Upton, NY 11973, USA
<sup>3</sup>Ocius Technologies, Akron, OH 44311, USA

Corresponding author: Hasantha Malavipathirana (hmala005@fiu.edu)

This work was supported by the Defense Advanced Research Projects Agency through a Subaward from Ocius Technologies under Grant DARPA: D17PC00289.

**ABSTRACT** Long ignored by the digital computing industry since its heyday in the 1940s, analog computing is making a comeback as Moore's Law slows down. Analog CMOS offers power-efficiency advantages over digital CMOS for low-precision applications in edge computing, scientific computing, and artificial intelligence/machine learning (AI/ML). Motivated by the non-trivial performance improvements over digital processors observed when solving linear partial differential equations (PDEs), this paper presents experimental results and analysis from a single-chip CMOS analog computer for solving nonlinear PDEs. The chip integrates a 15-point, fully parallel, spatially discrete time-continuous (SDTC) finite-difference time-domain (FDTD) solver for acoustic shock wave equations with radiation boundary conditions. The design was realized in TSMC 180 nm CMOS technology. It has an active area of 7.38 mm × 4.64 mm and consumes 936 mW while delivering an equivalent FDTD temporal update rate of 80 MHz and an analog bandwidth of 2 MHz. The paper discusses the challenges and associated design trade-offs in realizing such high-performance CMOS analog computers, including sensitivity to process, voltage, and temperature (PVT) variations, sensitivity to bias and voltage regulation, noise-related errors, and calibration difficulties, and it outlines possible approaches for mitigating these challenges.

**INDEX TERMS** Analog computing, finite-difference time domain (FDTD), acceleration, CMOS, nonlinear, partial differential equations (PDEs).

## I. INTRODUCTION

Analog computing is gaining renewed attention because it can offer high efficiency in low- to moderate-precision tasks and can outperform digital computing in speed and power consumption, especially in applications requiring real-time processing [1], [2], [3]. As digital computers approach physical and power-efficiency limits, analog solutions become increasingly appealing, particularly for applications such as AI/ML and scientific computation at the edge, where low latency and power efficiency are critical [4], [5], [6], [7], [8].

The associate editor coordinating the review of this manuscript and approving it for publication was Yuh-Shyan Hwang.

Modern integrated circuit (IC) realizations of analog computers find a plethora of applications in the AI community, particularly in deep learning (DL) [9], [10], [11] and spiking neural networks [4], [5]. Analog implementations of compute-in-memory (CIM) systems and neuromorphic designs using memristors and ReRAM have demonstrated power efficiency in AI applications such as convolutional neural networks (CNNs) [6], [7], [8], [12], [13], [14], underscoring the potential of analog systems for edge computing. The energy-efficiency benefits of analog/neuromorphic computing ICs make them an attractive option for edge AI and potentially for generative AI applications as well [15], [16], [17].

Analog computer ICs have also been proposed for solving scientific computing problems involving linear and nonlinear partial differential equations (PDEs). Unlike digital systems, which rely on sequential operations, analog systems process information in continuous time, enabling parallel computation with minimal latency. This continuous processing offers substantial advantages for solving differential equations, which are foundational for modeling a wide range of physical phenomena. Special-purpose analog ICs have been shown to solve both differential and integral equations efficiently, achieving computational performance metrics competitive with or superior to those of digital architectures [3], [18], [19], [20], [21], [22], [23], [24].

Our work builds on this trend by presenting a CMOS-based analog computer designed specifically to solve nonlinear PDEs. Nonlinear PDEs are vital in areas such as fluid dynamics, electromagnetics, and plasmonics. This analog IC employs a spatially discrete, time-continuous (SDTC) approach, in which each spatial point of the PDE is processed by a dedicated module, allowing real-time updates across the spatial grid. Fig. 1 compares analog computers, including both PDE/ordinary differential equation (ODE) solvers [2], [3], [20], [22], [24], [25], [26] and analog AI/CIM accelerators [7], [8], [17], [27], [28], contextualizing our work within the broader field of analog computing, particularly in terms of computation frequency and implementation complexity.

In our previous work [29], we introduced the theoretical framework for this analog PDE solver. Here, we present post-layout circuit simulations and measurement results from a chip that implements the building blocks of the proposed analog computer. We describe the IC implementation, test setup, measurement, verification, calibration, and debugging, evaluate the test results, and highlight the challenges associated with designing high-performance analog computing ICs for fast PDE solvers.

The remainder of this paper is organized as follows. Analog computing algorithms for solving nonlinear PDEs are discussed in Section II. Circuit simulations of the algorithms are presented and analyzed in Section III. Section IV describes the implementation of an analog IC to realize the algorithms, while Section V presents the test setup and measurement results from the chip. Finally, Section VI discusses our contributions and concludes the paper.

## II. ANALOG NONLINEAR PDE ALGORITHMS

The system of interest is modeled using a nonlinear PDE, specifically the acoustic shock tube problem, which has applications in areas such as jet engine inlet design. We employ a spatially discrete, time-continuous (SDTC) approach based on finite-difference time-domain (FDTD) algorithms. These algorithms are derived and implemented using continuous-time delay elements, which can be approximated using all-pass filters [29], [30], [31]. For details on the derivation of the SDTC algorithm and its mapping to analog circuits, please refer to [29]. Our focus in this paper is on the IC implementation of key building blocks for solving nonlinear PDEs such as the shock tube problem. This architecture is programmable and can be adapted to solve other linear and nonlinear PDEs based on conservation laws [29].

FIGURE 1. Comparison of analog computers (PDE/ODE solvers and analog AI/CIM accelerators) based on implementation complexity and computation frequency. The implementation complexity of PDE/ODE solvers is classified based on the configurability of the solver (capability to solve a single problem vs. multiple problems) and the dimensionality of the problem being solved. The complexity of analog AI accelerators is classified based on neural network configurability and compute macro size.

The following subsections describe the architecture of a fully parallel analog CMOS accelerator that solves the acoustic shock wave tube PDEs using a continuous-time formulation of MacCormack's method [29], [32], [33].
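For readers unfamiliar with MacCormack's method, the discrete-time two-step update, of which the SDTC architecture realizes a continuous-time counterpart and which the MATLAB reference solver in Section III uses, can be sketched as follows. This is a minimal Python/NumPy sketch, not the authors' solver: it assumes a generic scalar conservation law $u_t + f(u)_x = 0$ with periodic boundaries, and it uses the inviscid Burgers flux $f(u) = u^2/2$ as a stand-in for the coupled density/velocity fluxes of the shock tube PDE in [29], [32].

```python
import numpy as np


def maccormack_step(u, flux, dt, dx):
    """One two-step MacCormack update for u_t + flux(u)_x = 0.

    Periodic boundaries (np.roll) are a simplification for this sketch; the
    chip instead uses a source boundary module and a radiation boundary module.
    """
    f = flux(u)
    # Predictor: forward difference of the flux.
    u_pred = u - dt / dx * (np.roll(f, -1) - f)
    f_pred = flux(u_pred)
    # Corrector: backward difference of the predicted flux, averaged with u.
    return 0.5 * (u + u_pred - dt / dx * (f_pred - np.roll(f_pred, 1)))


def burgers_flux(v):
    """Inviscid Burgers flux f(u) = u^2 / 2, used as a stand-in nonlinear flux."""
    return 0.5 * v**2


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 64, endpoint=False)
    dx = x[1] - x[0]
    u = 1.0 + 0.2 * np.sin(2 * np.pi * x)  # smooth initial condition
    dt = 0.4 * dx / np.max(np.abs(u))      # Courant number 0.4
    for _ in range(50):                    # the profile steepens as it propagates
        u = maccormack_step(u, burgers_flux, dt, dx)
    print(f"min {u.min():.3f}, max {u.max():.3f}")
```

The predictor uses a forward spatial difference and the corrector a backward one; alternating the two is what gives the method second-order accuracy, and in the analog solver the time step $\Delta T$ is replaced by the continuous-time delay $\tau$.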

### A. ANALOG COMPUTING PLATFORM

Fig. 2(a) presents an overview of the proposed analog computing platform. The analog computing core (CMOS chip) is mounted on a board with supporting circuitry. Inside the chip, all computations are performed in continuous time, and the core is interfaced to input and output signals via impedance-matched transmission lines. The outputs consist of time-varying voltages that are determined by the PDE, its initial conditions, and its boundary conditions. A time-varying input excitation signal, which defines the source (left) boundary condition of the PDE, is created using a signal generator; a field-programmable gate array (FPGA) can also be used for this purpose. The output waveforms computed by the analog solver are digitized and post-processed by an FPGA. In this implementation, we use the ADCs integrated within Xilinx RF system-on-chip (RFSoC) devices [34] to digitize the analog outputs. In addition, the analog solver chip is programmed and calibrated using a microcontroller (MCU). The chip is calibrated to improve its accuracy; this process involves comparing its output signals with a reference FDTD-based PDE solver implemented in MATLAB and then programming the chip to minimize the error between the two.

**FIGURE 2.** (a) Overview of the proposed analog computing platform. (b) The analog computing architecture that solves nonlinear acoustic shock wave equations. Boundary modules (BMs) and internal modules (IMs) are connected in a systolic array architecture, with each module producing a time-varying voltage that corresponds to the solution of the PDE at a particular spatial location (as indicated on the discrete x axis). Block diagram of (c) the $f(\Phi)$ circuit and (d) the APF circuit. This figure illustrates the connections among Figs. 2, 3, 5, and 6 of [29].

### B. ANALOG SOLVER ARCHITECTURE

Fig. 2(b) shows the signal flow graph (SFG) of the analog computing architecture described in [29], which maps the SDTC equations onto an array of identical analog circuits known as internal modules (IMs). The internal module IM<sub>i</sub> computes the continuous-time solution of the nonlinear PDE at $x = i\Delta x$. The continuous-time delay operator $\tau$ of the numerical method is implemented as the total time delay within each IM, given by $\tau = 2\tau_1 + \tau_2 + \tau_3 + \tau_4$. Here, the delays $\tau_1$ through $\tau_4$ are realized using all-pass filters (APFs) with a Laplace-domain transfer function $\phi(s) \approx e^{-s\tau}$ [20], [30]. The APFs at the inputs of each subsystem of the parallel analog computer are also used to compensate for propagation delays in different signal paths. In addition, specialized boundary modules (BMs) implement the boundary conditions.
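The all-pass delay approximation can be illustrated numerically. The sketch below assumes that each stage of the 3-stage APF in Fig. 2(d) behaves like an ideal first-order Padé all-pass section $(1 - sa)/(1 + sa)$ with $a = T/2$, where $T$ is the low-frequency stage delay; the transistor-level APF in [29] may differ, and the 1 ns stage delay used below is only illustrative. The magnitude is exactly unity at every frequency, while each section's group delay $2a/(1 + \omega^2 a^2)$ stays close to $T$ as long as $\omega a \ll 1$.

```python
import numpy as np


def allpass_cascade(freq_hz, stage_delay_s, n_stages=3):
    """Magnitude and group delay of n identical first-order Pade all-pass
    sections H(s) = (1 - s*a) / (1 + s*a) with a = stage_delay_s / 2.

    This is an idealized model (assumption), not the chip's APF circuit.
    """
    w = 2 * np.pi * np.asarray(freq_hz, dtype=float)
    a = stage_delay_s / 2.0
    h = ((1 - 1j * w * a) / (1 + 1j * w * a)) ** n_stages
    # Group delay -d(phase)/dw of one section is 2a / (1 + (w a)^2).
    group_delay = n_stages * 2 * a / (1 + (w * a) ** 2)
    return np.abs(h), group_delay


if __name__ == "__main__":
    for f in (2e6, 20e6):  # input frequency and the 20 MHz span of Fig. 4(a)-(b)
        mag, gd = allpass_cascade(f, stage_delay_s=1e-9)
        print(f"{f / 1e6:4.0f} MHz: |H| = {mag:.4f}, group delay = {gd * 1e9:.3f} ns")
```

Under these assumptions the 3 ns cascade loses only about 0.4% of its delay at 20 MHz, which is why a low-order all-pass can stand in for $e^{-s\tau}$ over the 2 MHz signal band.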

The input vector $\alpha_i$ contains all parameters needed to calculate the coefficients of $\mathbf{f}(\Phi)$ at $x = i\Delta x$. The flux operators $\mathbf{f}_{i-1}^{t-\tau}$ and $\mathbf{f}_{i+1}^{p}$ are supplied by the neighboring IMs at $x = (i - 1)\Delta x$ and $x = (i + 1)\Delta x$, respectively. Similarly, the output flux operators $\mathbf{f}_{i}^{t-\tau}$ and $\mathbf{f}_{i}^{p}$ feed the neighboring modules to produce their outputs. Fig. 2(c) shows the SFG for evaluating an expression of the form $f(u, \rho) = \alpha_{1,0} u + \alpha_{2,0} u^2 + \alpha_{0,1} \rho + \alpha_{0,2} \rho^2 + \alpha_{1,1} u\rho$, where $u$ and $\rho$ denote the acoustic velocity and density. Voltage-mode circuits based on summing and scaling op-amps realize this system, thus computing $\mathbf{f}(\Phi)$. The five coefficients $\alpha_{p,q}$ map to the mean flow coefficients $u_s$, $\rho_s$, and $c_s$ of the acoustic shock tube PDE given in [29] and [32], and they are tuned using binary-weighted capacitor arrays. Fig. 2(d) shows the block diagram of the 3-stage APF that is used to realize the continuous-time delay terms. Further analysis of the proposed analog computing architecture, including circuit design details, is provided in [29].
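To make the structure of Fig. 2(c) concrete, the following Python sketch evaluates $f(u, \rho)$ and shows how rounding its five coefficients to a binary-weighted capacitor array perturbs the result. The 7-bit resolution matches the mean-flow coefficient arrays listed in Section IV-C, but the coefficient values, the unit full-scale range, and the sign handling are hypothetical; the actual mapping from $u_s$, $\rho_s$, and $c_s$ to $\alpha_{p,q}$ is given in [29].

```python
import numpy as np

# Exponents (p, q) of the monomials u**p * rho**q appearing in f(u, rho).
TERMS = ((1, 0), (2, 0), (0, 1), (0, 2), (1, 1))


def flux_poly(u, rho, alpha):
    """Evaluate f(u, rho) = sum over TERMS of alpha[(p, q)] * u**p * rho**q."""
    return sum(alpha[(p, q)] * u**p * rho**q for p, q in TERMS)


def quantize(value, bits, full_scale):
    """Round |value| to the nearest level of a binary-weighted array spanning
    [0, full_scale] and restore the sign (hypothetical sign handling)."""
    lsb = full_scale / (2**bits - 1)
    code = round(min(abs(value), full_scale) / lsb)
    return float(np.copysign(code * lsb, value))


if __name__ == "__main__":
    # Hypothetical coefficient values, for illustration only.
    alpha = {(1, 0): 0.8, (2, 0): 0.05, (0, 1): 0.31, (0, 2): -0.02, (1, 1): 0.5}
    alpha_q = {k: quantize(v, bits=7, full_scale=1.0) for k, v in alpha.items()}
    u, rho = np.meshgrid(np.linspace(-0.1, 0.1, 21), np.linspace(-0.1, 0.1, 21))
    err = np.max(np.abs(flux_poly(u, rho, alpha) - flux_poly(u, rho, alpha_q)))
    print(f"max |f - f_quantized| over the grid: {err:.2e}")
```

Each quantized coefficient is within half an LSB of its target, so the flux error scales with the coefficient resolution and with the signal amplitude.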

## III. CIRCUIT SIMULATIONS AND ANALYSIS

The proposed IC for modeling the acoustic shock wave tube problem was designed and fabricated in the TSMC 180 nm CMOS process. Based on parasitic-extracted post-layout simulation results of the chip, this section analyzes the impact of various circuit nonidealities, namely circuit parasitics, PVT variations, and device mismatches, on the proposed analog solver in Sections III-B, III-D, and III-E, respectively. The performance of the proposed analog solver IC, which is based on the all-pass delay approximation (APDA) architecture, depends on the performance of the all-pass filter and is also susceptible to supply and bias voltage variations across the chip [19]. Therefore, Sections III-C and III-F discuss the impact of APF performance and the sensitivity of the chip to voltage variations, respectively.

### A. TRANSISTOR-LEVEL SIMULATIONS

Initially, the system was designed for 33 modules (31 IMs and 2 BMs), and pre-layout simulations were performed to characterize the analog solver in terms of accuracy, input range, dynamic range, and noise sensitivity, as presented in [29]. In this work, to test a simplified building block of the proposed analog solver, only 15 modules (13 IMs and 2 BMs) were included in the fabricated chip. Multiple chips could be cascaded to improve the spatial accuracy for a given PDE. The continuous-time delay term $\tau$ of the analog solver is chosen to represent the total time delay in the signal flow path of an IM [29] and is set to $\tau = 12.5$ ns, which corresponds to the equivalent FDTD temporal update rate of $1/\tau = 80$ MHz. Parasitic-extracted post-layout simulations were performed to verify the functionality of individual circuit blocks (op-amp, APF, multiplier, and single IM), as well as the complete solver.



**FIGURE 3.** Space-time variation and the corresponding MSD and  $\gamma_i$  variation of acoustic density from post-layout simulations (after calibration) at (a) typical-typical, (b) slow-slow, and (c) fast-fast process corners.

#### 1) QUANTIFYING ACCURACY VIA MEAN SQUARED DIFFERENCE

The accuracy of the analog solver was quantified by comparing post-layout simulation results with the standard FDTD solution using two metrics: 1) the mean squared difference (MSD), and 2) the normalized MSD,  $\gamma$  (in dB). The reference FDTD solution was obtained from a MATLAB implementation of the two-step MacCormack's method using double-precision arithmetic. The discrete-time step-size of the FDTD solution was chosen to match the continuous-time delay  $\tau$  of the analog solver. Since the value of MSD varies with the spatial location, MSD<sub>*i*</sub> was computed for each location, and is defined as

$$\mathrm{MSD}_{i} = \frac{1}{N_{t}} \sum_{n=0}^{N_{t}-1} \left[ U_{F}(i, n\Delta T) - U_{A}(i, n\Delta T) \right]^{2} \tag{1}$$

where  $U_F(i, n\Delta T)$  and  $U_A(i, n\Delta T)$  are the solutions of the FDTD and analog solvers, respectively, at spatial position *i* and time  $t = n\Delta T$ . Also,  $N_t$  is the total number of time samples and  $\Delta T$  is the temporal step size of the FDTD simulation. Similarly, the normalized MSD  $\gamma_i$  is expressed as

$$\gamma_{i} = 10 \log_{10} \frac{\sum_{n=0}^{N_{t}-1} \left[ U_{F}(i, n\Delta T) - U_{A}(i, n\Delta T) \right]^{2}}{\sum_{n=0}^{N_{t}-1} U_{F}(i, n\Delta T)^{2}} \tag{2}$$

The average normalized MSD  $\gamma_{avg}$  over the complete spatial grid can now be expressed as  $\gamma_{avg} = \frac{1}{N_x} \sum_{i=0}^{N_x-1} \gamma_i$ , where  $N_x$  is the number of spatial grid positions.
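Equations (1) and (2) and the spatial average $\gamma_{avg}$ translate directly into a few lines of NumPy. In this sketch, which is not the authors' MATLAB code, both solutions are assumed to be already time-aligned and resampled onto the same grid of $N_x$ positions by $N_t$ samples spaced $\Delta T$ apart; as in the text, $\gamma_{avg}$ is the arithmetic mean of the per-position dB values.

```python
import numpy as np


def accuracy_metrics(u_fdtd, u_analog):
    """Return (MSD_i, gamma_i in dB, gamma_avg in dB) for arrays of shape
    (N_x, N_t) holding U_F(i, n*dT) and U_A(i, n*dT)."""
    u_fdtd = np.asarray(u_fdtd, dtype=float)
    err_sq = (u_fdtd - np.asarray(u_analog, dtype=float)) ** 2
    msd = err_sq.mean(axis=1)  # Eq. (1): average over the N_t time samples
    gamma = 10 * np.log10(err_sq.sum(axis=1) / (u_fdtd**2).sum(axis=1))  # Eq. (2)
    return msd, gamma, gamma.mean()  # gamma_avg: mean of the per-position dB values


if __name__ == "__main__":
    t = np.arange(200) * 12.5e-9  # samples spaced by tau = 12.5 ns
    ref = np.array([np.sin(2 * np.pi * 2e6 * (t - 12.5e-9 * i)) for i in range(8)])
    meas = 0.9 * ref  # synthetic 10% amplitude error at every position
    msd, gamma, gamma_avg = accuracy_metrics(ref, meas)
    print(f"gamma_avg = {gamma_avg:.2f} dB")
```

In this synthetic case a uniform 10% amplitude error puts 1% of the signal energy into the error, so $\gamma_{avg} \approx -20$ dB; for comparison, the post-layout result reported below is $-11.5$ dB.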

#### 2) TYPICAL PROCESS CORNER POST-LAYOUT SIMULATIONS

Simulations of the parasitic-extracted netlist (assuming "typical" process corner device parameters and no mismatch) were performed to quantify the accuracy of the analog solver. Fig. 3(a) shows the space-time acoustic density variation and the corresponding MSD and $\gamma_i$ variation obtained from the simulations when the left boundary is excited with a 2 MHz, 150 mV<sub>pp</sub> sinusoid (which prescribes the left boundary value of acoustic density). These simulations agree with the MATLAB model to reasonable accuracy: the average normalized MSD of $\gamma_{avg} = -11.5$ dB is adequate for this problem.

### B. SENSITIVITY TO CIRCUIT PARASITICS

Circuit parasitics (parasitic capacitance and resistance) play a significant role in determining the actual accuracy of the analog solver. Post-layout simulations reveal that key solver parameters, including 1) the tunable group delay and gain of the APF, and 2) the tunable gain of the op-amp, deviate from their pre-layout values by significant amounts (> 50%) [29]. The APF has the highest variability of all the circuit blocks, with 60% gain variation and 40% group delay variation between pre- and post-layout simulations. The post-layout range of tunable APF gain is -3 dB to 3 dB (Fig. 4(a)), which is significantly different from the pre-layout range of -1.5 dB to 1.4 dB [29]. Similarly, the post-layout range of APF group delay is 2.3 ns to 4.5 ns, which is significantly larger than the pre-layout range of 1.5 ns to 2.5 ns. The main source of these deviations is the parasitic capacitance of the capacitor DACs that are used to control gains and delays (Fig. 2(c)). Fortunately, this variability can be compensated by calibrating the programmable gains and delays, thus achieving acceptable accuracy across process corners as shown in Fig. 3.

### C. IMPACT OF APF PERFORMANCE ON ANALOG SOLVER STABILITY

APFs are used to compensate for the propagation delays of circuit blocks (e.g., op-amps and multipliers) and to set the continuous-time delay of the analog solver. Thus, they play a major role in determining the behavior of the solver.

#### 1) SENSITIVITY TO APF GAIN AND GROUP DELAY

Each APF uses a 4-bit binary-weighted capacitor array to tune the group delay with a nominal resolution of 0.13 ns, while another 4-bit capacitor array tunes the gain with a nominal resolution of 0.35 dB. Based on Cadence Spectre simulations, single-bit changes in APF gain and delay change the final solver accuracy (in terms of $\gamma_{avg}$) by 17.4% and 5%, respectively. Because such APF errors compound multiplicatively across the cascaded modules, they degrade the overall accuracy of the chip. Thus, to limit the change in $\gamma_{avg}$ to 0.5 dB (<5% variation per single-bit change), the required APF gain and delay resolutions are 0.25 dB and 0.12 ns, respectively.

#### 2) STABILITY OF THE ANALOG SOLVER

There are 12 APFs in each module, and ideally the gain of each APF should be unity (0 dB). However, APF gain deviates from unity due to layout parasitics, device mismatches, and supply voltage variations. To study the impact of APF gain variations on the analog solver, we performed simulations at different APF gain settings. Fig. 4(c) shows the output (acoustic density) at IM3 to IM6 when the APF gains are set to within 0.1 dB of unity (referred to as the medium APF gain configuration). The outputs are progressively delayed and increase in amplitude from IM3 to IM6, but decrease in amplitude after IM7 (not shown in the figure). Also, the waveforms are not sinusoidal due to the nonlinear nature of the PDE. Outputs from all the modules (1 to 15) in this configuration were plotted in 2-D space-time to obtain the graph in Fig. 3(a).

FIGURE 4. (a)-(b) Simulated (a) gain and (b) group delay variation of the APF for different capacitor DAC configurations over a bandwidth of 20 MHz. (c)-(d) Simulated acoustic density from IM3 to IM6 (c) at medium and (d) low APF gain configurations.

When the APF gains are set to a lower value, i.e., the gains of at least two APFs in each module are reduced by 0.35 dB (the minimum gain step), the output amplitudes decrease from left to right (IM3 to IM6), as shown in Fig. 4(d), because the low-gain APFs act as attenuators. Conversely, when the gains of at least two APFs in each module are increased by 0.35 dB, the output amplitudes keep increasing until the op-amps saturate, making the system unstable. Therefore, precise calibration of APF gain is critical for the stability and accuracy of the analog solver. Finer APF gain tuning can be achieved by improving the resolution of the gain-control capacitor DACs.
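A back-of-the-envelope model shows why a single 0.35 dB step matters. The toy sketch below ignores the PDE dynamics and op-amp saturation entirely; it simply assumes that each module applies the gain error of two mistuned APFs, as in the scenario above, and that these errors add in dB, so the amplitude changes geometrically along the 13 IMs. The starting amplitude is an illustrative value.

```python
import numpy as np


def amplitude_profile(gain_error_db, apfs_per_module=2, n_modules=13, a0=0.075):
    """Toy model of signal amplitude after each of n_modules modules.

    Assumes each module contributes apfs_per_module * gain_error_db of excess
    gain, that the errors simply add in dB, and ignores the PDE dynamics and
    op-amp saturation. a0 (0.075, i.e. half of 150 mVpp) is illustrative.
    """
    k = np.arange(1, n_modules + 1)
    return a0 * 10 ** (gain_error_db * apfs_per_module * k / 20)


if __name__ == "__main__":
    for err_db in (-0.35, 0.0, 0.35):
        ratio = amplitude_profile(err_db)[-1] / 0.075
        print(f"{err_db:+.2f} dB per APF: amplitude ratio after 13 IMs = {ratio:.2f}")
```

With one LSB of error on two APFs per module, the model gives about $\pm 9.1$ dB after 13 IMs (an amplitude ratio of roughly 2.85 or 0.35), a geometric trend consistent with the attenuation in Fig. 4(d) and the growth toward saturation described above.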

### D. SENSITIVITY TO PVT VARIATIONS

Accuracy of the analog solver is also dependent on variations in 1) the IC fabrication process, 2) power supply and bias voltages, and 3) die temperature.

#### 1) PROCESS CORNER ANALYSIS OF THE ANALOG SOLVER

The impact of process corner variations was analyzed by performing simulations at the typical-typical (tt), slow-slow (ss), and fast-fast (ff) process corners and then computing the corresponding MSD and $\gamma$ metrics. At the "tt" corner, $\gamma_{avg}$ for acoustic density was -11.5 dB, but it degraded to 9.5 dB and -3.5 dB at the "ss" and "ff" corners, respectively. Fortunately, higher accuracy can be achieved by tuning the APF gains and delays. Figs. 3(b) and (c) show the calibrated space-time acoustic density, MSD, and $\gamma_i$ at the "ss" and "ff" corners, respectively. APF calibration improves $\gamma_{avg}$ at these process corners to -11.9 dB and -11.1 dB, respectively; these values are similar to that obtained in the nominal ("tt") case.

**FIGURE 5.** (a)-(b) APF (a) group delay and (b) gain variation from the simulations with device mismatch models at different process corners. (c)-(d) Gain and group delay variation of APF with (c) supply voltage $(V_{dd})$, and (d) die temperature.

#### 2) STUDY OF PVT ON APF GAIN AND GROUP DELAY

Since the analog solver is highly sensitive to APF parameters, these parameters were analyzed as a function of PVT variations. Multiple simulations were conducted at each process corner ("tt", "ss", "ff", and "sf") to quantify APF gain and delay variations. As shown in Fig. 5(a), APF group delay varied from 4.5 ns at the slow corner to 1.6 ns at the fast corner, while APF gain (Fig. 5(b)) varied from -0.15 dB to 0.15 dB. As shown in Fig. 5(c), supply voltage variations from 1.74 V to 1.81 V resulted in significant APF group delay variations (from 3.4 ns to 2.3 ns), while the gain remained almost invariant. The dependence of APF group delay on supply voltage can be attributed to the supply-voltage-dependent propagation delay (PD) of the output op-amp and the voltage buffers (NMOS-PMOS source follower pair) used in the APF circuit (Fig. 2(d)). For example, when $V_{dd}$ is reduced from 1.8 V to 1.74 V, the simulated PD of the op-amp buffer increases by 100 ps, while that of the source follower buffer increases by 200 ps. Furthermore, the finite PD of the transmission gates used in the capacitor DACs also affects the overall group delay of the APF. Finally, both APF gain and group delay remained quasi-constant versus temperature, as shown in Fig. 5(d): the gain variation is only 0.04 dB and the delay variation is 120 ps, making the APF nearly insensitive to temperature fluctuations.
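The supply sensitivity in Fig. 5(c) can be put in perspective with a rough calculation. Assuming, for illustration only, that the APF group delay varies linearly between the two reported endpoints, the average sensitivity and the supply change equivalent to one 0.13 ns delay step of the capacitor array (Section III-C1) are approximately

$$
\frac{\Delta\tau_{\mathrm{APF}}}{\Delta V_{dd}} \approx \frac{2.3\,\text{ns} - 3.4\,\text{ns}}{1.81\,\text{V} - 1.74\,\text{V}} = \frac{-1.1\,\text{ns}}{70\,\text{mV}} \approx -15.7\,\text{ps/mV},
\qquad
\Delta V_{dd,\mathrm{LSB}} \approx \frac{0.13\,\text{ns}}{15.7\,\text{ps/mV}} \approx 8.3\,\text{mV}.
$$

Under this linear assumption, a supply error of well under 10 mV shifts each APF by roughly one delay code, which is consistent with the strong supply sensitivity of $\gamma_{avg}$ reported in Section III-F.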

# IEEE Access



**FIGURE 6.** (a)-(b) Average normalized MSD variation (a) with device mismatches, and (b) with supply voltage. (c)-(d) Acoustic density from post-layout simulations showing (c) space-time variation, and (d) MSD and normalized MSD variation along the spatial grid. (e)-(f) Simulated variation of $\gamma_{avg}$ with (e) $V_{bias1}$ and (f) $V_{bias4}$.



**FIGURE 7.** Schematic of the fully differential operational amplifier with (a) two amplifying stages and (b) their common-mode feedback circuits.

### E. SENSITIVITY TO DEVICE MISMATCHES

To study the impact of device mismatches on accuracy, Monte Carlo simulations with device mismatch models were conducted using Cadence Spectre for the complete analog solver chip and the simulated acoustic density from each run was analyzed to compute  $\gamma_{avg}$  as shown in Fig. 6(a). The simulations show that device mismatch has a relatively small effect on accuracy and thus can be ignored for this design.



FIGURE 8. (a) Die photograph of the fabricated nonlinear PDE solver chip. (b) Measured characteristics of the chip.

### F. SENSITIVITY TO SUPPLY AND BIAS VOLTAGES

To further analyze the impact of supply voltage variations on the analog solver accuracy,  $\gamma_{avg}$  was computed at different supply voltages. As shown in Fig. 6(b),  $\gamma_{avg}$  varies over a wide range (-1 dB to -15 dB) for relatively small changes in the supply voltage (from 1.74 V to 1.81 V). Note that layout resistances were not included in these simulations, so no intra-die supply voltage variations were modeled.

#### 1) EFFECT OF SUPPLY VOLTAGE ON ANALOG SOLVER

In practical situations, significant intra-die supply voltage variations can occur due to IR drops within the supply voltage network. These variations affect the APF group delay (Section III-D2), which, in turn, degrades the accuracy of the analog solver because the circuit PDs are no longer properly compensated [29]. To capture the impact of intra-die supply voltage variations, RC-extracted post-layout simulations were performed using Cadence Spectre. Figs. 6(c) and (d) show the corresponding acoustic density, MSD, and $\gamma_i$ variations. The normalized MSD is significantly degraded (varying over the range -8 dB to +2 dB, with $\gamma_{avg} = -3.5$ dB). Fortunately, accuracy can be partially recovered ($\gamma_{avg} = -5.5$ dB) via APF calibration; the calibrated results are not shown in the figure.

#### 2) EFFECT OF OP-AMPS AND APF BIASES

External bias voltages (for the op-amp and APF) also affect the accuracy of the analog solver. There are eight bias voltages shared by all modules. Of these, five (V<sub>bias1</sub>, V<sub>bias2</sub>, V<sub>bias3</sub>, V<sub>bias4</sub>, and V<sub>cm</sub>) are applied to the op-amp, as shown in Fig. 7. Simulations in which each voltage was perturbed in turn showed that two of the op-amp bias voltages ($V_{bias1}$ and $V_{bias4}$) have the greatest effect on solver accuracy. These voltages control the tail current of the differential pair M1-M2. Changes in their values cause variations in the op-amp closed-loop gain and PD (simulated with the op-amp connected in a follower configuration), resulting in overall gain and delay variations in the analog solver. As shown in Figs. 6(e) and (f), varying $V_{bias1}$ and $V_{bias4}$ introduces maximum $\gamma_{avg}$ variations of 6% and 42%, respectively. Therefore, stable bias voltages are needed to generate repeatable and accurate outputs.

## IV. IMPLEMENTATION AND TESTING OF THE CHIP

The analog computer for solving the acoustic shock tube PDE consists of thirteen IMs and two BMs (left and right), which are laid out in three rows. The chip contains a total of 120 summing op-amps (post-layout gain-bandwidth product $\approx 600$ MHz), 180 APFs, and 90 multipliers. Each IM occupies an area of 1.33 mm × 1.27 mm, including two op-amp buffers that enable off-chip readout of the output voltages. Digital configuration bits are accessed using a serial peripheral interface (SPI) bus, and power is routed on the top metal layer. The chip features a total of 314 pads, including power/ground pads and I/O pads for the fully differential analog outputs (acoustic density and velocity). A die photograph of the fabricated chip (die area of 7.38 mm × 4.646 mm) is shown in Fig. 8.

### A. MOTHERBOARD AND TEST SETUP

The chip was evaluated using a custom motherboard and a MATLAB-based software interface. The experimental setup consisted of the motherboard and additional supporting instrumentation, as shown in Fig. 9. The evaluation board contains 30 off-chip buffers and 70 I/O connections in total, including the 30 differential output pairs of the analog solver and additional I/O for debugging the chip. Three DC power connections supply power to the analog chip (1.8 V) and the off-chip buffers (5 V), and set the output common-mode voltage of the off-chip buffers (1.2 V). These voltages can either be supplied externally or generated by on-board voltage regulators from the main supply connection. Additionally, there are 12 DC power/signal connections that supply external bias voltages to the chip and select its mode of operation (normal operating mode, or debug mode in which a single IM is tested). Finally, three connections control the on-chip SPI block, which allows chip calibration and configuration.

#### 1) INTERFACING INPUT AND OUTPUT ANALOG WAVEFORMS

The left-boundary input waveform for the PDE was supplied from an external function generator. RF baluns were used to generate the required differential input signals, while bias tees provided the input common-mode voltage. Analog output waveforms from the chip were ac-coupled to off-chip buffers mounted on the evaluation board; these buffers amplify the analog signals to the ADC full-scale range prior to digitization on a Xilinx RFSoC platform. Although not part of the analog solver's intrinsic power consumption, we note that the external data converters and amplifiers/buffers consume an additional 10.5 W (7.5 W for 15 RF-ADCs on the RFSoC plus 3 W for 30 AD8351ARMZ amplifiers) [35]. Under nominal operating conditions, the FPGA consumes another 30 W [34].

#### 2) PROGRAMMING AND CALIBRATION INTERFACE TO THE ANALOG COMPUTING CHIP

Each IM and BM has 91 tuning bits (realized using capacitor arrays) that can be set using a three-wire SPI bus. An Arduino microcontroller unit (MCU) was used to reconfigure and calibrate the chip over SPI. Programmable parameters include the spatially varying mean flow coefficients of the PDE, APF gains, and APF group delays. Bias voltages for the op-amps were supplied by an eight-channel 16-bit DAC controlled by the same MCU via an I<sup>2</sup>C interface. The 16-bit precision of the DAC allowed fine-grained optimization of the bias points during setup and calibration. During this process, the value of $\gamma_{avg}$ (the average normalized MSD between the measured results and the MATLAB FDTD model) was minimized by varying both the tuning bits and the bias points.
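Programming the chip amounts to serializing each module's capacitor-array codes into the SPI bitstream. The register map is not given here, so the Python sketch below uses a hypothetical field list and assumes a simple MSB-first shift register, padded with leading zeros to whole bytes so that the pad bits are the first to be pushed out of the chain; the resulting bytes would then be forwarded to the MCU. It illustrates the packing step only, not the chip's actual protocol.

```python
def pack_fields(fields):
    """Serialize (code, width) pairs MSB-first into a list of bits.

    The field order and widths are hypothetical; the chip's register map
    defines the real layout.
    """
    bits = []
    for code, width in fields:
        if not 0 <= code < 2**width:
            raise ValueError(f"code {code} does not fit in {width} bits")
        bits.extend((code >> k) & 1 for k in reversed(range(width)))
    return bits


def bits_to_bytes(bits):
    """Left-pad with zeros to a multiple of 8 bits and group into bytes, so
    that the pad bits are the first to be shifted out of the chain."""
    padded = [0] * (-len(bits) % 8) + list(bits)
    return bytes(
        int("".join(map(str, padded[i:i + 8])), 2) for i in range(0, len(padded), 8)
    )


if __name__ == "__main__":
    # Hypothetical fields: a 4-bit APF gain code, a 4-bit APF delay code,
    # and a 7-bit mean-flow coefficient code.
    fields = [(5, 4), (12, 4), (100, 7)]
    payload = bits_to_bytes(pack_fields(fields))
    print(payload.hex())
```

Checking that every code fits its field before shifting guards against silently corrupting neighboring fields when a calibration script produces an out-of-range value.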

### B. DIGITAL AND SOFTWARE BACK-END

#### 1) FPGA DIGITAL DESIGN

Output analog waveforms at each module (corresponding to particular spatial grid positions) were digitized at 125 MS/s using a 16-channel 12-bit ADC implemented on the Xilinx ZCU-1275 RF-SoC development platform. The 12-bit resolution of the ADC allows digitization of small-amplitude (<5 mV) output waveforms, e.g., those generated close to the right boundary module.

As shown in Fig. 10(a), the RF Data Converter IP core supplied by Xilinx [35] was used to activate the 16 ADCs on the RF-SoC platform, which are arranged in 4 tiles with 4 ADCs per tile. The ADC clock was supplied by an on-board phase-locked loop (PLL) running at 200 MHz, and a decimation filter within the RF data converter IP was used to set the desired sampling rate (125 MS/s). An integrated logic analyzer (ILA) debug core provided by Xilinx [36] was used to capture and save the ADC outputs.

#### 2) CALIBRATION SETUP

MATLAB scripts were used to automate data capture, MCU programming, and MSD computations. A flow diagram of the data capture and chip programming process is depicted in Fig. 10(b). First, the required bias voltages were programmed from MATLAB using the external 16-bit DAC. Next, the calibration configuration (APF gains, delays, and PDE coefficients) was programmed over SPI, and the resulting chip outputs were captured using the RFSoC. A Python script (called from MATLAB) was used to automate ADC data capture. The captured data from each module were compared with the outputs of the MATLAB FDTD model to compute $\gamma_{avg}$. This process was repeated iteratively to minimize $\gamma_{avg}$.

### C. CALIBRATION AND OPTIMIZATION

The procedure described above was used to calibrate the variable gains, delays, and bias voltages of the analog solver chip prior to normal use. The overall accuracy of the analog solver ($\gamma_{avg}$) is chosen as the objective function for calibration, since the accuracy of the solution depends on the combined behavior of all the interconnected blocks. In this approach, the problem being solved serves as the test vector.

# **IEEE**Access



FIGURE 9. (a) Evaluation board (consisting of 70 I/O connections and 15 DC power/signal lines) with analog chip and (b) the test setup used to evaluate the functionality of the nonlinear PDE solver. An MCU was used to program the analog computer through its SPI bus. Computational outputs at each spatial point are digitized using a 16-channel 12-bit ADC (sampled at 125 MS/s) implemented within a Xilinx ZCU-1275 RF-SoC [34].



**FIGURE 10.** (a) Digital design of the RFSoC's 16-channel ADC. (b) Block diagram of the calibration and measurement setup used to evaluate the functionality of the nonlinear PDE solver chip.

Once calibrated, the chip's calibration state can be reused to solve the same PDE with different coefficients.

The calibration bits of the analog solver consist of 24 4-bit capacitor arrays that control APF gain and group delay in each IM and BM, 4 4-bit capacitor arrays that compensate for multiplier gain variations, and 35 capacitor arrays (15 7-bit and 20 4-bit) that set the mean flow coefficients required to solve the PDE. The modules also share 8 global bias voltages, which include the bias voltages of the op-amp, APF, and multiplier [29].
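Tallying the capacitor arrays listed above gives the size of the digital calibration word. The short Python calculation below uses only the numbers in the text. It also suggests one reason why a brute-force search over every bit is impractical and why calibration proceeds in stages.

```python
# (number of arrays, bits per array), as listed in the text
cap_arrays = {
    "APF gain/group delay (IMs and BMs)": (24, 4),
    "multiplier gain compensation": (4, 4),
    "mean-flow coefficients (7-bit)": (15, 7),
    "mean-flow coefficients (4-bit)": (20, 4),
}

total_bits = sum(n * b for n, b in cap_arrays.values())
n_arrays = sum(n for n, _ in cap_arrays.values())
print(n_arrays, "arrays,", total_bits, "calibration bits")  # 63 arrays, 297 calibration bits

# Size of the full digital search space (the 8 analog bias voltages come on top of this).
print(f"2**{total_bits} ≈ 10**{total_bits * 0.30103:.0f} settings")
```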

### 1) CALIBRATION PROCEDURE

Initially, all calibration bits and bias voltages are programmed to their nominal values, obtained from post-layout simulations with typical ("tt") device models. The measured value of $\gamma_{avg}$ is then used as the objective function of the calibration process, which aims to minimize it. Note that only the acoustic density outputs of the first 8 modules of the chip are considered in the computation of $\gamma_{avg}$, since the outputs beyond module 8 were too small (below the noise floor of the measurement setup) for an accurate comparison. As described earlier, APF gains and group delays have the largest impact on the objective function.
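The precise definition of $\gamma_{avg}$ is given earlier in the paper and is not reproduced here. To illustrate only how the comparison is restricted to the first 8 modules, the sketch below uses a hypothetical stand-in metric: a per-module normalized RMS error in dB, averaged over modules. It should not be read as the authors' formula.

```python
import math
from typing import Sequence

N_MODULES_USED = 8  # density outputs beyond module 8 are below the noise floor


def rms(x: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


def gamma_avg_standin(measured: Sequence[Sequence[float]],
                      reference: Sequence[Sequence[float]],
                      n_used: int = N_MODULES_USED) -> float:
    """HYPOTHETICAL stand-in for gamma_avg: the mean over the first n_used
    modules of 20*log10(RMS error / RMS reference), in dB. Lower is better."""
    per_module_db = []
    for meas, ref in list(zip(measured, reference))[:n_used]:
        err = [m - r for m, r in zip(meas, ref)]
        per_module_db.append(20 * math.log10(rms(err) / rms(ref)))
    return sum(per_module_db) / len(per_module_db)


# Usage: 15 modules. The first 8 carry a 10% amplitude error, and the rest hold
# junk values that the metric must ignore.
ref = [[math.sin(2 * math.pi * k / 16 + i) for k in range(16)] for i in range(15)]
meas = [[1.1 * v for v in row] if i < 8 else [99.0] * 16 for i, row in enumerate(ref)]
print(round(gamma_avg_standin(meas, ref), 6))  # -20.0
```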

*Step 1:* Optimum values of APF gain and group delay are found by exhaustive search, using a software script that automates data capture. All other variables are kept fixed at their initial values.
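Step 1 can be sketched as a nested sweep over the APF codes. The sketch assumes, hypothetically, a single shared 4-bit gain code and a single shared 4-bit delay code (256 combinations). The text does not state how the codes were grouped during the search. `measure` is a hypothetical hook that would program the codes, capture the outputs, and return the error metric. Here, a made-up quadratic bowl replaces the chip.

```python
import itertools
from typing import Callable, Optional, Tuple


def exhaustive_apf_search(measure: Callable[[int, int], float],
                          gain_bits: int = 4,
                          delay_bits: int = 4) -> Tuple[Optional[Tuple[int, int]], float]:
    """Try every (gain_code, delay_code) pair and return the pair with the
    lowest measured objective, together with that objective value."""
    best_codes: Optional[Tuple[int, int]] = None
    best_val = float("inf")
    for g, d in itertools.product(range(2 ** gain_bits), range(2 ** delay_bits)):
        val = measure(g, d)
        if val < best_val:
            best_codes, best_val = (g, d), val
    return best_codes, best_val


# Toy objective with its minimum at gain code 9 and delay code 5 (made up for illustration).
def toy(g: int, d: int) -> float:
    return (g - 9) ** 2 + 0.5 * (d - 5) ** 2 + 0.5


print(exhaustive_apf_search(toy))  # ((9, 5), 0.5)
```

With only two small codes, the sweep costs 256 measurements, which is why exhaustive search is practical for this step even though it would not be for the full set of calibration variables.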

*Step 2:* Bias voltages are optimized using the simultaneous perturbation stochastic approximation (SPSA) algorithm [20], [37], [38] while the APF gains and delays are held constant. The SPSA algorithm randomly perturbs all optimization parameters (i.e., the bias voltages) simultaneously around their current values and approximates the gradient of the objective function ($\gamma_{avg}$) from only two measurements per iteration, regardless of the number of parameters. This reduces the number of measurements required and makes the search tolerant of measurement noise.
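The SPSA update of Step 2 can be sketched as follows. The sketch uses the standard form described by Spall [37], [38]: a random ±1 perturbation of all parameters at once, two objective evaluations, and a decaying step size. The exponents 0.602 and 0.101 are the values Spall commonly recommends. The constants `a`, `c`, and `A`, the 0 V to 1.8 V clipping range, and the noisy quadratic objective are hypothetical choices for this toy problem. They are not the settings used on the chip.

```python
import random
from typing import Callable, List, Sequence


def spsa_minimize(f: Callable[[Sequence[float]], float],
                  theta0: Sequence[float],
                  iterations: int = 40,
                  a: float = 0.2, c: float = 0.02, A: float = 5.0,
                  alpha: float = 0.602, gamma: float = 0.101,
                  lo: float = 0.0, hi: float = 1.8,
                  seed: int = 0) -> List[float]:
    """Simultaneous perturbation stochastic approximation (after Spall).
    Every iteration evaluates f exactly twice, regardless of len(theta0)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        ak = a / (k + 1 + A) ** alpha  # decaying step size
        ck = c / (k + 1) ** gamma      # decaying perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # Bernoulli +/-1 per parameter
        y_plus = f([t + ck * d for t, d in zip(theta, delta)])
        y_minus = f([t - ck * d for t, d in zip(theta, delta)])
        ghat = [(y_plus - y_minus) / (2.0 * ck * d) for d in delta]  # gradient estimate
        # Gradient step, clipped to the (hypothetical) allowed bias range.
        theta = [min(hi, max(lo, t - ak * g)) for t, g in zip(theta, ghat)]
    return theta


# Toy stand-in for the measured objective: 8 bias voltages with a made-up optimum
# plus Gaussian "measurement noise".
target = [0.8, 1.2, 0.9, 1.1, 0.7, 1.0, 0.95, 1.05]
noise = random.Random(1)


def true_cost(v: Sequence[float]) -> float:
    return sum((x - t) ** 2 for x, t in zip(v, target))


def noisy_cost(v: Sequence[float]) -> float:
    return true_cost(v) + noise.gauss(0.0, 1e-3)


theta0 = [1.0] * 8
theta = spsa_minimize(noisy_cost, theta0)
print(f"cost before: {true_cost(theta0):.4f}, after: {true_cost(theta):.4f}")
```

The step-size constant matters. If `a` is too large, the update along the perturbation direction overshoots and the iteration diverges. On hardware, `a` would therefore typically be tuned with a few trial runs.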

*Step 3:* The gains of the summing op-amps within the IMs (which set mean flow coefficients of the PDE) are calibrated by applying a scaling factor. This step compensates for the parasitic capacitance of the gain-tuning capacitor arrays.

### 2) ANALYSIS OF THE CALIBRATION RESULTS

We note that *Step 1* alone does not find an operating point (APF gain and delay configuration) that achieves $\gamma_{avg} < 0$ dB. Bias voltage optimization (*Step 2*) is thus necessary to further reduce $\gamma_{avg}$. As shown in Fig. 11(a), Step 2 reduces $\gamma_{avg}$ from 1.2 dB to 0.5 dB within 40 iterations. The values of $V_{bias1}$ and $V_{bias4}$ at each iteration are shown in Figs. 11(b) and (c), respectively. As predicted from simulations (Section III-F2), these two bias voltages affect analog solver accuracy significantly more than the other bias voltages.



**FIGURE 11.** (a) Variation of the objective function ( $\gamma_{avg}$ ) with each iteration of the SPSA algorithm (regression line drawn in blue). Variation of (b) V<sub>bias1</sub> and (c) V<sub>bias4</sub> with each iteration.

TABLE 1. Measured specifications of the primary circuit blocks.

| Circuit block | Parameter             | Value             |
|---------------|-----------------------|-------------------|
| Op-Amp        | Gain                  | -0.25 dB          |
|               | Propagation delay     | 0.5 ns            |
| APF           | Gain range            | -3.6 dB to 3.5 dB |
|               | Gain resolution       | 0.4 dB            |
|               | Delay range           | 2.8 ns to 5 ns    |
|               | Delay resolution      | 0.2 ns            |
| Multiplier    | Single-input ac gain  | -0.3 dB           |

### **V. CHIP MEASUREMENTS**

The analog solver draws 520 mA from a 1.8 V supply at its optimum operating point, with each of the identical modules consuming approximately 34.6 mA. The total current consumption of the chip varies from 450 mA to 570 mA over a supply voltage range of 1.74 V to 1.85 V. Prior to calibration, the primary building blocks of the chip (op-amp, APF, and multiplier) were tested individually using a Tektronix MSO64 oscilloscope to verify their functionality. The data captured by the oscilloscope were imported into MATLAB and processed to extract parameters such as gain and propagation delay. The measured critical parameters of the primary circuit blocks at an operating frequency of 2 MHz are summarized in Table 1. Multiple measurements were obtained under different configurations to study the behaviour of the calibrated chip. The measurements show that the IMs work correctly. However, the full analog solver did not produce spatio-temporal solutions of sufficient accuracy (measured $\gamma_{avg} = 0.5$ dB). This is attributed to both circuit design issues and human errors during design and layout, as explained next.
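A quick calculation shows that the supply figures above are mutually consistent. It assumes that the 520 mA is split evenly across the 15 modules, which gives about 34.7 mA each, close to the quoted ≈34.6 mA.

```python
supply_v = 1.8           # supply voltage (V)
total_current_a = 0.520  # measured total current at the optimum operating point (A)
n_modules = 15

total_power_mw = supply_v * total_current_a * 1e3
per_module_ma = total_current_a / n_modules * 1e3  # assumes an even split across modules
per_module_mw = total_power_mw / n_modules

print(f"total: {total_power_mw:.0f} mW")                              # total: 936 mW
print(f"per module: {per_module_ma:.1f} mA, {per_module_mw:.1f} mW")  # per module: 34.7 mA, 62.4 mW
```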

### A. LAYOUT ERRORS

Insufficient vias were placed in the power supply net of one of the modules. The resulting IR drops caused several modules beyond the 6th spatial point to be sub-optimally biased, which significantly degraded accuracy over part of the computation.

FIGURE 12. Measured acoustic density variation from IM3 to IM6 at (a) medium and (b) low APF gain configurations.

We added the missing vias using focused ion beam (FIB) chip editing [39], [40]. A tungsten via array with a resistance of $\approx 1 \Omega$ was deposited using FIB. However, the IR drop across the deposited via ($\approx 35$ mV) is still too large. As a result, IM6 operates at a lower supply voltage than the other modules, degrading the overall accuracy. Nevertheless, the edited chip could be tested over most of the desired spatial range.
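An Ohm's-law estimate shows where the ≈35 mV figure comes from. It assumes that the ≈1 Ω FIB via carries roughly one module's supply current of about 34.6 mA, since the via sits in the supply net of a single module.

```python
via_resistance_ohm = 1.0    # approximate FIB tungsten via array resistance
module_current_a = 34.6e-3  # approximate per-module supply current (assumed to flow through the via)

ir_drop_mv = module_current_a * via_resistance_ohm * 1e3
print(f"IR drop ≈ {ir_drop_mv:.1f} mV")                    # IR drop ≈ 34.6 mV
print(f"fraction of 1.8 V supply: {ir_drop_mv / 1800:.1%}")  # fraction of 1.8 V supply: 1.9%
```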

### B. IMPACT OF APF GAIN

Transistor-level simulations (Section III-C) predict that APF gains have a significant effect on analog solver accuracy. To further study the impact of APF gain on the fabricated IC, we measured chip outputs at different APF gain settings.

### 1) MEDIUM APF GAIN CONFIGURATION

Fig. 12(a) shows the acoustic density waveforms at the outputs of IM3 to IM6 when the APFs are set to their medium gain configuration and the left boundary is excited with a 150 mV, 2 MHz sinusoidal signal (which defines the boundary value of acoustic density). Note that the output amplitudes increase from IM3 to IM5, while the IM6 output is attenuated. The measured output amplitude of IM6 is smaller than the simulated output at similar APF gain settings (Fig. 4(c)). The other IM outputs also differ slightly from the corresponding simulated results in signal amplitude and time delay.

These deviations are attributed to three main causes: 1) a lower supply voltage (and hence a larger propagation delay) at IM6 due to the FIB-deposited via; 2) APF gain calibration errors due to insufficient tuning resolution (resulting from circuit parasitics and process variations in the gain-tuning capacitor DACs); and 3) larger circuit propagation delays at IM6 to IM10 due to lower supply voltage (20 mV to 30 mV below the other modules, since these modules are laid out in the middle row, away from the pads). The results could be improved by 1) increasing the resolution of the capacitor DACs, and 2) reducing IR drops through an improved layout of the power supply network.

### 2) APF GAIN VS STABILITY

Fabricated chip outputs either decay or oscillate depending on the APF gain configuration, as expected from simulations. When APF gains are low, the measured output waveforms decay from left to right (going from IM3 to IM6) as shown in Fig. 12(b). Such signal decay is caused by the low-gain APFs, which act as attenuators. On the other hand, the measured chip outputs saturate when the APF gains are set to higher values.
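The decay-versus-saturation behavior can be illustrated with a deliberately simplified model in which each module applies the same gain to a 150 mV input, and a soft limiter stands in for saturation. This toy model ignores the coupled density/velocity dynamics and the feedback of the real solver. The stage gains and the 0.9 V limiter level are hypothetical.

```python
import math
from typing import List


def cascade_amplitudes(a_in: float, stage_gain_db: float, n_stages: int,
                       rail: float = 0.9) -> List[float]:
    """Toy model: amplitude after each of n_stages identical stages.
    Each stage applies a fixed small-signal gain, and a tanh soft limiter with a
    hypothetical saturation level `rail` (volts) stands in for output clipping."""
    g = 10 ** (stage_gain_db / 20)  # dB -> linear voltage gain
    amps: List[float] = []
    a = a_in
    for _ in range(n_stages):
        a = rail * math.tanh(g * a / rail)
        amps.append(a)
    return amps


for gain_db in (-1.0, 0.0, 1.0):
    print(f"{gain_db:+.1f} dB:", [round(v, 3) for v in cascade_amplitudes(0.15, gain_db, 8)])
# -1 dB: amplitudes shrink stage by stage (attenuating APFs)
#  0 dB: slight decay, caused only by limiter compression
# +1 dB: amplitudes grow until the limiter holds them at a saturated level
```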



**FIGURE 13.** Measured results from the fabricated chip, (a)-(b) acoustic density when the left boundary is excited with a (a) 1 MHz and (b) 2 MHz sinusoidal signal, respectively. (c)-(d) Acoustic velocity when the left boundary is excited with a (c) 1 MHz and (d) 2 MHz sinusoidal signal, respectively.

### C. MEASUREMENTS WITH BOUNDARY CONDITION WAVEFORMS

To study the response of the chip to different input signals, the left boundary, which defines the source boundary value of acoustic density, was excited using sinusoidal signals with varying amplitudes and frequencies. This section analyses the corresponding measured results.

### 1) ACOUSTIC DENSITY AND VELOCITY

The proposed analog solver computes both the density and the velocity of acoustic flow along the variable-area duct, as described by the acoustic shock tube equations in [29] and [32]. Figs. 13(a) and (c) show the corresponding space-time variation of acoustic density and velocity, respectively, when the left boundary is excited with a 1 MHz sinusoidal input signal of amplitude 150 mV. Similarly, Figs. 13(b) and (d) show the acoustic density and velocity variation, respectively, when the input frequency is increased to 2 MHz. The measured results show that the analog solver outputs follow the input signal, as required by the mathematical model of the acoustic shock equations [29], [32].

It is important to note that the signal amplitude at IM6 for acoustic density (Fig. 13(b)) is more attenuated than the corresponding acoustic velocity output (Fig. 13(d)). This can be traced back to the reduced supply voltage in the density computation section of IM6 caused by the FIB-deposited via.

### 2) TUNABILITY OF MEAN FLOW COEFFICIENTS

The time-invariant coefficients ($u_s$, $\rho_s$, and $c_s$) of the acoustic shock equations in [29] define the spatial profile of the acoustic density and acoustic velocity. These coefficients can be programmed by tuning the on-chip capacitor arrays. To study this process, we programmed the analog solver for different mean flow conditions (i.e., different $u_s$ variations along the shock tube), as shown in Fig. 14(a). The mean flow velocity $u_s$ is highest at the middle of the shock tube, where the cross-sectional area is smallest. The maximum values of $u_s$, as indicated in the figure, are 0.7, 0.4, and 0.2. The other coefficients ($\rho_s$ and $c_s$) change with $u_s$, but this is not shown in Fig. 14(a). The measured acoustic density at i = 5 (i.e., IM5) is shown in Fig. 14(b) for all three scenarios. Note that the acoustic density increases with $u_s$. A top view of the space-time plot of the acoustic density for a maximum $u_s = 0.7$ is shown in Fig. 13(b), while the corresponding plots for $u_s = 0.4$ and $u_s = 0.2$ are shown in Figs. 14(c) and (d), respectively. As expected from simulations [29], the spatial variation of acoustic density (and velocity) becomes relatively uniform at lower mean flow velocities. These plots also confirm that the measured analog solver outputs change as expected with the mean flow coefficients.

**FIGURE 14.** (a) Variation of mean flow velocity ($u_s$) along the shock tube. (b) Measured acoustic density (from the chip) at IM5 with time. (c)-(d) Space-time variation of acoustic density with maximum values of (c) $u_s = 0.4$ and (d) $u_s = 0.2$.

### 3) LINEAR VS NONLINEAR ANALOG SOLVERS

FIGURE 15. Measured IM5 (a) transient output and the corresponding PSD when the solver is (b) linear and (c) nonlinear. (d) IM5 output amplitude vs. input signal amplitude for the linear and nonlinear cases.

TABLE 2. Performance comparison with previously reported work.

|                                       | Proposed chip | [2]         | [26]        | [20]        | [24]                    |
|---------------------------------------|---------------|-------------|-------------|-------------|-------------------------|
| Technology                            | TSMC 180 nm   | TSMC 65 nm  | TSMC 250 nm | TSMC 180 nm | TSMC 180 nm             |
| Supply voltage                        | 1.8 V         | 1.2 V       | 2.5 V       | 1.8 V       | 2.2 V                   |
| Power consumption                     | 936 mW        | 1.2 mW      | 300 mW      | 200 mW      | 560 mW (including ADCs) |
| Computation bandwidth                 | 2 MHz         | 20 kHz      | 25 kHz      | 30 MHz      | 2.5 MHz (without ADC)   |
| $F_{\text{compute}}$/power per module | 0.032 MHz/mW  | 0.12 MHz/mW | 0.01 MHz/mW | 2.9 MHz/mW  | 0.042 MHz/mW            |

The mathematical model used in the proposed analog solver is nonlinear. However, the nonlinear terms can be disabled (by turning off the multipliers) to make the acoustic equations linear. We therefore refer to the solver with the multipliers switched off as linear, and to the normal operating mode (with the multipliers switched on) as nonlinear. Fig. 15(a) shows the transient waveform at the output of IM5 for both the linear and nonlinear cases, and Figs. 15(b)-(c) compare the corresponding output power spectral densities (PSDs). Note that harmonics ($f_1$ and $f_2$) appear only in the nonlinear output. However, the first harmonic ($f_1$) is about 15 dB below the fundamental ($f_0$), which suggests that the system is only mildly nonlinear. This is confirmed by the near-linear dependence of the IM5 output amplitude on the input signal amplitude, as shown in Fig. 15(d). In the nonlinear case, the output is linear for small input amplitudes (< 75 mV) because the magnitudes of the nonlinear terms ($u^2$, $\rho^2$, and $u\rho$ in the acoustic shock equations in [29]) are relatively small. Some nonlinearity is visible at larger input amplitudes due to the rapid growth of the nonlinear terms. Overall, the mildly nonlinear nature of the measured results can be attributed to the low APF gains, which attenuate the multiplier outputs (i.e., the nonlinear terms).
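A harmonic level such as the roughly 15 dB figure above is typically read off an FFT of the sampled output. The sketch below does this for a synthetic waveform sampled at the 125 MS/s rate used in the measurements. It uses a 2 MHz fundamental and a harmonic placed (arbitrarily) 15 dB lower at 4 MHz. Treating $f_1$ as the component at $2f_0$ is an assumption, and the signal is not chip data. The record length is chosen for coherent sampling, so no window is needed.

```python
import numpy as np

fs = 125e6  # ADC sample rate used in the measurements
f0 = 2e6    # fundamental (left-boundary excitation)
n = 1000    # exactly 16 cycles of f0 -> coherent sampling, no window needed
t = np.arange(n) / fs

# Synthetic waveform: a fundamental plus a component at 2*f0 that is 15 dB lower
# (made-up levels for illustration, not measured chip data).
h_rel = 10 ** (-15 / 20)
x = np.sin(2 * np.pi * f0 * t) + h_rel * np.sin(2 * np.pi * 2 * f0 * t)

spec = np.abs(np.fft.rfft(x))
k0 = int(round(f0 * n / fs))  # FFT bin of the fundamental (16)
k1 = 2 * k0                   # FFT bin of the component at 2*f0 (32)
dbc = 20 * np.log10(spec[k1] / spec[k0])
print(f"harmonic at {k1 * fs / n / 1e6:.0f} MHz: {dbc:.1f} dBc")  # harmonic at 4 MHz: -15.0 dBc
```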

### 4) TWO-TONE INPUTS

The chip's nonlinearity can be further tested by 1) applying two different inputs separately, and 2) applying an input that is the sum of these two inputs. Superposition does not hold for a nonlinear system, so the sum of the two results from 1) can differ from the result of 2). Fig. 16(a) shows the sum of the measured transient results (acoustic density) when two sinusoidal inputs (0.78 MHz and 1.56 MHz) are applied separately, while Fig. 16(b) shows the result when their sum is applied. The two results differ significantly in both phase and magnitude. The measured mean RMS difference of 48.7 mV<sub>rms</sub> is 69% of the RMS value of each input.
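The superposition test can be expressed directly. For any linear system, applying the two tones separately and summing the outputs gives the same result as applying their sum. The sketch below checks this for a toy linear system and a made-up quadratic nonlinearity using the 0.78 MHz and 1.56 MHz, 200 mV<sub>pp</sub> tones from Fig. 16. It then verifies the quoted 69% ratio, since a 200 mV<sub>pp</sub> sine has an RMS value of about 70.7 mV.

```python
import numpy as np

fs = 125e6
t = np.arange(2000) / fs
amp = 0.1  # 200 mVpp -> 100 mV peak
x1 = amp * np.sin(2 * np.pi * 0.78e6 * t)
x2 = amp * np.sin(2 * np.pi * 1.56e6 * t)


def rms(v: np.ndarray) -> float:
    return float(np.sqrt(np.mean(v ** 2)))


def superposition_error(system, a: np.ndarray, b: np.ndarray) -> float:
    """RMS of [system(a) + system(b)] - system(a + b); zero for any linear system."""
    return rms(system(a) + system(b) - system(a + b))


def linear(v: np.ndarray) -> np.ndarray:
    return 0.8 * v  # toy linear system


def quadratic(v: np.ndarray) -> np.ndarray:
    return 0.8 * v + 2.0 * v ** 2  # toy weakly nonlinear system (made up)


print(superposition_error(linear, x1, x2))     # ~0 (floating-point round-off only)
print(superposition_error(quadratic, x1, x2))  # clearly > 0: superposition fails

# Check of the quoted ratio: 48.7 mV_rms vs. the RMS value of a 100 mV-peak sine.
print(48.7e-3 / (amp / np.sqrt(2)))  # about 0.689, i.e. roughly 69%
```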

### D. PERFORMANCE ANALYSIS

Our analog computing chip operates at a supply voltage of 1.8 V and consumes a total power of 936 mW, averaging 62.4 mW per module across its 15 modules. The chip achieves an analog bandwidth, denoted as $F_{\text{compute}}$, of 2 MHz and has an equivalent temporal update rate of 80 MHz. This high $F_{\text{compute}}$ results from the propagation delay compensation technique used in the chip's architecture (the APDA method), which offers advantages over integrator-based continuous-time computing methods. For instance, the analog bandwidth of our chip is about 100 and 80 times higher than those of the analog computers reported in [2] and [26], respectively.

**FIGURE 16.** (a) Summation of measured results (acoustic density) when 1) a 0.78 MHz sinusoidal input (200 mV<sub>pp</sub>) or 2) a 1.56 MHz sinusoidal input (200 mV<sub>pp</sub>) is separately applied to the analog solver. (b) Measured results when the solver is supplied with the summed input (0.78 MHz and 1.56 MHz).

It is important to note that the works in [2] and [26] are more general-purpose and can be configured to solve a broader range of problems, including both linear and nonlinear systems of equations. In contrast, the analog bandwidth of our chip is 15 times lower than that of a linear PDE-solving analog computer that also employs the APDA architecture [20]. This disparity is attributed to the exponential increase in complexity when solving nonlinear PDEs.

To benchmark our design against other reported analog computers, we use the metric $F_{\text{compute}}$/power (in MHz/mW), which accounts for both the analog bandwidth and the power consumption per module. Table 2 summarizes the performance of our analog solver and compares it with previous work. By this metric, our chip performs about three times better than [26], but approximately 3.75 times worse than [2], 1.3 times worse than [24], and about 90 times worse than [20]. The reduced performance is attributed to our chip's ability to solve a nonlinear PDE as a coupled system of equations, which is inherently more complex than solving linear PDEs.
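The ratios quoted above can be reproduced from the Table 2 entries with a few lines of Python. The values are transcribed from the table. The script only recomputes the proposed chip's metric and the ratios between designs.

```python
# F_compute/power per module (MHz/mW) and analog bandwidth (MHz), transcribed from Table 2.
fom = {"proposed": 0.032, "[2]": 0.12, "[26]": 0.01, "[20]": 2.9, "[24]": 0.042}
bw_mhz = {"proposed": 2.0, "[2]": 0.020, "[26]": 0.025, "[20]": 30.0, "[24]": 2.5}

# Recompute the proposed chip's metric: 2 MHz over (936 mW / 15 modules).
print(round(2.0 / (936 / 15), 3))  # 0.032

for ref in ("[2]", "[26]", "[20]", "[24]"):
    ratio = fom[ref] / fom["proposed"]
    print(f"{ref}: metric ratio (ref / proposed) = {ratio:.2f}, "
          f"bandwidth ratio (proposed / ref) = {bw_mhz['proposed'] / bw_mhz[ref]:.3g}")
# The metric ratios come out near 3.75, 0.31, 90.6, and 1.31, and the bandwidth
# ratios for [2] and [26] near 100 and 80, in line with the comparisons in the text.
```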

Nevertheless, because the temporal update rate of our chip exceeds its analog bandwidth by a factor of 40 (80 MHz vs. 2 MHz), it retains the speedup advantages over digital processors reported in [29].

### **VI. CONCLUSION**

Our overall objectives are to i) explore continuous-time analog computing methods [19], [20] for solving nonlinear PDEs, and ii) identify the challenges and opportunities in implementing complex analog PDE solvers. As an example, this paper has presented a CMOS implementation of a 15-point analog computer for solving acoustic shock wave equations. The chip implements SDTC algorithms for solving nonlinear PDEs using analog circuits. This paper builds upon our recently published work on the theory and circuit implementation of such analog solvers [29] by presenting and analyzing measurement results from a fabricated analog solver IC.

The proposed analog solver chip was fabricated in TSMC 180 nm CMOS with a die size of 7.38 mm×4.64 mm. Operating at a supply voltage of 1.8 V, it consumes 936 mW of power, averaging 62.4 mW per module across its 15 modules. The chip achieves an analog bandwidth of 2 MHz and an equivalent temporal update rate of 80 MHz, resulting in a performance metric ($F_{\text{compute}}$/power) of 0.032 MHz/mW. This metric is lower than those of recent linear PDE-solving analog computers (2.9 MHz/mW [20] and 0.042 MHz/mW [24]). However, our chip solves a nonlinear PDE as a coupled system of equations, which is inherently more complex and leads to an exponential increase in computational complexity. The chip was tested using a custom evaluation board with supporting instrumentation. An FPGA was used to capture the computed analog results from the analog solver, while a MATLAB-based software interface was used for post-processing. The chip was tested over most of the desired spatial range, and the factors affecting computation accuracy were extensively analyzed using both measurements and simulations.

A key feature of the proposed analog computing method is that it absorbs the propagation delays (PDs) of the required circuit blocks (e.g., op-amps and multipliers) within the continuous-time delay required by the SDTC update equations. This in turn results in a high temporal update rate and analog bandwidth compared to general-purpose digital processors, but at the cost of reduced accuracy and susceptibility to intra-die process and voltage variations. With exhaustive calibration and by tuning APF gains and delays, it is possible to restore the required accuracy of the analog solver. The identified challenges of the nonlinear analog solver and the proposed solutions described in this article will inform the design of software-defined, massively parallel analog computers for solving more complex nonlinear PDEs.

### REFERENCES

- [1] J. Zhu, B. Chen, Z. Yang, L. Meng, and T. T. Ye, "Analog circuit implementation of neural networks for in-sensor computing," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Jul. 2021, pp. 150–156.
- [2] N. Guo, Y. Huang, T. Mai, S. Patil, C. Cao, M. Seok, S. Sethumadhavan, and Y. Tsividis, "Energy-efficient hybrid analog/digital approximate computation in continuous time," *IEEE J. Solid-State Circuits*, vol. 51, no. 7, pp. 1514–1524, Jul. 2016.
- [3] R. F. Uy and V. P. Bui, "Solving ordinary and partial differential equations using an analog computing system based on ultrasonic metasurfaces," *Sci. Rep.*, vol. 13, no. 1, p. 13471, Aug. 2023, doi: 10.1038/s41598-023-38718-1.
- [4] M.-L. Wei, M. Yayla, S.-Y. Ho, J.-J. Chen, H. Amrouch, and C.-L. Yang, "Impact of non-volatile memory cells on spiking neural network annealing machine with in-situ synapse processing," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 70, no. 11, pp. 4380–4393, Nov. 2023.
- [5] M. Cotteret, O. Richter, M. Mastella, H. Greatorex, E. Janotte, W. S. Girão, M. Ziegler, and E. Chicca, "Robust spiking attractor networks with a hard winner-take-all neuron circuit," in *Proc. IEEE Int. Symp. Circuits Syst.* (*ISCAS*), May 2023, pp. 1–5.
- [6] E. Garzón, M. Lanuzza, A. Teman, and L. Yavits, "AM4: MRAM crossbar based CAM/TCAM/ACAM/AP for in-memory computing," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 13, no. 1, pp. 408–421, Mar. 2023.
- [7] L. Fick, S. Skrzyniarz, M. Parikh, M. B. Henry, and D. Fick, "Analog matrix processor for edge AI real-time video analytics," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, vol. 65, Feb. 2022, pp. 260–262.
- [8] I. Yeo, W. He, Y.-C. Luo, S. Yu, and J.-S. Seo, "A dynamic power-only compute-in-memory macro with power-of-two nonlinear SAR ADC for nonvolatile ferroelectric capacitive crossbar array," *IEEE Solid-State Circuits Lett.*, vol. 7, pp. 70–73, 2024.
- [9] B. Zhang, J. Saikia, J. Meng, D. Wang, S. Kwon, S. Myung, H. Kim, S. J. Kim, J.-S. Seo, and M. Seok, "A 177 TOPS/W, capacitor-based in-memory computing SRAM macro with stepwise-charging/discharging DACs and sparsity-optimized bitcells for 4-bit deep convolutional neural networks," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2022, pp. 1–2.
- [10] X. Yang, K. Zhu, X. Tang, M. Wang, M. Zhan, N. Lu, J. P. Kulkarni, D. Z. Pan, Y. Liu, and N. Sun, "An in-memory-computing charge-domain ternary CNN classifier," *IEEE J. Solid-State Circuits*, vol. 58, no. 5, pp. 1450–1461, May 2023.
- [11] M. J. Rasch, C. Mackin, M. Le Gallo, A. Chen, A. Fasoli, F. Odermatt, N. Li, S. R. Nandakumar, P. Narayanan, H. Tsai, G. W. Burr, A. Sebastian, and V. Narayanan, "Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators," *Nature Commun.*, vol. 14, no. 1, p. 5282, Aug. 2023, doi: 10.1038/s41467-023-40770-4.
- [12] A. Singh and B.-G. Lee, "Framework for in-memory computing based on memristor and memcapacitor for on-chip training," *IEEE Access*, vol. 11, pp. 112590–112599, 2023.
- [13] J. Jang, S. Gi, I. Yeo, S. Choi, S. Jang, S. Ham, B. Lee, and G. Wang, "A learning-rate modulable and reliable TiO<sub>x</sub> memristor array for robust, fast, and accurate neuromorphic computing," *Adv. Sci.*, vol. 9, no. 22, Aug. 2022, Art. no. 2201117, doi: 10.1002/advs.202201117.
- [14] S. Diware, K. Chilakala, R. V. Joshi, S. Hamdioui, and R. Bishnoi, "Reliable and energy-efficient diabetic retinopathy screening using memristor-based neural networks," *IEEE Access*, vol. 12, pp. 47469–47482, 2024.
- [15] Aspinity. (2023). Analog Memory for Efficient AI Compute. Accessed: Mar. 3, 2024. [Online]. Available: https://www.aspinity.com/blog-itemanalog-memory-for-efficient-ai-compute
- [16] IEEE Spectrum. (2024). IBM's AI Chip May Find Use in Generative AI. Accessed: Mar. 3, 2024. [Online]. Available: https://spectrum.ieee.org/analog-ai-ibm
- [17] S. Ambrogio, P. Narayanan, A. Okazaki, A. Fasoli, C. Mackin, K. Hosokawa, A. Nomura, T. Yasuda, A. Chen, A. Friz, and M. Ishii, "An analog-AI chip for energy-efficient speech recognition and transcription," *Nature*, vol. 620, no. 7975, pp. 768–775, Aug. 2023, doi: 10.1038/s41586-023-06337-5.
- [18] Y. Tsividis, "Not your father's analog computer," *IEEE Spectr.*, vol. 55, no. 2, pp. 38–43, Feb. 2018.

- [19] N. Udayanga, S. I. Hariharan, S. Mandal, L. Belostotski, L. T. Bruton, and A. Madanayake, "Continuous-time algorithms for solving Maxwell's equations using analog circuits," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 10, pp. 3941–3954, Oct. 2019.
- [20] N. Udayanga, A. Madanayake, S. I. Hariharan, J. Liang, S. Mandal, L. Belostotski, and L. T. Bruton, "A radio frequency analog computer for computational electromagnetics," *IEEE J. Solid-State Circuits*, vol. 56, no. 2, pp. 440–454, Feb. 2021.
- [21] N. Guo, Y. Huang, T. Mai, S. Patil, C. Cao, M. Seok, S. Sethumadhavan, and Y. Tsividis, "Continuous-time hybrid computation with programmable nonlinearities," in *Proc. 41st Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2015, pp. 279–282.
- [22] Y. Huang, N. Guo, M. Seok, Y. Tsividis, K. Mandli, and S. Sethumadhavan, "Hybrid analog-digital solution of nonlinear partial differential equations," in *Proc. 50th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO)*, Oct. 2017, pp. 665–678.
- [23] Y. Huang, N. Guo, S. Sethumadhavan, M. Seok, and Y. Tsividis, "A case study in analog co-processing for solving stochastic differential equations," in *Proc. IEEE 23rd Int. Conf. Digit. Signal Process. (DSP)*, Nov. 2018, pp. 1–5.
- [24] J. Liang, N. Udayanga, A. Madanayake, S. I. Hariharan, and S. Mandal, "An offset-cancelling discrete-time analog computer for solving 1-D wave equations," *IEEE J. Solid-State Circuits*, vol. 56, no. 9, pp. 2881–2894, Sep. 2021.
- [25] J. Liang, X. Tang, S. I. Hariharan, A. Madanayake, and S. Mandal, "A current-mode discrete-time analog computer for solving Maxwell's equations in 2D," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2023, pp. 1–5.
- [26] G. E. R. Cowan, R. C. Melville, and Y. P. Tsividis, "A VLSI analog computer/digital computer accelerator," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 42–53, Jan. 2006.
- [27] J.-O. Seo, M. Seok, and S. Cho, "ARCHON: A 332.7TOPS/W 5b variation-tolerant analog CNN processor featuring analog neuronal computation unit and analog memory," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 65, Feb. 2022, pp. 258–260.
- [28] Z. Chen, X. Chen, and J. Gu, "A 65 nm 3T dynamic analog RAM-based computing-in-memory macro and CNN accelerator with retention enhancement, adaptive analog sparsity and 44TOPS/W system energy efficiency," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 64, Feb. 2021, pp. 240–242.
- [29] H. Malavipathirana, S. I. Hariharan, N. Udayanga, S. Mandal, and A. Madanayake, "A fast and fully parallel analog CMOS solver for nonlinear PDEs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 68, no. 8, pp. 3363–3376, Aug. 2021.
- [30] C. Wijenayake, Y. Xu, A. Madanayake, L. Belostotski, and L. T. Bruton, "RF analog beamforming fan filters using CMOS all-pass time delay approximations," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 5, pp. 1061–1073, May 2012.
- [31] P. Ahmadi, B. Maundy, A. S. Elwakil, L. Belostotski, and A. Madanayake, "A new second-order all-pass filter in 130-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 3, pp. 249–253, Mar. 2016.
- [32] S. I. Hariharan and H. C. Lester, "Acoustic shocks in a variable area duct containing near sonic flows," *J. Comput. Phys.*, vol. 58, no. 1, pp. 134–145, Mar. 1985.
- [33] R. MacCormack, "Numerical solution of the interaction of a shock wave with a laminar boundary layer," in *Proc. 2nd Int. Conf. Numer. Methods Fluid Dyn.* (Lecture Notes in Physics), vol. 8. Berlin, Germany: Springer, Apr. 1971, pp. 151–163.
- [34] Xilinx. Zynq UltraScale+ RFSoC. Accessed: May 2022. [Online]. Available: https://www.xilinx.com/products/silicon-devices/soc/rfsoc.html
- [35] Xilinx. Zynq UltraScale+ RFSoC RF Data Converter V2.5 Gen. Accessed: May 2022. [Online]. Available: https://docs.xilinx.com/v/u/2.5-English/pg269-RF-data-converter
- [36] Xilinx. Vivado Design Suite User Guide: Programming and Debugging. Accessed: May 2022. [Online]. Available: https://docs.xilinx.com/r/en-US/ug908-vivado-programming-debugging/ILA
- [37] J. C. Spall, "Implementation of the simultaneous perturbation algorithm for stochastic optimization," *IEEE Trans. Aerosp. Electron. Syst.*, vol. 34, no. 3, pp. 817–823, Jul. 1998.
- [38] J. C. Spall, "An overview of the simultaneous perturbation method for efficient optimization," *Johns Hopkins APL Tech. Dig.*, vol. 19, no. 4, pp. 482–492, 1998.

- [39] P. D. Prewett, "Focused ion beams in microfabrication (invited)," *Rev. Sci. Instrum.*, vol. 63, no. 4, pp. 2364–2366, Apr. 1992.
- [40] A. J. DeMarco and J. Melngailis, "Contact resistance of focused ion beam deposited platinum and tungsten films to silicon," *J. Vac. Sci. Technol. B, Microelectron. Nanometer Struct. Process., Meas., Phenomena*, vol. 19, no. 6, pp. 2543–2546, Nov. 2001.



**HASANTHA MALAVIPATHIRANA** (Member, IEEE) received the B.Sc. degree in electronic and telecommunication engineering from the University of Moratuwa, Sri Lanka, in 2016, and the Ph.D. degree in electrical engineering from Florida International University (FIU), Miami, FL, USA, in 2022. She specializes in analog and superconducting IC design. Her research interests include analog computing, multidimensional signal processing, and superconducting circuit design.



**SOUMYAJIT MANDAL** (Senior Member, IEEE) received the B.Tech. degree from the Indian Institute of Technology (IIT), Kharagpur, India, in 2002, and the S.M. and Ph.D. degrees in electrical engineering from Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, in 2004 and 2009, respectively. He was a Research Scientist with Schlumberger-Doll Research, Cambridge, from 2010 to 2014; an Assistant Professor with the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA, from 2014 to 2019; and an Associate Professor with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA, from 2019 to 2021. He is currently a Research Staff Member with the Instrumentation Division, Brookhaven National Laboratory, Upton, NY, USA. He has more than 175 publications in peer-reviewed journals and conferences and has been awarded 26 patents. His research interests include analog and biological computation, magnetic resonance sensors, low-power analog and RF circuits, and precision instrumentation for various biomedical and sensor interface applications. He was a recipient of the President of India Gold Medal, in 2002, the MIT Microsystems Technology Laboratories (MTL) Doctoral Dissertation Award, in 2009, the T. Keith Glennan Fellowship, in 2016, and the IIT Kharagpur Young Alumni Achiever Award, in 2018.



**NILAN UDAYANGA** (Member, IEEE) received the B.Sc. degree in electronics and telecommunication engineering from the University of Moratuwa, Moratuwa, Sri Lanka, in 2011, the M.Sc. degree from The University of Akron, Akron, OH, USA, in 2015, and the Ph.D. degree from Florida International University, Miami, FL, USA, in 2019. He was a Postdoctoral Scholar with the Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA, from 2020 to 2021. His research interests include analog and radio frequency integrated circuit designs for biomedical applications, analog computing, multidimensional signal processing, antenna arrays, and field-programmable gate array (FPGA)-based system designs.



**YINGYING WANG** (Member, IEEE) received the B.S. degree from Beihang University, China, the M.S. degree in electrical engineering from Caltech, in 2008, and the Ph.D. degree in electrical engineering from Case Western Reserve University, Cleveland, OH, USA, in 2020. She was with Forza Silicon, Los Angeles, CA, USA, as a Mixed-Signal Design Engineer, from 2008 to 2013. Her research interests include mixed-signal and RF integrated circuit design.



**ARJUNA MADANAYAKE** (Member, IEEE) received the B.Sc. degree (Hons.) in electronic and telecommunication engineering from the University of Moratuwa, Sri Lanka, in 2002, and the M.Sc. and Ph.D. degrees in electrical engineering from the University of Calgary, Canada, in 2004 and 2008, respectively. He is a tenured Associate Professor of electrical and computer engineering with the Department of Electrical and Computer Engineering, Florida International University (FIU), Miami. Before joining FIU as an Associate Professor, in 2018, he was a tenured Associate Professor with the University of Akron, Ohio. His research interests include RF, analog, digital signal processing, circuits and electronics, FPGA systems, RF AI and machine learning, digital signal processors, multidimensional systems and signal processing, wireless and mm-wave systems, radar and electronic warfare systems, and communication systems. His research has recently been supported by ten awards from NSF, three awards from DARPA, three awards from ONR, and other support from NASA, NIH, Lockheed Martin, Mitsubishi Electric Research Labs, VirtualEM, and the venture capital community. He is an Active Member of the IEEE Circuits and Systems Education and Outreach (CASEO) Technical Committee and the IEEE Technical Committee on DSP.




**S. I. HARIHARAN** (Life Senior Member, IEEE) received the B.Sc. degree in mathematics from the University of Sri Lanka, Peradeniya, Sri Lanka, in 1975, the M.Sc. degree in computational methods and fluid mechanics from the University of Salford, Salford, U.K., in 1978, and the M.S. and Ph.D. degrees in mathematics from Carnegie Mellon University, Pittsburgh, PA, USA, in 1979 and 1980, respectively. From 1980 to 1983, he was a Staff Scientist with the former Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA, USA. From 1983 to 1985, he was an Assistant Professor of mathematics with the University of Tennessee Space Institute, Tullahoma, TN, USA. Since 1985, he has been an Associate/Full/Emeritus Professor of electrical and computer engineering (ECE) with The University of Akron, Akron, OH, USA. He was also the Program Director of applied mathematics with the National Science Foundation, from 1995 to 1997, and the Associate Dean of Graduate Studies with the College of Engineering, The University of Akron. His primary research interests include computational electromagnetics and aeroacoustics, signal processing applications, and modeling and simulations in materials science. He served on the Editorial Board of *SIAM Journal on Applied Mathematics*, from 1997 to 2006.