I. Introduction
The rapid growth of deep learning over the past decade has created an immense demand for computing power in both the cloud and at the edge. Numerous algorithmic, architectural, and circuit-level approaches have been proposed to meet this demand. Among these, stochastic computing (SC) has been enjoying a renaissance in deep learning acceleration for latency-, energy-, and cost-constrained devices [1]–[5]. It offers an extremely compact computing footprint, enabling levels of parallelism and data reuse not achievable with conventional floating- or fixed-point architectures [5]. Its approximate nature synergizes well with the inherent error tolerance of neural networks, opening new axes of accuracy-performance tradeoff [3], [5], [6].
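To make the footprint claim concrete, the following minimal Python sketch (not from the cited works; function names and parameters are ours for illustration) simulates the textbook unipolar SC multiplier: a value x in [0, 1] is encoded as a bitstream with P(bit = 1) = x, and a single AND gate on two independent streams yields a stream encoding their product.

```python
# Minimal sketch of unipolar stochastic-computing multiplication.
# Assumes unipolar encoding and independent bitstreams; names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(x: float, length: int) -> np.ndarray:
    """Encode x in [0, 1] as a random bitstream with P(bit = 1) = x."""
    return (rng.random(length) < x).astype(np.uint8)

def sc_multiply(x: float, y: float, length: int = 4096) -> float:
    """Multiply two values via bitwise AND of their stochastic streams."""
    product_stream = to_bitstream(x, length) & to_bitstream(y, length)
    return product_stream.mean()  # estimate of x * y

print(sc_multiply(0.5, 0.8))  # ~0.40, within sampling error
```

In hardware, the entire multiplier thus reduces to one AND gate per bit, in contrast to the hundreds of gates of a fixed-point array multiplier; this is the source of the compactness and parallelism noted above, traded against the sampling error visible in the estimate.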