Area-Efficient Distributed Arithmetic Optimization via Heuristic Decomposition and In-Memory Computing


Abstract:

Distributed arithmetic (DA) is widely adopted in digital signal processing (DSP) applications such as filtering, linear transformations, and convolutions, where it offers both area and energy benefits. DA uses look-up tables (LUTs), implemented in SRAM, to store all possible precomputed results. However, in a direct implementation the LUT size grows exponentially with the vector size. In this paper, we propose a novel in-memory computation design methodology that reduces the LUT size without heavily degrading speed or power performance. First, we propose a heuristic decomposition scheme under which only a minimal subset of the precomputed results needs to be stored in the LUT. Second, we design a novel multibit in-memory adder that exploits charge-sharing-based carry propagation. In a design case applying our method to a state-of-the-art DA-based FIR filter, the overall area is reduced by 10% while maintaining the same speed and a similar level of energy.
Date of Conference: 29 October 2019 - 01 November 2019
Date Added to IEEE Xplore: 06 February 2020
Conference Location: Chongqing, China

I. Introduction

Decentralized edge computing advocates equipping edge devices, where data are generated locally, with adequate computational capabilities [1]. However, it is often challenging to deploy computationally demanding algorithms on these resource-constrained edge nodes, for example the inner-product or sum-of-product computations that are widely used for ‘in-node’ signal pre-processing, conditioning, and feature extraction. Distributed arithmetic (DA) is a promising design alternative that achieves a bit-serial, multiplier-less implementation of inner-product computation [2] with reduced area and power consumption compared with a parallel, multiply-accumulator-based implementation. Several DA designs have been demonstrated, including finite impulse response (FIR) filters [3], the discrete cosine transform [4], and convolution [5]. A major hurdle for employing DA, however, is that the LUT size increases exponentially with the length of the inner-product vectors.
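
To make the exponential growth concrete, the following is a minimal software sketch of the basic, direct DA scheme only. It does not model the heuristic decomposition or the in-memory adder proposed in this paper. It assumes integer coefficients and two's-complement inputs of a fixed bit width; the function names `build_lut` and `da_inner_product` are illustrative and do not come from the paper.

```python
def build_lut(coeffs):
    """Precompute, for every N-bit address, the sum of the selected coefficients.

    Bit k of the address selects coeffs[k], so the table has 2**N entries.
    """
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(coeffs, xs, bits):
    """Bit-serial, multiplier-less inner product sum(coeffs[k] * xs[k]).

    Assumption: each xs[k] fits in `bits`-bit two's complement.
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [x & mask for x in xs]  # two's-complement bit patterns

    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one LUT address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> j) & 1) << k
        partial = lut[addr] << j  # shift-and-add replaces multiplication
        if j == bits - 1:
            acc -= partial  # the sign bit carries weight -2**(bits-1)
        else:
            acc += partial
    return acc


coeffs = [3, -5, 7, 2]
xs = [4, -3, 1, -8]
result = da_inner_product(coeffs, xs, bits=4)  # 18, same as the direct sum
lut_size = len(build_lut(coeffs))              # 16 entries for N = 4
```

Each additional vector element doubles the table: in this sketch N = 4 needs 16 entries, while N = 16 would need 65,536.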
