I. Introduction
Sustained scaling of semiconductor process technology, coupled with leakage-limited supply voltage scaling, has made transistors steadily cheaper while placing energy budgets under ever-increasing pressure [1]. In light of this, datapath accelerators are increasingly favorable, since specialized hardware is the most efficient use of surplus silicon area in an energy-constrained application. A wide range of power- and performance-sensitive applications, such as network processors [2], smartphone processors [3], [4], and sensor nodes [5], [6], have been demonstrated to benefit from custom accelerator blocks. This trend is particularly apparent in digital signal-processing (DSP) applications, where the unrelenting demands of wireless and multimedia workloads often necessitate specialized datapath hardware. Because dedicated accelerator blocks can be optimized for a specific task, they typically achieve significantly higher energy efficiency than software-only solutions. This efficiency comes at the cost of increased development time and limited flexibility.