I. Introduction
For important applications, it is well known that specialized hardware accelerators can provide better performance and power efficiency than software running on general-purpose processors. The use of specialized hardware is even more attractive now that power is a primary constraint in chip design. Recent work has developed accelerators for applications such as web search [1], neuromorphic computing [2], radix sort [3], and molecular dynamics [4]. The specialized hardware can be a stand-alone processor (e.g., a GPU), a co-processor, or even a special functional unit.