Domain-specific accelerators obtain performance benefits by restricting their algorithmic domain. These accelerators utilize specialized languages constrained to particul...Show More
Metadata
Abstract:
Domain-specific accelerators obtain performance benefits by restricting their algorithmic domain. These accelerators utilize specialized languages constrained to particular hardware, thus trading off expressiveness for high performance. The pendulum has swung from one hardware for all domains (general-purpose processors) to one hardware per individual domain. The middle-ground on this spectrum–which provides a unified computational stack across multiple, but not all, domains– is an emerging and open research challenge. This paper sets out to explore this region and its associated tradeoff between expressiveness and performance by defining a cross-domain stack, dubbed PolyMath. This stack defines a high-level cross-domain language (CDL), called PMLang, that in a modular and reusable manner encapsulates mathematical properties to be expressive across multiple domains–Robotics, Graph Analytics, Digital Signal Processing, Deep Learning, and Data Analytics. PMLang is backed by a recursively-defined intermediate representation allowing simultaneous access to all levels of operation granularity, called sr DFG. Accelerator-specific or domain-specific IRs commonly capture operations in the granularity that best fits a set of Domain-Specific Architectures (DSAs). In contrast, the recursive nature of the sr DFG enables simultaneous access to all the granularities of computation for every operation, thus forming an ideal bridge for converting to various DSA-specific IRs across multiple domains. Our stack unlocks multi-acceleration for end-to-end applications that cross the boundary of multiple domains each comprising different data and compute patterns. Evaluations show that by using PolyMath it is possible to harness accelerators across the five domains to realize an average speedup of 3.3× over a Xeon CPU along with 18.1× reduction in energy. In comparison to Jetson Xavier and Titan XP, cross-domain acceleration offers 1.7× and 7.2× improvement in performance-per-watt, respec...