I. Introduction
Polystores [1], [4], [10] combine a set of specialized data processing engines (e.g., graph engines, dataflow engines, array databases) in order to perform data analysis at scale, using each specialized engine according to its characteristics. Each engine uses its own format and storage location for the data it processes, often making data migration between those data processing engines the bottleneck in processing that data. Thus, the decision on whether two different execution engines are used for a single data pipeline, depends on whether the overhead of data migration is smaller than the speedup caused by a specialized execution engine.