I. Introduction
Centralized cloud systems [1], [2] are the de facto platforms for executing large-scale data analysis in many domains. The convenience of cloud systems is attractive: a user can simply request resources (e.g., storage space or machines) as needed. No physical hardware needs to be managed, and typically only minimal software setup is required. Yet, because cloud systems are centralized, data must be brought to a central location for processing. This is ill-suited to many data analytics applications whose data is itself produced in a distributed fashion around the world. For example, content distribution networks (CDNs) and large web services produce logs at a large number of sites distributed across the globe, and these logs must then be processed for anomaly detection, user billing, or other kinds of analysis.