I. Introduction
The maturity of the Hadoop Distributed File System (HDFS) [1], a key component of the Hadoop ecosystem for Big Data processing [2], has led to its adoption as the data storage and management layer in many research and industry projects. These projects also require complex analytics on the data stored in HDFS. Traditional data analytic platforms such as R [3] do not, by default, support direct interaction with HDFS. One solution is to develop new statistical programming libraries [4], [5] that are native to the Hadoop ecosystem and can communicate directly with HDFS. A limitation of these new toolkits is that they lack the breadth of capability and versatility that comes from years of community contribution. Another approach is to design and implement frameworks that integrate existing data analytic platforms into the Hadoop ecosystem. The goal of this integration is twofold. First, the frameworks should provide users with large-scale data access through HDFS while retaining the analytical capability and familiarity of the existing platform. Second, and equally important, they should take advantage of Hadoop's inherent parallelism.