Twister:Net - Communication Library for Big Data Processing in HPC and Cloud Environments


Abstract:

Stream processing and batch processing are the dominant forms of big data analytics today, with numerous systems such as Hadoop, Spark, and Heron designed to process ever-increasing volumes of data. Generally, these systems are developed as single projects in which aspects such as communication, task management, and data management are tightly integrated. By contrast, we take a component-based approach to big data, developing the essential features of a big data system as independent components with polymorphic implementations that support different requirements. Accordingly, we recognize the requirements of both the dataflow style used in popular Apache systems and the Bulk Synchronous Parallel (BSP) communication style common in High-Performance Computing (HPC), each of which suits different applications. Message Passing Interface (MPI) implementations are dominant in HPC, but no comparable standard libraries are available for big data. Twister:Net is a stand-alone, highly optimized, dataflow-style parallel communication library that can be used by big data systems or by advanced users. Twister:Net can work both in cloud environments using TCP and in HPC environments using MPI implementations. This paper introduces Twister:Net and compares it with existing systems to highlight its design and performance.
Date of Conference: 02-07 July 2018
Date Added to IEEE Xplore: 11 September 2018
Electronic ISSN: 2159-6190
Conference Location: San Francisco, CA, USA

I. Introduction

Many prominent big data systems exist today to process the enormous wealth of available data, which is characterized by its velocity, volume, and veracity. Stream processing and batch processing are the dominant forms of big data analytics, with Function as a Service (FaaS) emerging as a new paradigm. Systems such as Spark [1] and Hadoop focus primarily on batch data, while Heron [2], Flink [3], and Storm target streaming data. In contrast to these systems, the high-performance computing (HPC) community uses the Message Passing Interface (MPI) and its implementations as its framework of choice for large-scale parallel applications.
