Twister:Net - Communication Library for Big Data Processing in HPC and Cloud Environments

Abstract:

Stream processing and batch data processing are the dominant forms of big data analytics today, with numerous systems such as Hadoop, Spark, and Heron designed to process the ever-increasing explosion of data. Generally, these systems are developed as single projects in which aspects such as communication, task management, and data management are integrated together. By contrast, we take a component-based approach to big data, developing the essential features of a big data system as independent components with polymorphic implementations to support different requirements. Consequently, we recognize the requirements of both the dataflow style used in popular Apache systems and the Bulk Synchronous Processing communication style common in High-Performance Computing (HPC) for different applications. Message Passing Interface (MPI) implementations are dominant in HPC, but there are no such standard libraries available for big data. Twister:Net is a stand-alone, highly optimized dataflow-style parallel communication library that can be used by big data systems or advanced users. Twister:Net can work both in cloud environments using TCP and in HPC environments using MPI implementations. This paper introduces Twister:Net and compares it with existing systems to highlight its design and performance.

Published in: 2018 IEEE 11th International Conference on Cloud Computing (CLOUD)

Date of Conference: 02-07 July 2018

Date Added to IEEE Xplore: 11 September 2018

ISBN Information:

Electronic ISSN: 2159-6190

DOI: 10.1109/CLOUD.2018.00055

Conference Location: San Francisco, CA, USA

I. Introduction

Many prominent big data systems exist today to process the enormous wealth of available data, which is characterized by its velocity, volume, and veracity. Stream processing and batch data processing are the dominant forms of big data analytics, with Function as a Service (FaaS) emerging as a new paradigm. Systems such as Spark [1] and Hadoop primarily focus on batch data, while Heron [2], Flink [3], and Storm target streaming data. By contrast, the high-performance computing (HPC) community uses the Message Passing Interface (MPI) and its implementations as its framework of choice for large-scale parallel applications.
