A Big Data Modeling Methodology for Apache Cassandra | IEEE Conference Publication | IEEE Xplore

Abstract:

Apache Cassandra is a leading distributed database for big data management, offering zero downtime, linear scalability, and seamless deployment across multiple data centers. With the increasingly wide adoption of Cassandra for online transaction processing by hundreds of Web-scale companies, there is a growing need for a rigorous and practical data modeling approach that ensures sound and efficient schema design. This work i) proposes the first query-driven big data modeling methodology for Apache Cassandra, ii) defines important data modeling principles, mapping rules, and mapping patterns to guide logical data modeling, iii) presents visual diagrams for Cassandra logical and physical data models, and iv) demonstrates a data modeling tool that automates the entire data modeling process.
Date of Conference: 27 June 2015 - 02 July 2015
Date Added to IEEE Xplore: 20 August 2015
Print ISSN: 2379-7703
Conference Location: New York, NY, USA

I. Introduction

Apache Cassandra [1], [2] is a leading transactional, scalable, and highly available distributed database. It is known to manage some of the world's largest datasets on clusters with many thousands of nodes deployed across multiple data centers. Cassandra data management use cases include product catalogs and playlists, sensor data and the Internet of Things, messaging and social networking, recommendation, personalization, fraud detection, and numerous other applications that deal with time series data. The wide adoption of Cassandra [3] in big data applications is attributed to, among other things, its scalable and fault-tolerant peer-to-peer architecture [4]; its versatile and flexible data model, which evolved from the BigTable data model [5]; its declarative and user-friendly Cassandra Query Language (CQL); and its very efficient write and read access paths, which enable critical big data applications to stay always on, scale to millions of transactions per second, and handle node and even entire data center failures with ease. One of the biggest challenges that new projects face when adopting Cassandra is data modeling, which differs significantly from the traditional data modeling approaches used in the past.
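To make the contrast concrete, the following Python sketch illustrates the general query-driven idea: start from an application query and derive a table whose primary key answers that query directly, instead of starting from normalized entities. The `Query` class, the `derive_table` function, and the messaging example are hypothetical. The heuristic it applies (equality attributes become the partition key; the range, ordering, and uniqueness attributes become clustering columns) is a simplification assumed here for illustration and is not a reproduction of the mapping rules proposed in this work.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Query:
    """Hypothetical, simplified description of one application query."""
    table: str                        # name of the table that will serve this query
    equality: List[str]               # attributes restricted with '=' in the WHERE clause
    range_attr: Optional[str] = None  # at most one attribute restricted with <, <=, >, >=
    order_by: List[str] = field(default_factory=list)      # attributes results are sorted by
    key: List[str] = field(default_factory=list)           # attributes that make each row unique
    columns: Dict[str, str] = field(default_factory=dict)  # attribute -> CQL type


def derive_table(q: Query) -> str:
    """Return a CQL CREATE TABLE statement whose primary key can answer q.

    Simplified heuristic (an assumption, not the paper's mapping rules):
    equality attributes form the partition key; the range attribute,
    the ordering attributes, and any remaining key attributes become
    clustering columns, in that order and without duplicates.
    """
    if not q.equality:
        raise ValueError("a query needs at least one equality attribute")
    partition = list(q.equality)
    clustering: List[str] = []
    candidates = ([q.range_attr] if q.range_attr else []) + q.order_by + q.key
    for attr in candidates:
        if attr not in partition and attr not in clustering:
            clustering.append(attr)
    # Every primary key attribute needs a declared CQL type.
    missing = [a for a in partition + clustering if a not in q.columns]
    if missing:
        raise ValueError(f"no CQL type given for: {missing}")
    col_defs = "".join(f"  {name} {ctype},\n" for name, ctype in q.columns.items())
    pk = "((" + ", ".join(partition) + ")" + "".join(", " + c for c in clustering) + ")"
    return f"CREATE TABLE {q.table} (\n{col_defs}  PRIMARY KEY {pk}\n);"


# Hypothetical query: messages in one conversation sent after a given time, ordered by time.
messages = Query(
    table="messages_by_conversation",
    equality=["conversation_id"],
    range_attr="sent_at",
    order_by=["sent_at"],
    key=["message_id"],
    columns={
        "conversation_id": "uuid",
        "sent_at": "timestamp",
        "message_id": "timeuuid",
        "body": "text",
    },
)
print(derive_table(messages))
```

For this query the sketch yields `PRIMARY KEY ((conversation_id), sent_at, message_id)`: all messages of one conversation share a partition and are stored sorted by `sent_at`, so the query reads one contiguous slice of a single partition. A different query over the same data, for example messages by sender, would get its own table, which is the main departure from designing a normalized schema first and writing queries against it afterwards.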

