ERMS: An Elastic Replication Management System for HDFS

Publisher: IEEE

Abstract:

The Hadoop Distributed File System (HDFS) is a distributed storage system that stores large-scale data sets reliably and streams those data sets to applications at high bandwidth. HDFS provides high performance, reliability, and availability by replicating data, typically keeping three copies of each block. The popularity of data in HDFS changes over time. To achieve better performance and higher disk utilization, the replication policy of HDFS should be elastic and adapt to data popularity. In this paper, we describe ERMS, an elastic replication management system for HDFS. ERMS provides an active/standby storage model for HDFS. It utilizes a complex event processing engine to distinguish real-time data types, dynamically creates extra replicas for hot data, cleans up those extra replicas when the data cools down, and uses erasure codes for cold data. ERMS also introduces a replica placement strategy for the extra replicas of hot data and for erasure-coding parities. The experiments show that ERMS effectively improves the reliability and performance of HDFS and reduces storage overhead.
Date of Conference: 24-28 September 2012
Date Added to IEEE Xplore: 20 November 2012
Conference Location: Beijing, China

I. Introduction

The storage demands of cloud computing have been growing exponentially year after year. Rather than relying on traditional centralized storage arrays, storage systems for cloud computing consolidate large numbers of distributed commodity computers into a single storage pool, providing a large-capacity, high-performance storage service in an unreliable and dynamic network environment at low cost. To build such a cloud storage system, an increasing number of companies and academic institutions have come to rely on the Hadoop Distributed File System (HDFS) [1]. HDFS provides reliable storage and high-throughput access to application data. It is well suited to applications with large data sets, typically those built on the Map/Reduce programming framework [2] for data-intensive computing. HDFS has been widely adopted and has become a common storage appliance for cloud computing.
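The elastic policy described in the abstract can be illustrated with a minimal sketch: popularity drives the choice between adding replicas, keeping the HDFS default, and converting to erasure-coded storage. The function name, the access-rate thresholds, and the extra-replica count below are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of an ERMS-style elastic replication decision.
# Thresholds, names, and replica counts are assumed for illustration.

DEFAULT_REPLICAS = 3   # HDFS default replication factor
HOT_THRESHOLD = 100    # accesses/hour above which data counts as "hot" (assumed)
COLD_THRESHOLD = 1     # accesses/hour at or below which data counts as "cold" (assumed)

def target_storage_policy(accesses_per_hour: int) -> tuple[str, int]:
    """Map observed popularity to a storage action.

    Returns (policy, copy_count): hot data gains extra replicas,
    warm data keeps the default three copies, and cold data is
    converted to erasure-coded storage (one copy plus parity blocks).
    """
    if accesses_per_hour >= HOT_THRESHOLD:
        return ("replicate", DEFAULT_REPLICAS + 2)  # add extra replicas for hot data
    if accesses_per_hour <= COLD_THRESHOLD:
        return ("erasure_code", 1)  # keep one copy; redundancy comes from parities
    return ("replicate", DEFAULT_REPLICAS)  # warm data: HDFS default

# Example: target_storage_policy(150) -> ("replicate", 5)
```

In ERMS the real decision is driven by a complex event processing engine over access streams rather than a static threshold, but the shape of the policy (replicate hot, default for warm, erasure-code cold) is the same.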
