I. Introduction
The storage demands of cloud computing have been growing exponentially year after year. Rather than relying on traditional large, centralized storage arrays, a storage system for cloud computing consolidates large numbers of distributed commodity computers into a single storage pool and provides a large-capacity, high-performance storage service at low cost in an unreliable and dynamic network environment. To build such a cloud storage system, an increasing number of companies and academic institutions have started to rely on the Hadoop Distributed File System (HDFS) [1]. HDFS provides reliable storage and high-throughput access to application data. It is suitable for applications that have large data sets, typically those built on the Map/Reduce programming framework [2] for data-intensive computing. HDFS has been widely used and has become a common storage appliance for cloud computing.