I. Introduction
To store and manage data efficiently, the database management system (DBMS) was conceptualized by industry in the 1960s, and early DBMSs evolved from file systems. As applications came to demand more complex data models, more complex query operations, and larger databases, distributed databases, parallel databases, database clusters, data warehouses, and related systems were gradually developed. Each of these meets application requirements in some respects, but their support for massive data remains imperfect. At the same time, the Internet has become a huge repository of information, with vast amounts of data generated every day. Retrieving this data requires massive storage space and large-scale computing, which existing database systems cannot provide, while large servers suffer from limited performance and high cost. Google therefore built a large-scale cluster from many inexpensive PCs and designed and implemented a distributed file system named GFS [1], a storage system named BigTable [2], and a parallel programming environment, MapReduce [3] [4], which together constitute Google's “cloud computing” environment. Other companies have also proposed and implemented similar cloud systems, including Amazon's EC2 [5] and S3 [6] and IBM's “Blue Cloud”.