Technical Challenges for Big Data
The term big data describes data sets that are too large or too complex to be handled with commonly available tools [1]. Climate science is a big data domain that is experiencing unprecedented growth [2]. Several of the major technical challenges that big data poses for climate science are straightforward to state:
Large repositories mean that the data sets themselves cannot easily be moved; instead, analytical operations must migrate to where the data reside.
Complex analyses over large repositories require high-performance computing.
Large amounts of information increase the importance of metadata, provenance management, and discovery.
Migrating codes and analytic products within a growing network of storage and computational resources creates a need for fast networks, intermediation, and resource balancing.
Importantly, responding quickly to customer demands for new and often unanticipated uses of climate data requires greater agility in building and deploying applications [3].
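The first challenge, moving the analysis to the data rather than the data to the analysis, can be made concrete with a rough estimate. The sketch below is illustrative only: the repository size (1 PB), the result size (100 MB), and the link speed (10 Gbit/s) are assumed round numbers, not figures from the cited sources, and the helper `transfer_seconds` is a hypothetical name that ignores latency, protocol overhead, and contention.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealized time to move size_bytes over a link.

    Ignores latency, protocol overhead, and contention, so real
    transfers would take longer than this estimate.
    """
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes * 8 / bandwidth_bits_per_s


# Assumed, illustrative values (not taken from the source text).
PETABYTE = 10**15           # bytes
MEGABYTE = 10**6            # bytes
LINK_SPEED = 10 * 10**9     # bits per second (10 Gbit/s)

# Option 1: copy the whole repository to the analyst's site.
move_data_s = transfer_seconds(1 * PETABYTE, LINK_SPEED)

# Option 2: run the analysis where the data reside and ship only the result.
move_result_s = transfer_seconds(100 * MEGABYTE, LINK_SPEED)

print(f"Moving the repository: {move_data_s / 86400:.1f} days")
print(f"Moving the result:     {move_result_s:.2f} seconds")
```

Under these assumptions, copying the repository takes on the order of days, while returning a reduced analytic product takes a fraction of a second, which is why the analysis is sent to the data.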