Big Data Challenges in Climate Science: Improving the next-generation cyberinfrastructure


Abstract:

The knowledge we gain from research in climate science depends on the generation, dissemination, and analysis of high-quality data. This work comprises technical practice as well as social practice, both of which are distinguished by their massive scale and global reach. As a result, the amount of data involved in climate research is growing at an unprecedented rate. Some examples of the types of activities that increasingly require an improved cyberinfrastructure for dealing with large amounts of critical scientific data are climate model intercomparison (CMIP) experiments; the integration of observational data and climate reanalysis data with climate model outputs, as seen in the Observations for Model Intercomparison Projects (Obs4MIPs), Analysis for Model Intercomparison Projects (Ana4MIPs), and Collaborative Reanalysis Technical Environment-Intercomparison Project (CREATE-IP) activities; and the collaborative work of the Intergovernmental Panel on Climate Change (IPCC). This article provides an overview of some of climate science's big data problems and the technical solutions being developed to advance data publication, climate analytics as a service, and interoperability within the Earth System Grid Federation (ESGF), which is the primary cyberinfrastructure currently supporting global climate research activities.
Published in: IEEE Geoscience and Remote Sensing Magazine (Volume: 4, Issue: 3, September 2016)
Page(s): 10 - 22
Date of Publication: 16 September 2016

PubMed ID: 31709380

Technical Challenges for Big Data

The term big data is used to describe data sets that are too large or complex to be worked with using commonly available tools [1]. Climate science represents a big data domain that is experiencing unprecedented growth [2]. Some of the major big data technical challenges facing climate science are easy to understand:

Large repositories mean that the data sets themselves cannot easily be moved; instead, analytical operations must migrate to where the data reside.

Complex analyses over large repositories require high-performance computing.

Large amounts of information increase the importance of metadata, provenance management, and discovery.

Migrating codes and analytic products within a growing network of storage and computational resources creates a need for fast networks, intermediation, and resource balancing.

Importantly, the ability to respond quickly to customer demands for new and often unanticipated uses for climate data requires greater agility in building and deploying applications [3].
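
The first challenge above, moving analytical operations to the data rather than moving the data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Repository` class, its `download` and `run` methods, the 8-byte value size, and the synthetic values are hypothetical and do not represent ESGF or any real service interface. The sketch only counts how many bytes leave the repository under each approach.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Sequence

BYTES_PER_VALUE = 8  # assumption: each value is stored as a 64-bit float


@dataclass
class Repository:
    """Hypothetical stand-in for a remote data node holding one variable."""
    values: Sequence[float]
    bytes_sent: int = 0  # running total of bytes shipped to clients

    def download(self) -> list[float]:
        """Move the data to the analysis: ship every value to the client."""
        self.bytes_sent += len(self.values) * BYTES_PER_VALUE
        return list(self.values)

    def run(self, operation: Callable[[Sequence[float]], float]) -> float:
        """Move the analysis to the data: compute here, ship one number."""
        result = operation(self.values)
        self.bytes_sent += BYTES_PER_VALUE
        return result


def compare(n_values: int = 100_000) -> tuple[int, int]:
    """Return (bytes moved by download, bytes moved by server-side run)."""
    # Synthetic values for illustration only, not real climate data.
    data = [float(i % 30) for i in range(n_values)]

    pull = Repository(data)
    mean_client = fmean(pull.download())  # client computes after a full transfer

    push = Repository(data)
    mean_server = push.run(fmean)  # repository computes, returns a scalar

    assert mean_client == mean_server  # same answer either way
    return pull.bytes_sent, push.bytes_sent
```

Under these assumptions, the bytes moved by the download approach grow linearly with the size of the data set, while the server-side approach moves a single value regardless of size. The gap widens as repositories grow, which is the motivation for migrating analytics to where the data reside.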

References
1.
C. Snijders, U. Matzat and U. D. Reips, "Big data: Big gaps of knowledge in the field of Internet science", Int. J. Internet Sci., vol. 7, no. 1, pp. 1-5, 2012.
2.
P. N. Edwards, A Vast Machine: Computer Models Climate Data and the Politics of Global Warming, MIT Press, 2010.
3.
J. L. Schnase, D. Q. Duffy, G. S. Tamkin, D. Nadeau, J. H. Thompson, C. M. Grieg, et al., "MERRA analytic services: Meeting the big data challenges of climate science through cloud-enabled climate analytics-as-a-service".
4.
S. L. Star, "The politics of formal representations: Wizards gurus and organizational complexity" in Ecologies of Knowledge: Work and Politics in Science and Technology, SUNY Press, pp. 88-118, 1995.
5.
J. L. Schnase, M. A. Lane, B. C. Bowker, S. L. Star and A. Silberschatz, "Building the next generation biological information infrastructure", pp. 291-300, 1997.
6.
C. A. Mattmann, C. S. Lynnes, L. Cinquini, P. M. Ramirez, A. F. Hart, D. Williams, et al., "Next generation cyberinfrastructure to support comparison of satellite observations with climate models", pp. 82-85, 2014.
7.
G. L. Potter, T. J. Lee and L. Carriere, "Improving access to climate model observational and reanalysis data", pp. 86-89, 2014.
8.
J. L. Schnase, D. Q. Duffy, M. A. McInerney, W. P. Webster and T. J. Lee, "Climate analytics as a service", pp. 90-94, 2014.
9.
J. T. Overpeck, G. A. Meehl, S. Bony and D. R. Easterling, "Climate data challenges in the 21st century", Science, vol. 331, pp. 700-702, Feb. 2011.
10.
Intergovernmental Panel on Climate Change (IPCC), [online] Available: http://www.ipcc.ch/organization/organization.shtml.
11.
Climate Model Intercomparison Project (CMIP), [online] Available: http://cmip-pcmdi.llnl.gov.
12.
K. E. Taylor, R. J. Stouffer and G. A. Meehl, "An overview of CMIP5 and the experimental design", Bull. Amer. Meteor. Soc, vol. 93, no. 4, pp. 485-498, 2012.
13.
Earth System Grid Federation (ESGF), [online] Available: http://esgf.llnl.gov.
14.
L. Cinquini, D. Crichton, C. Mattmann, J. Harney, G. Shipman, F. Wang, et al., "The Earth System Grid Federation: An open infrastructure for access to distributed geospatial data", Futur. Gener. Comput. Syst, vol. 36, pp. 400-417, July 2014.
15.
P. Gleckler, R. Ferraro and D. Waliser, "Improving use of satellite data in evaluating climate models", EOS Trans. Am. Geophys. Union, vol. 92, no. 20, pp. 172, 2011.
16.
M. G. Bosilovich, A. H. Chaudhuri and M. Rixen, "Earth system reanalysis: Progress challenges and opportunities", Bull. Am. Meteorol. Soc, vol. 94, no. 8, pp. 110-113, 2013.
17.
K. H. Rosenlof, L. Terray, C. Deser, A. Clement, H. Goosse, S. Davis, et al., "Changes in variability associated with climate change" in Climate Science for Serving Society, Springer, pp. 249-271, 2013.
18.
C. Mattmann, A. Braverman and D. Crichton, "Understanding architectural tradeoffs necessary to increase climate model intercomparison efficiency", ACM SIGSOFT Soft. Eng. Notes, vol. 35, no. 3, pp. 1-6, 2010.
19.
C. Mattmann, D. Crichton, N. Medvidovic and S. Hughes, "A software architecture-based framework for highly distributed and data intensive scientific applications", pp. 721-730, 2006.
20.
Analytics and Informatics Management Systems (AIMS), [online] Available: http://aims.llnl.gov/mission.html.
21.
C. Pagé, S. Joussaume, M. Juckes, W. S. de Cerff, M. Pleiger, E. de Vreede, et al., "Providing and facilitating climate model data access in Europe: IS-ENES and CLIPC initiatives", 2014, [online] Available: http://meetingorganizer.copernicus.org/EMS2014/EMS2014-265-1.pdf.
22.
S. Fiore, A. D'Anca, C. Palazzo, I. Foster, D. N. Williams and G. Aloisio, "Ophidia: Toward big data analytics for eScience", Procedia Computer Sci., vol. 18, pp. 2376-2385, June 2013.
23.
J. L. Schnase, "Climate analytics as a service", 2014, [online] Available: http://www.ecmwf.int/sites/default/files/COP-CDS-WS-Schnase.pdf.
24.
J. L. Schnase, N. Merati, C. Yang and M. Yuan, "Climate analytics as a service" in Cloud Computing in the Ocean and Atmospheric Sciences, Elsevier, 2016.
25.
Open Archive Information System (OAIS) Reference Model, [online] Available: http://public.ccsds.org/sites/cwe/rids/Lists/CCSDS%206500P11/Attachments/650x0p11.pdf.
26.
M. M. Rienecker, M. J. Suarez, R. Gelaro, R. Todling, J. Bacmeister, E. Liu, et al., "MERRA: NASA's modern-era retrospective analysis for research and applications", J. Climate, vol. 24, no. 14, pp. 3624-3648, 2011.
27.
NASA Climate Model Data Services, [online] Available: https://cds.nccs.nasa.gov/wp-content/test/.
28.
J. L. Schnase, M. L. Carroll, K. T. Weber, M. E. Brown, R. L. Gill, M. Wooten, et al., "RECOVER: An automated cloud-based decision support system for post-fire rehabilitation planning", pp. 17-20, 2014.
29.
[online] Available: http://www.iplant-collaborative.org.
30.
D. Duffy, private communication, Dec. 2015.