I. Introduction
Recent years have seen explosive growth in the availability of geospatial data from both the public and private sectors, including geoscientific model outputs [1]–[3], satellite imagery [4]–[8], aerial survey data [9], [10], and measurement data collected through the Internet of Things (IoT) [11], [12]. While this growth creates unprecedented opportunities for analytics that derive value by cross-pollinating datasets from multiple disciplines, it also poses enormous challenges for data management [13], [14]. Indeed, to date, and unlike in many other areas of IT, binary file formats such as HDF, GRIB, and GeoTIFF remain the most popular means of storing geospatial data such as satellite images and numerical weather prediction model outputs. While file-based storage provides reasonable storage efficiency and basic read and write capability, it places a heavy burden on end users, who must assemble data distributed across a large number of files in different formats and harmonize projections from different sources. Moreover, query and search capabilities at the binary-file level are insufficient to support analytics needs. A Big Data infrastructure in which data is indexed beyond the file level is needed to provide more powerful query support and rapid data discovery capabilities.
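To make the burden on end users concrete, the following is a minimal sketch (not from the paper) of the per-file effort a typical file-based workflow requires: opening a GeoTIFF satellite scene and a GRIB model output with separate libraries, then reprojecting the raster to a common coordinate reference system by hand. It assumes the rasterio, xarray, and cfgrib packages; the file names and target CRS are hypothetical.

```python
import numpy as np
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling
import xarray as xr

DST_CRS = "EPSG:4326"  # common target projection (illustrative assumption)

# 1) Satellite imagery stored as a GeoTIFF: reproject band 1 manually.
with rasterio.open("scene.tif") as src:  # hypothetical file name
    transform, width, height = calculate_default_transform(
        src.crs, DST_CRS, src.width, src.height, *src.bounds)
    dst = np.empty((height, width), dtype=src.dtypes[0])
    reproject(
        source=rasterio.band(src, 1),
        destination=dst,
        src_transform=src.transform,
        src_crs=src.crs,
        dst_transform=transform,
        dst_crs=DST_CRS,
        resampling=Resampling.bilinear,
    )

# 2) Numerical weather prediction output stored as GRIB: a different
#    library, a different data model, and its own grid metadata.
nwp = xr.open_dataset("forecast.grib", engine="cfgrib")  # hypothetical file

# Cross-dataset analysis now requires aligning `dst` and `nwp` by hand,
# repeated for every file in a potentially large archive.
```

No query or search is possible in this workflow until the user has located, opened, and harmonized every relevant file, which is exactly the gap that indexing beyond the file level is meant to close.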