Journals & Magazines >IEEE Journal of Selected Topi... >Volume: 13

OpenSARUrban: A Sentinel-1 SAR Image Dataset for Urban Interpretation

Abstract:

The Sentinel-1 mission provides a freely accessible opportunity for urban image interpretation based on synthetic aperture radar (SAR) data with a specific resolution, wh...Show More

Metadata

Abstract:

The Sentinel-1 mission provides a freely accessible opportunity for urban image interpretation based on synthetic aperture radar (SAR) data with a specific resolution, which is of paramount importance for Earth observation. In parallel, with the rapid development of advanced technologies, especially deep learning, we urgently need a large-scale SAR dataset supporting urban image interpretation. This article presents OpenSARUrban: a Sentinel-1 dataset dedicated to the content-related interpretation of urban SAR images, including a well-defined hierarchical annotation scheme, data collection, well-established procedures for dataset compilation and organization as well as properties, visualizations, and applications of this dataset. Particularly, our OpenSARUrban collection provides 33 358 image patches of urban SAR scenes, covering 21 major cities of China, including 10 different target area categories, 4 kinds of data formats, 2 kinds of polarization modes, and owning 5 essential properties: large-scale coverage, diversity, specificity, reliability, and sustainability. These properties guarantee the achievement of several goals for OpenSARUrban. The first one is to support urban target characterization. The second one is to help develop well-applicable and advanced algorithms for Sentinel-1 urban target classification. The third one is to explore content-based image retrieval for these kinds of data. In addition, dataset visualization is implemented from the perspective of manifolds to give an intuitive understanding. Besides a detailed description and visualization of the dataset, we present results of some benchmarking algorithms, demonstrating that this dataset is practical and challenging. Notably, developing algorithms to enhance the classification performance on the whole dataset and considering the data imbalance are especially demanding.

Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing ( Volume: 13)

Page(s): 187 - 203

Date of Publication: 08 January 2020

ISSN Information:

DOI: 10.1109/JSTARS.2019.2954850

Funding Agency:

Contents

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

Sentinel-1 is an imaging radar mission providing continuous all-weather, day-and-night imagery of relatively low-resolution C-band data [1]–[4]. Potentially imaging all global landmasses, Sentinel-1 allows for comprehensive urban target interpretation [5], [6]. In particular, the interferometric wide (IW) swath mode is the primary operational mode over land. The data are publicly accessible and provide sufficient resources for land cover applications, such as urban deformation mapping [5], [7], or forest and agriculture monitoring [8]–[10].

Considering the explosion of Sentinel-1 satellite data, the lack of urban data interpretation tools [11]–[17] as well as the rapid development of new deep learning techniques [18]–[24], the user community urgently needs a large-scale Sentinel-1 image dataset to develop more sophisticated and robust algorithms for the interpretation of urban synthetic aperture radar (SAR) images. The challenges are how the ever-increasing data can be indexed, organized into a dataset, and utilized for specific applications. These issues cause a crucial problem yet to be solved.

There are already large-scale datasets, having been compiled in the optical remote sensing field to satisfy different requirements. The existing literatures include the UC Merced land use dataset (UC-Merced for short) [25], the local climate zone dataset [26], the aerial image dataset (AID) [27], AID++ [28], the dataset for object detection in aerial images [29], and the EuroSAT dataset [30]. Because of the clear visual appearance of optical images, any dataset compilation is relatively easy to perform. On the contrary, in the SAR community, a dataset compilation faces more severe challenges. On the one hand, the nonintuitive visual image appearance—caused by the active imaging of SAR—poses the biggest obstacle in SAR image annotation. On the other hand, the SAR data themselves are rather expensive to acquire, which also is an important factor impeding any SAR dataset compilation. Despite these difficulties, researchers have developed several datasets in this field. For instance, the Western North America Interferometric SAR Consortium (https://winsar.unavco.org/) acquires SAR imagery aiming to promote the development and the use of InSAR technology. Furthermore, the moving and stationary target recognition dataset [31], covering different aspect angles, depression angles, and target configurations, is composed of ten types of military vehicle targets. The dataset is extensively adopted to develop automatic target recognition algorithms for SAR images [32]–[34]. In addition, Dumitru and Datcu [35] designed a large-scale TerraSAR-X dataset based on very high resolution (HR) imagery, aiming to promote information mining from HR and X-band SAR images. In contrast, Dumitru et al. [36] developed an HR and X-band land cover dataset for classification benchmarking of temporal changes. Later, the OpenSARShip [37] image collection, containing 11 346 SAR ship chips, was designed to promote Sentinel-1 ship interpretation. More recently, the SEN1-2 dataset [38] is to foster deep learning research in SAR-optical data fusion. However, none of these SAR datasets can focus on the interpretation of urban Sentinel-1 images with their relatively low resolution. This dataset benefits researchers also due to the advantages of large coverage, fewer layovers, and easy augmentation due to their freely accessible priority.

With the goal of filling this gap and to advance interpretation research with urban SAR images, in this study, we present a benchmarking SAR dataset called OpenSARUrban, which has been collected from 19 Sentinel-1 images, mainly covering areas of 21 individual metropolises of China. In the very beginning, a coarse-to-fine annotation scheme was proposed, which was initially implemented according to the urban operation functionalities and, then, hierarchically divided into more detailed categories (see Fig. 2). The dataset, comprising 33 358 SAR image patches (i.e., image chips) with a size of $\text{100}\times \text{100}$ pixels each, supports 10 different functional urban types. The design of the OpenSARUrban dataset follows the idea of annotation transition from the optical domain to the SAR domain. Owing to five essential properties, namely large scale, diversity, specificity, reliability, and sustainability, OpenSARUrban achieves several goals. The first one is to support urban target characterization analysis. The second one is to foster applicable and advanced classification algorithms for Sentinel-1 urban targets. The third one is to explore content-based image retrieval [39]–[42] of this kind of data. The visualization of this dataset is performed from the perspective of manifolds via a combination of fast compression distances (FCDs) and t-distributed stochastic neighbor embedding (t-SNE), which offers an intuitive way to understand the structure within the given dataset. In the case of image classification of this dataset, some representative benchmarking algorithms are provided.

The three main contributions of this article can be summarized as follows. First, a hierarchical coarse-to-fine annotation scheme for urban target interpretation is proposed, which takes the urban requirements into account. Second, via organizing and exploiting a rapidly growing set of Sentinel-1 SAR images, we compiled the OpenSARUrban dataset, which is particularly applicable to urban target interpretation. Third, five essential properties were achieved and some benchmarking experimental analysis was made, which contributes to the practicality and the quality of this dataset.

The remainder of this article is organized as follows. Section II presents detailed procedures for compiling the OpenSARUrban dataset. The layout and properties of OpenSARUrban are illustrated in Sections III and IV, respectively. Section V visualizes the manifolds within this dataset. Section VI provides some preliminary applications on urban target classification of this dataset as benchmarking algorithms. Finally, conclusions are drawn and future work is illustrated in Section VII.

SECTION II.

Conception and Compilation of the OpenSARUrban Dataset

In this section, we present procedures for conceiving the OpenSARUrban dataset; we also explain how these procedures guarantee the properties of large-scale image data, their diversity, specificity, reliability, and sustainability. The dataset compilation can be explained from three aspects: data collection and preprocessing, a well-defined annotation scheme, and the step-by-step compilation procedures.

A. Data Collection and Preprocessing

Before compiling the dataset, it is necessary to collect some typical original images from the Sentinel-1 data access hub, and the corresponding preprocessing has to be done. In this study, we focus on urban targets from major cities distributed across China.

During a data collection phase, a large amount of initial Sentinel-1 SAR images were selected and downloaded from an SAR image archive, containing typical regions of interest (RoIs). In our case and in this article, the selected Sentinel-1 images cover areas of 21 major Chinese cities from 17 administrative provinces. Most of them are located around provincial cities. Table I shows details of the dataset source particularities. The geographical distribution of this dataset is shown in Fig. 1. Notably, the red circles in this figure denote the different cities. The green area, the blue colors, and the gray-colored region are land areas, rivers, and ocean areas, respectively.

Fig. 1.

Data source distributions of the OpenSARUrban dataset.

MIT Libraries

MIT Libraries

OpenSARUrban: A Sentinel-1 SAR Image Dataset for Urban Interpretation

Alerts

Abstract:

Metadata

Abstract:

ISSN Information:

Funding Agency:

Introduction

Conception and Compilation of the OpenSARUrban Dataset

A. Data Collection and Preprocessing

B. Discernible Categories

C. Dataset Compilation

Layout of the OpenSARUrban Dataset

Properties of OpenSARUrban

A. Large Scale

B. Diversity

1) Data Format Diversity

2) Geographical Diversity

3) Categorical Diversity

4) Polarization Diversity

C. Specificity

D. Reliability

E. Sustainability

Visualization of the OpenSARUrban Dataset

Evaluation of Our Urban Categorization and Discussions of OpenSARUrban

A. Benchmarking Algorithms for Urban Categorization

B. Implementation Details and Evaluation Metrics

C. Overall Performance of OpenSARUrban for Urban Target Categorization

D. Analysis of Specific Urban Functionalities

1) Residential Area Evaluation

2) Transportation Hub Evaluation

3) Business Area Evaluation

4) Industrial Area Evaluation

5) Urban Vegetation Evaluation

Conclusion and Future Work

ACKNOWLEDGMENT

References

IEEE Account

Purchase Details

Profile Information

Need Help?