Introduction
Sentinel-1 is an imaging radar mission that provides continuous all-weather, day-and-night C-band imagery at relatively low resolution [1]–[4]. Capable of imaging all global landmasses, Sentinel-1 allows for comprehensive urban target interpretation [5], [6]. In particular, the interferometric wide (IW) swath mode is the primary operational mode over land. The data are publicly accessible and provide sufficient resources for land cover applications, such as urban deformation mapping [5], [7], or forest and agriculture monitoring [8]–[10].
Considering the explosion of Sentinel-1 satellite data, the lack of urban data interpretation tools [11]–[17], and the rapid development of new deep learning techniques [18]–[24], the user community urgently needs a large-scale Sentinel-1 image dataset to develop more sophisticated and robust algorithms for the interpretation of urban synthetic aperture radar (SAR) images. The challenge is how the ever-increasing data can be indexed, organized into a dataset, and utilized for specific applications; this remains a crucial, unsolved problem.
Large-scale datasets have already been compiled in the optical remote sensing field to satisfy different requirements. Existing examples include the UC Merced land use dataset (UC-Merced for short) [25], the local climate zone dataset [26], the aerial image dataset (AID) [27], AID++ [28], the dataset for object detection in aerial images [29], and the EuroSAT dataset [30]. Because of the clear visual appearance of optical images, compiling such datasets is relatively easy. In the SAR community, by contrast, dataset compilation faces more severe challenges. On the one hand, the nonintuitive visual appearance of SAR images, caused by the active imaging mechanism, poses the biggest obstacle to SAR image annotation. On the other hand, SAR data themselves are rather expensive to acquire, which is another important factor impeding SAR dataset compilation. Despite these difficulties, researchers have developed several datasets in this field. For instance, the Western North America Interferometric SAR Consortium (https://winsar.unavco.org/) acquires SAR imagery to promote the development and use of InSAR technology. Furthermore, the moving and stationary target recognition dataset [31], covering different aspect angles, depression angles, and target configurations, is composed of ten types of military vehicle targets; it has been extensively adopted to develop automatic target recognition algorithms for SAR images [32]–[34]. In addition, Dumitru and Datcu [35] designed a large-scale TerraSAR-X dataset based on very high resolution (HR) imagery, aiming to promote information mining from HR and X-band SAR images, while Dumitru et al. [36] developed an HR and X-band land cover dataset for classification benchmarking of temporal changes. Later, the OpenSARShip image collection [37], containing 11 346 SAR ship chips, was designed to promote Sentinel-1 ship interpretation. More recently, the SEN1-2 dataset [38] was designed to foster deep learning research in SAR-optical data fusion. However, none of these SAR datasets focuses on the interpretation of relatively low-resolution urban Sentinel-1 images, even though such data benefit researchers through their large coverage, fewer layover effects, and easy augmentation thanks to their free accessibility.
To fill this gap and to advance interpretation research with urban SAR images, in this study, we present a benchmarking SAR dataset called OpenSARUrban, which has been collected from 19 Sentinel-1 images, mainly covering the areas of 21 individual metropolises of China. First, a coarse-to-fine annotation scheme was proposed, initially organized according to urban operational functionalities and then hierarchically divided into more detailed categories (see Fig. 2). The dataset comprises 33 358 SAR image patches (i.e., image chips) of uniform size.
The three main contributions of this article can be summarized as follows. First, a hierarchical coarse-to-fine annotation scheme for urban target interpretation is proposed, which takes urban requirements into account. Second, by organizing and exploiting a rapidly growing set of Sentinel-1 SAR images, we compiled the OpenSARUrban dataset, which is particularly applicable to urban target interpretation. Third, five essential properties were achieved and benchmarking experimental analyses were conducted, which contribute to the practicality and quality of this dataset.
The remainder of this article is organized as follows. Section II presents detailed procedures for compiling the OpenSARUrban dataset. The layout and properties of OpenSARUrban are illustrated in Sections III and IV, respectively. Section V visualizes the manifolds within this dataset. Section VI provides some preliminary applications of this dataset to urban target classification as benchmarking algorithms. Finally, conclusions are drawn and future work is outlined in Section VII.
Conception and Compilation of the OpenSARUrban Dataset
In this section, we present the procedures for conceiving the OpenSARUrban dataset; we also explain how these procedures guarantee its five properties: large scale, diversity, specificity, reliability, and sustainability. The dataset compilation can be explained from three aspects: data collection and preprocessing, a well-defined annotation scheme, and the step-by-step compilation procedures.
A. Data Collection and Preprocessing
Before compiling the dataset, it is necessary to collect some typical original images from the Sentinel-1 data access hub, and the corresponding preprocessing has to be done. In this study, we focus on urban targets from major cities distributed across China.
During the data collection phase, a large number of initial Sentinel-1 SAR images were selected and downloaded from an SAR image archive, containing typical regions of interest (RoIs). In this article, the selected Sentinel-1 images cover the areas of 21 major Chinese cities from 17 administrative provinces; most of them are located around provincial capitals. Table I shows the details of the dataset sources. The geographical distribution of this dataset is shown in Fig. 1, where the red circles denote the different cities, and the green, blue, and gray regions represent land, rivers, and ocean, respectively.
In this study, we focus on Level-1 ground range detected (GRD) data from IW swath products, typically regarded as the default acquisition mode over land. These original images were downloaded from the official Sentinels Scientific Data Hub (https://scihub.copernicus.eu/dhus/).
The original images and their radiometrically calibrated versions are included in this dataset. Notably, the pixel values of radiometrically calibrated data can be directly related to the radar backscattering of the Earth's surface. In contrast to the qualitative usage of the original SAR data, their calibrated version is essential for quantitative applications. We used the SNAP 3.0 software to perform radiometric SAR image calibration. Both the uncalibrated data and their corresponding calibrated version are provided as GeoTIFF images.
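As an illustration of this preprocessing step, the following minimal sketch performs radiometric calibration with SNAP's Python interface (snappy); the input file name is a placeholder, and the parameter choices are common settings rather than the exact configuration used for OpenSARUrban.

```python
# Minimal sketch: radiometric calibration of a Sentinel-1 GRD product
# with ESA SNAP's Python bindings (snappy). File name is hypothetical.
from snappy import ProductIO, GPF, jpy

HashMap = jpy.get_type('java.util.HashMap')

product = ProductIO.readProduct('S1A_IW_GRDH_example.zip')

params = HashMap()
params.put('outputSigmaBand', True)            # write sigma-naught bands
params.put('selectedPolarisations', 'VH,VV')   # both polarizations

calibrated = GPF.createProduct('Calibration', params, product)
ProductIO.writeProduct(calibrated, 'calibrated_scene', 'GeoTIFF')
```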
Annotating SAR images directly by expert inspection is a very laborious and time-consuming task. In addition, the relatively low image resolution makes target type determination very difficult. To overcome these challenges, the optical images from Google Earth Engine provide an optimal solution, which largely fills the gap between the human visual system (HVS) [43] and the radar's active imaging mode. First, the optical images can be easily recognized by human observation, which directly leads to qualified annotations. Second, geographical annotations can be generated by 91 Weitu [44], imported into SNAP 3.0, and then matched geographically with the corresponding SAR images. Third, 91 Weitu also provides access to several petabytes of optical remote sensing images via a Google Earth Engine plug-in.
B. Discernible Categories
The annotation scheme proposed in this article is a coarse-to-fine hierarchical scheme, motivated by two factors. The first is the set of distinct visual patterns in Sentinel-1 SAR images, which were confirmed by several SAR experts: even though there are subtle differences, most categories are distinguishable across different images. The other factor is the different uses of urban areas, i.e., business areas, residential areas, industrial areas, and others. Fig. 2 shows our annotation scheme. The first level is differentiated by overall functionality; the second level gives more detailed semantic subcategories, which have to be annotated individually. The annotation scheme is defined based on the data collected from major Chinese cities. However, this focus does not limit the value of the dataset. First, our dataset is mainly composed of urban buildings, which differ considerably between China and other countries; for instance, we investigated French cities and found that the scheme does not transfer directly, so we cannot give a uniform annotation scheme from a worldwide perspective. Second, buildings within Chinese cities vary only slightly, so the annotation scheme is relatively easy to define. Third, the dataset is large enough to support our research.
For instance, skyscrapers are a representative building type in business areas and often appear as extremely bright pixels in SAR images. Residential areas, in contrast, can be described by four subcategories, i.e., general residential areas, high-rise buildings, dense and low-rise residential areas, and villas. In today's China, general residential areas most commonly contain buildings with no more than six floors. High-rise areas, on the contrary, consist of residences with tens of floors, and the distance between the buildings is usually very large. Dense and low-rise areas are commonly very crowded and their buildings are very low. Villas are generally located in suburbs and are often surrounded by trees, vegetation, and lakes. Storage areas are a typical category in industrial areas, usually located at the city periphery. Transportation hubs, including airports, railways, and highways, play a paramount role in any municipal transportation system and are essential for city operation; it is worth pointing out that highways are also an important indicator of overpasses in urban areas. Finally, another indispensable category in urban areas is vegetation. Fig. 3 illustrates typical optical samples and their corresponding SAR samples, covering all ten categories. The colored masks on each example mark the main surface cover of the current category. It can be seen that the labeled areas have different shapes in the optical images and their corresponding SAR images, which can be attributed to three points. First, the imaging mechanisms of SAR and optical instruments are different: SAR is an active imaging technique, which transmits and receives the reflected signals, whereas optical sensors work in a passive mode, receiving the reflected signals of natural illumination. Therefore, geometrical distortions are very common in SAR images and are not in accordance with our human visual experience, whereas optical images are more easily recognized by human observers. Second, an SAR sensor may observe the Earth's surface from different directions, thus accentuating this phenomenon. Third, the depression angle can also call for a rotation or a flipping of the images when comparing them with optical images. These differences can make the interpretation of SAR images rather difficult; even so, the geographical information plays an important role during image coregistration.
Optical examples and their corresponding SAR examples for each category. The first row and the second row show optical and SAR examples of skyscraper, dense and low-rise residential buildings, high-rise buildings, villas, and general residential areas, respectively. The third row and the fourth row show optical and SAR examples of storage areas, airports, railways, highways, and vegetation, respectively. The colored masks are the main land cover locations of the given category.
C. Dataset Compilation
Fig. 4 displays the overall workflow of the OpenSARUrban dataset compilation. The annotation of this dataset was implemented by a transition from the optical space to the SAR space. In particular, targets were first annotated in optical images with the assistance of the 91 Weitu software package [44]. The optical annotations were then saved in a ".shp" file together with their geographical information, which serves as a bridge linking the optical images with their corresponding (i.e., overlapping) SAR image counterparts. The time gap between optical images and SAR images was eliminated by aligning the image acquisition times recorded in the metadata files. The target coordinates within the SAR images can be obtained by using the Sentinel-1 application platform (SNAP 3.0) software [45]. Finally, with the generated SAR annotations available, the original SAR images were tiled into image patches separately for each land cover category. It is worth noting that the validity of the OpenSARUrban data is carefully monitored in every step, and the data are provided in several formats to satisfy different user requirements. The core procedure can be explained in nine steps as follows.
Step 1: Information Querying
In the metadata file of Sentinel-1 packages, the geographical location, i.e., the longitude and latitude, of the image's four corners can be easily queried. In addition, the image acquisition date/time can also be obtained from this file, formatted as “Day-Month-Year” plus the coordinated universal time (UTC) of the acquisition.
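As a sketch of this querying step, the snippet below extracts the corner coordinates and the acquisition start time from a Sentinel-1 manifest file; the element names ("coordinates", "startTime") follow the SAFE format convention but should be treated as assumptions.

```python
# Minimal sketch: query corner coordinates and acquisition time from a
# Sentinel-1 manifest.safe file, matching tags regardless of namespace.
import xml.etree.ElementTree as ET

tree = ET.parse('manifest.safe')
corners, start_time = None, None
for elem in tree.iter():
    tag = elem.tag.rsplit('}', 1)[-1]      # drop the XML namespace prefix
    if tag == 'coordinates':
        # "lat,lon lat,lon lat,lon lat,lon" for the four image corners
        corners = [tuple(map(float, p.split(','))) for p in elem.text.split()]
    elif tag == 'startTime':
        start_time = elem.text             # UTC acquisition time
print(corners, start_time)
```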
Step 2: Geo-Referencing of the Corresponding Optical Remote Sensing Images
With the information about location and image acquisition time available, one can search for the RoIs in optical remote sensing images, which are mostly in line with the HVS, making semantic annotation much easier. Specifically, the image location is defined by the boundaries of the current imaging area, and the exact image acquisition time can be traced. Here, the 91 Weitu software provides an optimal way to perform this step.
Step 3: Creating a New Directory
Before annotating a specific category, we routinely create a new directory, naming it with the category name, such as “high-rise building.” In the following steps, the human-readable annotations belonging to this category can be automatically saved in this directory on condition that the directory has been successfully activated.
Step 4: Annotating Optical Remote Sensing Images
With a relatively low resolution of about 20 m for Sentinel-1 images, directly annotating SAR images is very challenging for image analysts. To overcome this problem, the optical images, being directly understandable by human observation, provide an optimal way to accurately annotate a given image. The optical images from Google Earth Engine have a resolution of 0.13 m in both directions. The annotations are marked with a series of polygons using the "polygon annotation" tool provided by the 91 Weitu software. The geographical locations of the annotated polygon vertices are also stored in the annotation file. Thus, with the geographical information as a bridge, annotations in optical images can be easily matched with coaligned SAR images.
Step 5: Saving the Optical Annotations
This step is to save the optical annotations in a “.shp” file, which contains the geographical information of each annotation. The “.shp” annotation file is to be saved in the directory created in Step 3.
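A minimal sketch of this saving step, using the pyshp package (imported as shapefile); the directory layout, field name, and vertex values are illustrative.

```python
# Minimal sketch: save one annotated polygon to a ".shp" file in the
# category directory created in Step 3 (path and values are examples).
import shapefile

w = shapefile.Writer('high-rise building/annotations',
                     shapeType=shapefile.POLYGON)
w.field('category', 'C')

# One annotated polygon, given as (longitude, latitude) vertices.
polygon = [[(121.47, 31.23), (121.48, 31.23),
            (121.48, 31.24), (121.47, 31.24)]]
w.poly(polygon)
w.record('high-rise building')
w.close()
```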
Step 6: Checking the Optical Annotations
When all the categories in the optical images have been annotated and the corresponding “.shp” files have been created, these annotations should be checked in the optical domain with the assistance of the 91 Weitu software. During this process, if the targets are correctly annotated, these files are saved into a directory named “optical annotation.” Otherwise, the optical annotations should be corrected before saving.
Step 7: Image Coregistration
With the geographical information available, the optical annotations can be easily matched with the corresponding SAR images with the help of the SNAP 3.0 desktop. In particular, the Sentinel-1 images are first opened in the SNAP 3.0 software. In this study, the selected image is usually a VH-polarized image, because of its better visual quality compared with VV polarization, which becomes apparent in this step. Then, the optical annotations, formatted as ".shp" files, are imported into SNAP. The annotations are checked and matched with the corresponding SAR image by using this software, where the Sentinel-1 image and the optical annotations are coaligned automatically.
Step 8: Exporting SAR Annotations
With the help of the SNAP 3.0 toolbox, the annotation information of SAR images is stored in a “.txt” file in the “SAR annotation” directory. Actually, the SAR annotation contains the coordinates of the categorical locations in the Sentinel-1 images. More specifically, the coordinates are the locations of the annotated polygon vertices.
Step 9: Generating SAR Image Patches With Different Formats
In this step, SAR image patches of each category are generated with MATLAB. Using the polygon annotations, we tile the annotated regions into image patches of fixed size; a patch is kept only if its overlap with the annotated polygon exceeds a preset margin (see Fig. 5).
Our overlap computation mechanism. The purple square and the yellow polygon represent a tiled image patch and an annotated SAR polygon, respectively. If their overlap is larger than a preset margin, the corresponding image patch is saved into the dataset.
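The overlap test of Fig. 5 can be sketched as follows with the shapely package; the patch size and margin used here are illustrative parameters, not the exact values used for OpenSARUrban.

```python
# Minimal sketch of the overlap test in Fig. 5 using shapely.
from shapely.geometry import Polygon, box

def keep_patch(annotation, x, y, patch=100, margin=0.5):
    """Keep a tile whose overlap with the annotated polygon exceeds
    the preset margin (as a fraction of the tile area)."""
    tile = box(x, y, x + patch, y + patch)
    overlap = annotation.intersection(tile).area / tile.area
    return overlap > margin

annotation = Polygon([(0, 0), (300, 0), (300, 200), (0, 200)])
print(keep_patch(annotation, 250, 150))   # tile mostly outside -> False
```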
Layout of the OpenSARUrban Dataset
The OpenSARUrban dataset is organized in different folders for the different image categories and formats. For a given Sentinel-1 SAR image, the content of the generated dataset is illustrated in Fig. 6. Each image patch is provided in four different formats, i.e., the original data, the visualized data in gray-scale representation, the visualized data in pseudo-color, and the radiometrically calibrated data; the different formats are stored in different subfolders and are meant to satisfy different user requirements. At present, the ten target categories are stored separately. Each image patch is named by the combination of its category name, pixel coordinates, polygon index, and polarization mode, thus supporting patch retrieval.
The original data are stored in 32-bit format, in line with the original data format of Sentinel-1. Considering the GRD format of a Sentinel-1 image, each image patch is stored as a two-channel matrix, where the channels contain the amplitude values of the pixels with VH and VV polarizations, respectively. Based on the original image patches, image enhancement is applied to visualize the data in gray-scale as unsigned 8-bit integers (UINT8) for VH and VV polarizations. The radiometrically calibrated data, obtained by using the SNAP 3.0 software, are stored in a matrix that contains the normalized radar cross section $\sigma^0$ of each pixel.
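The gray-scale visualization can be sketched as a simple percentile-clipping enhancement; the exact enhancement applied to OpenSARUrban is not specified here, so the following is only an illustrative choice.

```python
# Minimal sketch: convert a 32-bit amplitude channel to an enhanced
# UINT8 gray-scale image via percentile clipping (illustrative method).
import numpy as np

def to_uint8(amplitude, low=2, high=98):
    """Clip an amplitude channel to its [low, high] percentiles and
    rescale linearly to the 0-255 range."""
    lo, hi = np.percentile(amplitude, [low, high])
    clipped = np.clip(amplitude, lo, hi)
    return ((clipped - lo) / (hi - lo + 1e-12) * 255).astype(np.uint8)

vh = np.abs(np.random.randn(100, 100)) * 50   # stand-in VH amplitudes
gray = to_uint8(vh)
```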
Image patches for each category with UINT8 format. The first row shows VH-polarized examples of skyscrapers, general residential areas, high-rise building blocks, dense and low-rise residential areas, and villas; the second row shows the corresponding VV-polarized examples; the third row displays VH-polarized data of airports, railways, highways, industrial storage areas, and vegetated areas; the fourth row exhibits the corresponding VV-polarized patches.
The detailed information about each image patch, its SAR signatures, and the messages provided by its metadata are listed in an XML file named "Annotation.xml." The annotation information, including the annotation times, locations, etc., is also provided in this file. Via the pixel coordinates contained in the name of each image patch, the corresponding information can be easily retrieved from the XML file.
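Retrieving the metadata of a patch from this file might look like the following sketch; since the exact XML schema is not given in the text, the element and attribute names below are hypothetical.

```python
# Minimal sketch: look up a patch in Annotation.xml by the pixel
# coordinates embedded in its file name (tag/attribute names are
# hypothetical placeholders for the actual schema).
import xml.etree.ElementTree as ET

def lookup(xml_path, pixel_coords):
    root = ET.parse(xml_path).getroot()
    for patch in root.iter('patch'):               # hypothetical tag
        if patch.get('coordinates') == pixel_coords:
            return {child.tag: child.text for child in patch}
    return None

info = lookup('Annotation.xml', '10240_5120')      # hypothetical key
```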
Properties of OpenSARUrban
A. Large Scale
The original data cover 21 major Chinese cities in 17 administrative provinces. Consequently, the OpenSARUrban dataset provides as many as 33 358 image patches, where each image patch is available in 4 different image formats and 2 different polarization modes.
B. Diversity
1) Data Format Diversity
For each image patch, there are four different formats available, i.e., the original 32-bit image, the enhanced gray-scale image, the radiometrically calibrated image, and the pseudo-colored radiometrically calibrated image. These four different formats are generated to satisfy different requirements.
2) Geographical Diversity
OpenSARUrban comprises data from 21 major Chinese cities in 17 administrative provinces. The image patches from each city are stored in separate folders. Fig. 8 shows the image patch distributions across the different cities: bars with different colors represent different categories, and the image patches are grouped according to the cities they come from. In this figure, Guangzhou actually represents image patches jointly collected from the cities of Guangzhou, Shenzhen, and Hong Kong, because of the wide-swath capability of Sentinel-1.
Dataset distributions among different cities. Differently colored bars represent the number of image patches from different categories. Image patches are grouped according to the city distribution.
3) Categorical Diversity
The categories covered by this study and the validity of the annotation scheme are described in Section II-B. All these typical categories constitute the categorical diversity of the OpenSARUrban dataset. The dataset distributions among the different categories are illustrated in Fig. 9. The transportation hub categories are very limited in this dataset, calling for more research on data imbalance.
4) Polarization Diversity
For each image patch in this dataset, VH- and VV-polarized data are included. Different polarizations, conveying different scattering signatures and visual effects, have different potentials in describing different kinds of urban types.
C. Specificity
This study provides a large-scale C-band urban categorical dataset with a resolution of 20 m. This dataset essentially aims to support the systematic study of SAR signatures and the characteristics of different urban categories, as well as to pave the way for applications of this kind of data. Specifically, the OpenSARUrban dataset is designed to: 1) study the characteristics and potentials of different urban areas by using Sentinel-1 SAR images; 2) develop sophisticated urban target interpretation algorithms for this kind of data; and 3) support content-based image retrieval.
To the best of our knowledge, representative C-band urban target data for SAR images with a resolution of 20 m are not yet available for users to study their characteristics. Therefore, we created the OpenSARUrban dataset, expecting to fill this gap.
D. Reliability
The OpenSARUrban dataset was annotated by analyzing optical remote sensing images, which provide annotators with clearly understandable visual information and additional context about the Earth's surface. The image coregistration between optical images and their corresponding SAR images is supported by geographical information, which is very helpful when verifying the annotation coordinates, and by the image acquisition dates, which effectively eliminate the time gap between optical images and SAR images. Moreover, ambiguous class assignments can be corrected by this additional information. Thus, the annotation quality of the dataset is improved.
E. Sustainability
The optical annotations recorded as ".shp" files can be reused for coregistration with newly available SAR data. The geographical information contained in the optical annotation files can be regarded as an effective bridge to new SAR images covering the same area. Thus, the manual labeling effort can be greatly reduced, more Sentinel-1 SAR data can be easily incorporated, and more urban categories can be added. In this way, the OpenSARUrban dataset can be enriched in a relatively simple manner.
Visualization of the OpenSARUrban Dataset
To illustrate the data manifolds within OpenSARUrban, the FCD algorithm [47], which has shown its advantages on remote sensing datasets, is combined with t-SNE [48] to visualize this dataset. The idea behind this is to convert similarities between data points into joint probabilities and to minimize the Kullback–Leibler divergence [49] between the joint probabilities of the low-dimensional manifolds and the high-dimensional data. This method is parameter-free and, thus, an unbiased data analysis technique, performing well at preserving the complete local structure and some global structure of the data points. The visualization and interpretation are based on a Vega-style interactive tool: users are able to zoom in, zoom out, and pick out samples from OpenSARUrban. The visualization procedure is depicted in Fig. 10 and includes raw data extraction, dictionary extraction, pair-wise distance computation, dimension reduction, and visualization. Further details can be found in [50].
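In terms of implementation, the dimension-reduction and plotting stages can be sketched with scikit-learn's t-SNE as follows; the random features stand in for the FCD-based pair-wise representation described above.

```python
# Minimal sketch: embed patch features into 2-D with t-SNE and plot
# them colored by category (features/labels are stand-in data).
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

features = np.random.rand(3000, 64)          # stand-in patch features
labels = np.random.randint(0, 10, 3000)      # ten urban categories

embedding = TSNE(n_components=2, perplexity=30).fit_transform(features)
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=3, cmap='tab10')
plt.show()
```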
Considering the computational load of this method, in an experiment, we randomly picked 300 samples from each category, including the two polarization modes. The visualization results are analyzed in the manifold space, which provides an intuitive way to understand the dataset. Fig. 11(a) and (b) exhibits the category visualization results for VH and VV polarization, respectively. Different colors in these figures represent different urban categories. From the visualization results, one can observe the following.
Urban target visualization of the OpenSARUrban dataset. (a) VH-polarized dataset visualization. (b) VV-polarized dataset visualization.
The VH-polarized images are generally more clearly distinguishable than the VV-polarized data.
The transportation hubs, including airports, railways, and highways, have their distinct manifold spaces, both for VH and for VV polarization. This is due to their specific image patterns and the large functional distances between them.
The business areas, represented by skyscrapers in this dataset, dominate their own manifold space for both VH and VV polarization modes. The performance seems better for VH polarization than for VV polarization.
The functional areas of residential regions, including general residential areas, high-rise building areas, dense and low-rise areas, and villas, are mostly assembled in one cluster in the manifold of the VH-polarized dataset. In the VV-polarized results, they are sometimes mixed up with other urban categories.
In both the VH- and VV-polarized manifold spaces, the vegetation areas form a well-defined cluster. However, from the perspective of manifold visualization, this cluster is close to that of the industrial areas, represented by the storage category.
Both figures show that distinguishing all ten categories remains a great challenge.
Evaluation of Our Urban Categorization and Discussions of OpenSARUrban
It is generally acknowledged that urban categorization in SAR images is very challenging, and the difficulty is further increased at such a relatively low resolution. As explained before, these challenges always exist and are rather severe for OpenSARUrban interpretation. We demonstrate that the OpenSARUrban dataset can be reliably classified using some prevailing deep learning methods and some traditional SAR image classification techniques, e.g., a combination of representative traditional SAR feature descriptors and a linear support vector machine (SVM) classifier [51].
A. Benchmarking Algorithms for Urban Categorization
In order to demonstrate the distinguishability of target area categories and to provide some representative benchmarking algorithms for this dataset, we carried out urban target categorization on our OpenSARUrban dataset. In order to achieve this task, we comprehensively analyzed the prevailing deep learning algorithms developed in recent years and several representative hand-crafted feature descriptors for SAR images.
The reason for using deep learning algorithms in this study is their astonishing recent achievements. In particular, densely connected convolutional networks (DenseNet) [52], the deep residual network with 50 residual blocks (ResNet50) [53], SqueezeNet [54], very deep convolutional networks with 19 layers (VGG19) [55], and AlexNet [56] were evaluated for the classification of this dataset.
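As an illustration, adapting an ImageNet-pretrained VGG19 to the ten OpenSARUrban classes might look like the sketch below; the two-channel-to-three-channel mapping and the pretraining choice are illustrative assumptions, not the exact training setup of this article.

```python
# Minimal sketch: adapt a pretrained VGG19 to ten urban classes with
# PyTorch; the channel mapping and resizing are illustrative choices.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg19(weights='IMAGENET1K_V1')
model.classifier[6] = nn.Linear(4096, 10)    # ten urban categories

# Duplicate one polarization so a 2-channel patch fits the 3-channel input.
patches = torch.rand(8, 2, 100, 100)         # stand-in VH/VV patch batch
x = torch.cat([patches, patches[:, :1]], dim=1)
x = nn.functional.interpolate(x, size=224)   # VGG19 expects 224x224 input
logits = model(x)                            # shape: (8, 10)
```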
For the evaluation of traditional methods, six prevailing SAR image descriptors, including local binary patterns (LBPs) [57], LogGabor features [58], Gabor features [59], Weber local descriptors [60], histograms of oriented gradients [61], and principal component analysis (PCA) [62], were selected to evaluate the usefulness of OpenSARUrban. The numbers of scales and orientations for the Gabor and LogGabor features were set to 6 and 4, respectively, and the PCA descriptor reduces each image patch to a 30-dimensional vector. For simplicity and a fair comparison, a linear SVM was chosen as the classifier in each experiment.
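A minimal sketch of such a traditional baseline, combining LBP histograms with a linear SVM, is given below; the LBP radius, number of points, and histogram binning are illustrative settings rather than the exact configuration used here.

```python
# Minimal sketch: LBP histogram features plus a linear SVM classifier
# (skimage + scikit-learn); data are stand-in random patches.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import LinearSVC

def lbp_histogram(patch, P=8, R=1):
    """Compute a normalized histogram of uniform LBP codes."""
    codes = local_binary_pattern(patch, P, R, method='uniform')
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2),
                           density=True)
    return hist

patches = [np.random.rand(100, 100) for _ in range(200)]  # stand-ins
labels = np.random.randint(0, 10, 200)

X = np.stack([lbp_histogram(p) for p in patches])
clf = LinearSVC().fit(X, labels)
```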
B. Implementation Details and Evaluation Metrics
To implement urban target classification with this dataset, it was split into a training part and a testing part.
Network training for the deep learning techniques started with an initial learning rate of 0.001, and the learning rate was decreased according to the "poly" descending policy described in [64]. The parameters were iteratively updated until convergence was reached.
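For reference, the "poly" policy decays the learning rate from its base value to zero as training progresses; a minimal sketch follows, assuming the commonly used power of 0.9.

```python
# Minimal sketch of the "poly" learning-rate schedule:
# lr = base_lr * (1 - iteration / max_iter) ** power
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    return base_lr * (1 - iteration / max_iter) ** power

print(poly_lr(0.001, 5000, 10000))   # roughly 0.00054 halfway through
```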
Without loss of generality, the overall accuracy (OA) [65], [66] and a confusion matrix [27], [28] were applied to evaluate the performance of the available benchmarking algorithms.
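Both metrics can be computed directly with scikit-learn, as in this sketch with stand-in label arrays.

```python
# Minimal sketch: overall accuracy (OA) and a 10 x 10 confusion matrix
# computed with scikit-learn on stand-in predictions.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = np.random.randint(0, 10, 500)
y_pred = np.random.randint(0, 10, 500)

oa = accuracy_score(y_true, y_pred)     # overall accuracy
cm = confusion_matrix(y_true, y_pred)   # rows: true, columns: predicted
print(f'OA = {oa:.2%}')
```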
C. Overall Performance of OpenSARUrban for Urban Target Categorization
The OA achieved by each benchmarking algorithm on the OpenSARUrban dataset is shown in Fig. 12. In this figure, the purple bar and the yellow bar indicate the performances for the VH and VV polarization modes of this dataset, respectively. One can observe the following from this figure.
Generally speaking, deep learning methods surpass traditional methods by a large margin, except for LBP.
The best performance for both VH and VV polarization, in terms of OA, is given by VGG19, with 89.53% and 89.49%, respectively.
Among the traditional methods, LBP, with an OA of 70.82% and 71.06% for the VH and VV polarization modes, is far more advantageous than the other methods and even surpasses some deep learning techniques.
In real situations, the categorization depends on the quality of the samples, the feature selection, etc. In general, however, VH-polarized man-made structures are classified better than VV-polarized ones. We explain this phenomenon from three different perspectives. First, it can be understood from radar backscattering theory: the categories in this study are mostly man-made structures, except for vegetation, and it has been demonstrated that VH-polarized data contribute the most to the classification of man-made structures [67]. Second, it can be explained by visual appearance: a comparison between VH-polarized and VV-polarized images is shown in Fig. 7, and to human observers the VH-polarized images visibly show more intuitive details than the VV-polarized ones. Third, it is supported by the manifold visualization: the easier separation of the categories under VH polarization, shown in Fig. 11(a) and (b), indicates that the VH-polarized images are more distinctive than the corresponding VV-polarized ones.
To give a comprehensive account of this dataset, we provide the confusion matrices of the best investigated deep learning method (i.e., VGG19) and the best investigated traditional method (i.e., LBP) for the overall classification of this dataset. The confusion matrices produced by VGG19, for both VH and VV polarizations, are depicted in Fig. 13(a) and (b), respectively. One can easily conclude from these two confusion matrices that the urban targets in this dataset can be well distinguished by using VGG19, even though their visualization contains some confusions [for comparison, see Fig. 11(a) and (b)]. However, even with VGG19, some categories, e.g., general residential areas (Gen.Res. for short), storage areas, and dense and low-rise residential areas (denselow for short), do not achieve a highly satisfactory classification accuracy, indicating the potential for developing more advanced classification algorithms for this kind of data.
Confusion matrix of VGG19 when classifying the whole dataset containing different polarizations. “Gen.Res.” denotes general residential areas. (a) VGG19 for VH. (b) VGG19 for VV.
Fig. 14(a) and (b) shows the confusion matrices obtained by the combination of an LBP feature descriptor and a linear SVM classifier. These figures support the following points: 1) most categories can be well categorized, except for three typical building types: general residential areas, industrial storage areas, and high-rise buildings, which, comparatively, are better distinguished by the VGG19 method; 2) these three most challenging building types are very prone to confusion with each other; and 3) the LBP descriptor performs almost identically for the VH and VV polarizations. The inadequacies of LBPs demonstrate, on the one hand, the limitations of hand-crafted feature descriptors; on the other hand, the transportation hubs, i.e., airports, railways, and highways, are easy to distinguish because their image patterns differ greatly from those of urban buildings.
Confusion matrix of LBP when classifying the whole dataset containing different polarizations. “Gen.Res.” denotes the general residential areas. (a) LBP for VH. (b) LBP for VV.
To further explore the effectiveness of LBP features on the three most challenging building types, i.e., Gen.Res., storage areas, and high-rise buildings, we picked them out from the whole dataset and carried out a classification run purely on these three categories. The classification accuracy for each category is shown by the confusion matrices depicted in Fig. 15(a) and (b), for VH and VV polarization, respectively. We conclude that the LBP feature descriptor has very limited capability for recognizing general residential areas, industrial storage areas, and high-rise buildings, which greatly reduces the overall performance of LBPs on this dataset.
Confusion matrix of the three typical building types by using LBP features. Both VH and VV polarization modes are evaluated. (a) LBP for VH. (b) LBP for VV.
D. Analysis of Specific Urban Functionalities
1) Residential Area Evaluation
The classification accuracy of residential areas, including general residential areas, high-rise building areas, dense and low-rise residential areas, and villas, is shown for each algorithm in Fig. 16, where Fig. 16(a), (b), (c), and (d) denote the accuracy of general residential areas, high-rise building areas, dense and low-rise residential areas, and villas, respectively. The purple and yellow bars in these figures illustrate the results for VH and VV polarization, respectively. Based on these figures, the following conclusions can be drawn.
Classification accuracy of residential areas, including general residential areas, high-rise building areas, dense and low-rise residential areas, and villas, by using 11 benchmarking algorithms. Both the VH and VV polarizations are compared and evaluated. (a) General residential area. (b) High building area. (c) Dense and low residential area. (d) Villas.
Traditional methods are seriously limited for recognizing general residential areas and high-rise buildings.
AlexNet is effective only for high-rise building areas, while it has very limited capabilities for recognizing the other residential areas.
LBPs are advantageous for dense and low-rise areas and for villa areas.
For the recognition of villa areas, DenseNet, VGG19, and LBP stand at a comparably favorable position, showing almost the same level of classification accuracy for both VH and VV polarizations.
2) Transportation Hub Evaluation
Fig. 17 exhibits the classification accuracy of transportation hubs obtained by applying the five prevailing deep learning methods and the six traditional methods. Both VH and VV polarizations are compared and evaluated. In particular, the classification accuracies of airports, railways, and highways are shown in Fig. 17(a), (b), and (c), respectively. The results demonstrate that: 1) the deep learning algorithms, LBPs, and PCA features have relatively satisfactory capabilities for interpreting both VH- and VV-polarized transportation hubs; and 2) the PCA feature representation is simple yet powerful for transportation hub description, achieving results as good as the deep learning methods.
Classification accuracy of transportation hubs, including airports, railways, and highways, by using 11 benchmarking algorithms. Both VH and VV polarizations are compared and evaluated. (a) Airport. (b) Railway. (c) Highway.
3) Business Area Evaluation
The classification accuracy of skyscrapers, including VH and VV polarizations, is shown in Fig. 18. Notably, skyscrapers are the most representative building type in business areas and were chosen for annotation in the OpenSARUrban dataset. We can observe that: 1) the VGG19 and LBP algorithms show great superiority over the others, and for both of them, VH polarization performs slightly better than VV; and 2) the performance differences between VH and VV for DenseNet and ResNet50 are very striking.
Classification accuracy of business areas, represented by skyscrapers. Both VH and VV polarizations are compared and evaluated.
4) Industrial Area Evaluation
Fig. 19 depicts the classification accuracy of each benchmarking algorithm for industrial storage areas, where both VH and VV polarizations are included. For the identification of this category, the results in this figure tell us that: 1) the deep learning algorithms show their advantages over the traditional methods, for both VH and VV polarizations; and 2) DenseNet, ResNet50, and VGG19 achieve almost the same classification accuracy for the VH polarization mode.
Classification accuracy of industrial areas, represented by storage areas. Both VH and VV polarizations are compared and evaluated.
5) Urban Vegetation Evaluation
The classification accuracy of each benchmarking algorithm for identifying urban vegetation areas, which account for a large share of the land cover in urban areas, is illustrated in Fig. 20. This figure can be summarized as follows: 1) a satisfying performance can be achieved by SqueezeNet, VGG19, AlexNet, and LBP for both VH and VV polarizations; and 2) VGG19, which obtains the most convincing results for both polarizations of this category, is more suitable for VV polarization, showing an accuracy gap relative to the VH-polarized results.
Classification accuracy of urban vegetation. Both VH and VV polarizations are compared and evaluated.
Conclusion and Future Work
This article describes a Sentinel-1 dataset for urban interpretation, called OpenSARUrban. The dataset comprises ten different urban categories, includes four formats and two polarization modes for each image patch, and covers 21 major cities of China. Specifically, the included image formats are the original data, the visualized gray-scale data, the visualized data in pseudo-color, and the calibrated data; the polarization modes are VH and VV. With the five essential properties of large scale, diversity, specificity, reliability, and sustainability, the goals of this dataset can be achieved. The dataset structure is visualized from the perspective of data manifolds by using FCD and t-SNE. Finally, some benchmarking algorithms and experimental results are presented to demonstrate the practicality and quality of this dataset. Developing methods that enhance the performance on the whole dataset remains very challenging, and the dataset is also expected to foster research on data imbalance. In the era of big data for the SAR community, OpenSARUrban is expected to provide a basis for developing much more advanced algorithms for Sentinel-1 urban interpretation and to foster the characterization of this kind of data. In the future, this work will be extended to cities worldwide and to time-series data.
The OpenSARUrban dataset can be found at https://pan.baidu.com/s/1D2TzmUWePYHWtNhuHL7KdQ.
ACKNOWLEDGMENT
The authors would like to thank the European Space Agency for providing the Sentinel-1 data and the SNAP 3.0 software; they would like to thank Qianfanshijing Technology (Beijing) Co., Ltd. for providing the 91 Weitu software. The authors sincerely appreciate several colleagues in the Shanghai Key Laboratory of Intelligent Sensing and Recognition for their devoted assistance on this dataset annotation. The authors would also like to thank G. Schwarz for many helpful hints and the reviewers for their valuable and insightful comments.