I. Introduction
The last decade has witnessed an explosion in the number and throughput of airborne and spaceborne terrain sensors using modalities such as synthetic aperture radar (SAR) [1]. Overwhelming quantities of high-resolution satellite imagery are now available to support accurate Earth observation and topographic measurement. Even with modern computers, densely labeling such images with the underlying terrain-type classes is a daunting task, for three main reasons.
Complex and ambiguous image appearance: Within a single terrain class, objects made of different materials, arranged in different layouts, or observed from different perspectives often produce markedly different images. Sensor artifacts such as SAR “speckle” make interpretation even more difficult, as does the fact that small, local regions of imagery are often highly ambiguous. For example, a homogeneous dark region in a SAR image may be calm water, a road surface, or a radar shadow (Fig. 1).
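This local ambiguity can be illustrated with a toy sketch (not taken from the paper): two synthetic dark patches, standing in for a radar shadow and calm water, are given low mean intensity with speckle modeled as multiplicative exponential noise (the scale values are hypothetical). Their normalized intensity histograms come out nearly identical, so first-order local statistics alone cannot separate the two terrain types.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two dark SAR patches (hypothetical parameters, not real
# data): a radar-shadow patch and a calm-water patch, both low-intensity,
# with speckle modeled as exponentially distributed intensity.
shadow = rng.exponential(scale=0.05, size=(32, 32))
water = rng.exponential(scale=0.06, size=(32, 32))

def intensity_histogram(patch, bins=16, max_intensity=1.0):
    """Normalized intensity histogram of an image patch."""
    counts, _ = np.histogram(
        np.clip(patch, 0.0, max_intensity), bins=bins, range=(0.0, max_intensity)
    )
    return counts / counts.sum()

h_shadow = intensity_histogram(shadow)
h_water = intensity_histogram(water)

# L1 distance between the two histograms; a small value means the local
# intensity statistics alone cannot distinguish the two patches.
print(np.abs(h_shadow - h_water).sum())
```

The small histogram distance is exactly the situation of Fig. 1: disambiguation requires context beyond the patch itself.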
The need for high throughput: To process the huge quantities of data that are available, very efficient visual features and classifiers are needed. Stringent accuracy requirements and the incorporation of local context to mitigate aperture effects both tend to increase the computational complexity.
The scarcity of labeled training data: System performance is critically dependent on the amount and accuracy of the available training data. Producing suitable human-supplied annotations can be prohibitively expensive, dangerous, or even impossible. This is especially true when training requires detailed pixel-level labelings.
Fig. 1. Ambiguity in SAR images. (a) Two patches of similar appearance and their corresponding intensity histograms. (b) The images containing the patches: one patch is the radar shadow of a building, the other is water.