1. Introduction
Semantic labeling of very-high-resolution (VHR) images, that is, assigning each pixel to one of a given set of object classes, is a long-standing research problem in image processing, as it plays a vital role in infrastructure planning, territorial planning, urban change detection, and similar applications. However, as Fig. 1 shows, many man-made object categories in urban areas are composed of a large number of different materials with similar color and texture. Meanwhile, fine-structured objects in cities (such as cars and trees) are small or threadlike and interact with each other through occlusions and cast shadows. Both factors make semantic labeling of this kind of image particularly challenging.