1. Introduction
The availability of high-resolution satellite images from different modalities, such as Electro-Optical (EO) and Synthetic Aperture Radar (SAR), has made it easier to monitor urban environments on a large scale. Because buildings are critical components of urban regions, building segmentation plays an important role in monitoring changes in the urban landscape. Its goal is to delineate the area occupied by buildings in an image through pixel-level classification. Building information is used in many applications, such as change monitoring, map updating, disaster response [1], population density estimation [23], humanitarian aid, and 3-D modeling [4]. In these applications, high-resolution SAR imaging is highly beneficial, as its all-weather operational capability allows it to provide more consistent information than EO imaging. However, SAR imagery has certain drawbacks, such as speckle noise and limited semantic information, which make interpretation challenging for computer vision systems as well as human interpreters [20]. Automated building detection therefore still faces significant challenges, arising from the diversity of building shapes and sizes, the complex background environment, and the complexities introduced by SAR sensors.