I. Introduction
Building vector maps are structured as directed graphs that record building footprints, connectivity, and associated attributes. This representation offers distinct advantages: lossless scalability, convenient topological analysis, flexible attribute editing, and low storage cost. These characteristics make building polygon data pivotal in many remote sensing (RS) and geographic information system (GIS) applications, such as population density estimation, disaster management, and urban planning [1], [2], [3]. This growing demand has spurred research on automatic polygonal building extraction from RS images, with the aim of replacing time-consuming and labor-intensive manual and semiautomatic production. Especially in recent years, deep learning (DL) methods have offered promising solutions to automatic polygonal building map extraction. These methods can be grouped into three paradigms: segmentation-based, contour-based, and node-based.