1. Introduction
Monocular height estimation aims to assign an accurate height value to each pixel of a single remote sensing image, and it plays a critical role in many applications, e.g., urban management, land planning, and disaster monitoring. One stream of existing research treats height estimation as a regression task [1], [2]. However, regression here is a non-convex optimization problem over an infinite, continuous solution space, which makes it difficult for the model to learn an accurate mapping from the input image to the ground-truth values and often leads to suboptimal solutions.
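To make the regression formulation concrete, the following minimal sketch computes a per-pixel mean absolute error between a predicted and a ground-truth height map. The function name `pixelwise_l1_loss`, the choice of an L1 loss, and the toy height values are illustrative assumptions; they are not taken from [1] or [2].

```python
def pixelwise_l1_loss(pred, target):
    """Mean absolute error between two height maps given as nested lists.

    Hypothetical helper for illustration: each inner list is one image row,
    and each value is a continuous height for one pixel.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("pred and target must have the same shape")

    # Collect the absolute error of every pixel, then average over all pixels.
    diffs = [
        abs(p - t)
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    ]
    return sum(diffs) / len(diffs)


# Toy 2x2 height maps (made-up values) with widely varying heights.
target = [[0.0, 3.0], [12.0, 30.0]]
pred = [[1.0, 3.0], [10.0, 27.0]]
loss = pixelwise_l1_loss(pred, target)
```

Because every pixel's target is an unconstrained real number, the model must search a continuous output space, which is the difficulty the paragraph above describes.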
An example patch of a 3D-reconstructed remote sensing image. The heights of the objects vary greatly.