Abstract:
This paper studies domain generalized remote sensing semantic segmentation (RSSS), which aims to generalize a model trained only on a source domain to unseen domains. Existing methods in computer vision treat style information as the domain characteristic to achieve domain-agnostic learning. Nevertheless, their generalizability to RSSS remains constrained because they consider domain characteristics incompletely. We argue that remote sensing scenes differ in layout as well as in style. Accordingly, we devise a joint style and layout synthesizing framework that lets the model jointly learn from out-of-domain samples synthesized from these two perspectives. For style, we estimate how strongly per-class representations vary under domain shift and randomly sample within this modeled scope to reasonably expand the boundaries of style-carrying feature statistics. For layout, we explore potential scenes with diverse layouts in the source domain and propose granularity-fixed and granularity-learnable masks that perturb layouts, forcing the model to learn the characteristics of objects rather than their variable positions. The learnable mask discovers difficult-to-recognize perturbation directions, encouraging more context-robust representations. Subsequently, we impose gradient angle constraints between the samples synthesized by the two strategies to correct conflicting optimization directions. Extensive experiments demonstrate the superior generalization ability of our method over existing methods.
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Early Access)
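As a rough illustration of the style branch described in the abstract, the sketch below resamples per-channel feature statistics within their estimated variation to synthesize new "styles" at the feature level. This is a minimal sketch under assumptions: the function name, tensor shapes, and the batch-level variance estimate are illustrative, and it does not reproduce the paper's exact per-class formulation.

```python
import torch

def perturb_style_stats(feat: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Illustrative sketch: resample per-channel feature statistics (mean/std)
    within their estimated batch-level variation, so downstream layers see
    style-perturbed features. Assumes a CNN feature map of shape (B, C, H, W)."""
    mu = feat.mean(dim=(2, 3), keepdim=True)                    # (B, C, 1, 1)
    sigma = feat.var(dim=(2, 3), keepdim=True).add(eps).sqrt()  # (B, C, 1, 1)

    # Estimate how much the statistics themselves vary across the batch,
    # then draw perturbed statistics from within that modeled scope.
    sigma_mu = mu.var(dim=0, keepdim=True).add(eps).sqrt()          # (1, C, 1, 1)
    sigma_sigma = sigma.var(dim=0, keepdim=True).add(eps).sqrt()    # (1, C, 1, 1)

    new_mu = mu + torch.randn_like(mu) * sigma_mu
    new_sigma = sigma + torch.randn_like(sigma) * sigma_sigma

    # Re-normalize the feature and re-style it with the sampled statistics.
    normalized = (feat - mu) / sigma
    return normalized * new_sigma + new_mu
```

In a training loop, such a perturbation would typically be applied to an intermediate feature map of the segmentation backbone so the model learns to be robust to shifted feature statistics; the layout branch and the gradient angle constraint mentioned in the abstract are separate components not shown here.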