Abstract:
Due to obstructions such as trees and buildings, single-modal satellite or aerial images are insufficient for continuous, high-precision representation of road features. To address this problem, this article proposes a lightweight cross-modal information interaction network for road feature extraction (LCIRE-Net) from high-resolution remote sensing images (HRSIs) and GPS trajectory/LiDAR images. We design two parallel encoders for modality-specific feature learning, feeding paired multimodal inputs into the two branches. A cross-modal information dynamic interaction (CMIDI) mechanism uses thresholds to decide whether to supplement one branch with redundant information from the other modality, avoiding ineffective fusion computations when the differences between the modalities are minor. A multimodal feature fusion module (MFFM) is placed after the encoder outputs to achieve effective dual-modal fusion while suppressing interference from redundant noise generated during feature extraction. We further present a feature refinement and enhancement module (FREM), which captures edge features of the image through the enlarged receptive field of dilated convolution kernels. For lightweight design, we build on D-LinkNet and replace its original residual blocks with enhanced ghost basic blocks. Extensive experiments on the BJRoad, Porto, and TLCGIS datasets demonstrate that our network, with fewer parameters and lower FLOPs, outperforms other road-oriented semantic segmentation methods.
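To illustrate the threshold-gated interaction idea described above, the following is a minimal PyTorch-style sketch. It is an assumption-based illustration, not the authors' implementation: the class name ThresholdGatedInteraction, the cosine-similarity gap criterion, and the threshold value tau are all hypothetical choices standing in for the CMIDI mechanism.

```python
# Hypothetical sketch of a threshold-gated cross-modal interaction block,
# loosely following the CMIDI idea from the abstract (names and the
# similarity criterion are assumptions, not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ThresholdGatedInteraction(nn.Module):
    """Exchange features between two modality branches only when the
    modalities disagree enough to make the extra computation worthwhile."""

    def __init__(self, channels: int, tau: float = 0.1):
        super().__init__()
        self.tau = tau  # assumed similarity threshold
        self.proj_a = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_b = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # Measure how different the two modality feature maps are
        # (1 - cosine similarity, averaged over the batch).
        sim = F.cosine_similarity(feat_a.flatten(1), feat_b.flatten(1), dim=1)
        gap = (1.0 - sim).mean()

        # If the branches already agree, skip the cross-modal supplement
        # to avoid wasted fusion computation on near-identical features.
        if gap < self.tau:
            return feat_a, feat_b

        # Otherwise let each branch absorb a projected view of the other.
        return feat_a + self.proj_b(feat_b), feat_b + self.proj_a(feat_a)


if __name__ == "__main__":
    block = ThresholdGatedInteraction(channels=64)
    hrsi_feat = torch.randn(2, 64, 32, 32)  # e.g., HRSI branch features
    traj_feat = torch.randn(2, 64, 32, 32)  # e.g., GPS/LiDAR branch features
    out_a, out_b = block(hrsi_feat, traj_feat)
    print(out_a.shape, out_b.shape)
```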
Published in: IEEE Transactions on Geoscience and Remote Sensing (Volume: 63)