Abstract:
RGB-T semantic segmentation aims to enhance the robustness of segmentation methods in complex environments by utilizing thermal information. To facilitate the effective interaction and fusion of multimodal information, we propose a novel Cross-modality Interaction and Global-feature Fusion Network, termed CIGF-Net. At each feature extraction stage, a Cross-modality Interaction Module (CIM) enables effective interaction between the RGB and thermal modalities. The CIM applies channel and spatial attention mechanisms to the feature information from both modalities; by encouraging cross-modal information exchange, it facilitates the integration of complementary information and improves overall segmentation performance. A Global-feature Fusion Module (GFM) then fuses the features produced by the CIM, assigning different weights to the multimodal features to achieve cross-modality fusion. Experimental results show that CIGF-Net achieves state-of-the-art performance on RGB-T semantic segmentation datasets, reaching 60.8 mIoU on the MFNet dataset and 86.93 mIoU on the PST900 dataset.
Published in: IEEE Transactions on Emerging Topics in Computational Intelligence (Early Access)
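
To make the abstract's description concrete, the following is a minimal PyTorch sketch of the two ideas it names: a cross-modality interaction step built from channel and spatial attention with a residual cross-modal exchange, and a weighted fusion step. All module internals here (the attention layouts, the reduction ratio, the gating scheme, and the class names beyond CIM/GFM) are assumptions for illustration; the abstract does not specify the paper's actual implementation.

```python
# Hypothetical sketch of the CIM/GFM idea described in the abstract.
# Layer sizes and the exact attention layout are assumptions, not the
# paper's implementation.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed design)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Reweight each channel by a globally pooled gating signal.
        return x * self.mlp(x)


class SpatialAttention(nn.Module):
    """Spatial attention from pooled channel statistics (assumed design)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        # Reweight each spatial location by a learned saliency map.
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CIM(nn.Module):
    """Cross-modality Interaction Module: each modality is refined by
    channel + spatial attention, then passed to the other stream through
    a residual cross-connection (an assumed interaction scheme)."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca_rgb, self.ca_th = ChannelAttention(channels), ChannelAttention(channels)
        self.sa_rgb, self.sa_th = SpatialAttention(), SpatialAttention()

    def forward(self, rgb, thermal):
        rgb_att = self.sa_rgb(self.ca_rgb(rgb))
        th_att = self.sa_th(self.ca_th(thermal))
        # Cross-modal exchange: each stream absorbs the other's attended features.
        return rgb + th_att, thermal + rgb_att


class GFM(nn.Module):
    """Global-feature Fusion Module: fuses the two streams with learned
    per-channel weights (the gating scheme here is an assumption)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, rgb, thermal):
        # Convex combination of the two modalities per channel.
        w = self.gate(torch.cat([rgb, thermal], dim=1))
        return w * rgb + (1.0 - w) * thermal


if __name__ == "__main__":
    rgb = torch.randn(1, 64, 60, 80)      # RGB features at one encoder stage
    thermal = torch.randn(1, 64, 60, 80)  # thermal features, same shape
    rgb, thermal = CIM(64)(rgb, thermal)
    fused = GFM(64)(rgb, thermal)
    print(fused.shape)  # torch.Size([1, 64, 60, 80])
```

In a multi-stage encoder, one such CIM/GFM pair would typically sit at each feature extraction stage, with the fused features feeding the segmentation decoder; how CIGF-Net actually wires these stages together is not specified in the abstract.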