I. Introduction
Limited by its hardware, a single sensor can describe the imaging scene from only one perspective. Image fusion technology combines images of the same scene acquired by different sensors into a single image, thereby overcoming the limitations of any individual sensor and improving the quality of scene monitoring. Moreover, image fusion, particularly multi-modal image fusion, integrates complementary information from multiple source images, which eliminates the redundancy among sensors and maximizes the utilization of image information. This feature integration ability makes it suitable for a wide range of fields, including brain tumor segmentation [1], [2], land cover mapping [3], and remote sensing image detection [4]. Infrared and visible image fusion (IVIF) aims to generate fused images that retain the high-contrast regions of infrared images while preserving the texture details of visible images. This capability enables the fused image to mitigate the impact of external factors such as weather conditions and variations in brightness. These advantages have led to its wide application in military scenarios [5], object tracking [6], [7], and video surveillance [8].