I. Introduction
Object detection, a core task in computer vision, is widely used in scenarios such as video surveillance, resource exploration, and autonomous driving. However, most existing object detection methods [1], [2], [3], [4] are designed for RGB images and do not yield robust detection results under varying weather conditions, especially in low light. To address this difficulty, some studies [5], [6], [7], [8] adopt infrared sensors as a viable alternative for object detection in the absence of light. Infrared sensors detect the infrared radiation of an object and are insensitive to changes in ambient light. Combining multiple modalities in object detection offers a more comprehensive visual representation than unimodal detection, allowing the modalities to compensate for each other's limitations.