I. Introduction
Unmanned Aerial Vehicles (UAVs) are commonly used for commercial purposes, such as public surveillance, cartography, and search and rescue, due to their flexible deployment and versatility, as shown in Figure 1. It is therefore critical to automatically detect and locate UAVs in no-fly zones to ensure aviation safety and efficient air traffic control. In particular, a UAV must be distinguished from objects with a similar shape, such as birds or aircraft. Most deep learning models for UAV detection, e.g., [34], [35], [41], [36], [38], are based on either RGB (the day vision camera) or IR (the night vision camera). However, the detection rate based on RGB is low when the camera receives insufficient light in the daytime, e.g., in cloudy or stormy weather, while detection based on IR degrades when the UAV overlaps with background objects (e.g., buildings or trees). Under such lighting conditions, or when the UAV resembles other objects, false detections occur and lead to low training accuracy for deep learning models using RGB or IR alone.

Developing a hybrid deep learning model that processes both RGB and IR videos is non-trivial, since the RGB and IR video frames are trained independently on day-vision or night-vision features. As a result, the training on RGB and IR videos cannot be directly combined for UAV detection in a dual-vision mode. Moreover, a hybrid model can suffer from high complexity due to feature vanishing problems during training on the RGB and IR videos.
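To make the notion of a dual-vision model concrete, the following is a minimal sketch of a naive late-fusion architecture in which separate branches encode RGB and IR frames and their pooled features are concatenated before a shared head. The class name DualVisionFusionNet, the layer sizes, and the two-class head are illustrative assumptions, not the model proposed in this paper.

```python
# Hypothetical late-fusion sketch: one CNN branch per modality, features
# concatenated before a shared classifier (UAV vs. non-UAV).
import torch
import torch.nn as nn

def conv_branch(in_channels: int) -> nn.Sequential:
    """Small convolutional encoder for one modality (RGB or IR)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),   # -> (N, 32, 1, 1)
        nn.Flatten(),              # -> (N, 32)
    )

class DualVisionFusionNet(nn.Module):
    """Illustrative dual-vision (RGB + IR) backbone with late feature fusion."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.rgb_branch = conv_branch(in_channels=3)  # 3-channel day-vision frames
        self.ir_branch = conv_branch(in_channels=1)   # 1-channel night-vision frames
        self.head = nn.Linear(32 + 32, num_classes)   # fused features -> class scores

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rgb_branch(rgb), self.ir_branch(ir)], dim=1)
        return self.head(fused)

if __name__ == "__main__":
    model = DualVisionFusionNet()
    rgb = torch.randn(4, 3, 128, 128)  # batch of RGB frames
    ir = torch.randn(4, 1, 128, 128)   # batch of aligned IR frames
    print(model(rgb, ir).shape)        # torch.Size([4, 2])
```

Even this simple sketch assumes aligned RGB and IR frames for every sample, which highlights why independently trained single-modality pipelines cannot simply be concatenated for dual-vision detection.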