1. Introduction
Object detection is a fundamental computer vision problem in autonomous robotics, including self-driving vehicles and autonomous drones. Such applications require 2D or 3D bounding boxes of scene objects in challenging real-world scenarios, including complex cluttered scenes, highly varying illumination, and adverse weather conditions. The most promising autonomous vehicle systems rely on redundant inputs from multiple sensor modalities [59], [6], [74], including camera, lidar, radar, and emerging sensors such as FIR [30]. A growing body of work on object detection using convolutional neural networks has enabled accurate 2D and 3D box estimation from such multimodal data, typically relying on camera and lidar inputs [65], [11], [57], [72], [67], [43], [36].
Figure 1: Existing object detection methods, including efficient single-shot detectors (SSD) [41], are trained on automotive datasets that are biased towards good weather conditions. While these methods work well in good conditions [19], [59], they fail in rare weather events (top). Lidar-only detectors, such as the same SSD model trained on projected lidar depth, may be distorted due to severe backscatter in fog or snow (center). These asymmetric distortions are a challenge for fusion methods that rely on redundant information. The proposed method (bottom) learns to tackle unseen (potentially asymmetric) distortions in multimodal data without seeing training data of these rare scenarios.