1. INTRODUCTION
In the past decade, Convolutional Neural Network (CNN) based methods have achieved significant improvements on the task of object detection [1], [2], [3], [4], [5], [6]. These advances have enabled several successful real-world applications, such as video surveillance, autonomous navigation, and image analysis. However, existing object detectors are highly data-driven and susceptible to data distribution shift, also known as domain shift. Consider a real-world application such as an autonomous car, where the object detector is likely trained on data from one particular city, e.g., Tokyo City (TC), referred to as the source domain. When the same model is deployed in a different city, e.g., New York City (NYC), referred to as the target domain, the detector suffers severe performance degradation. This is because the object appearance, scene type, illumination, background, and weather conditions in NYC are visually distinct from those in TC. This problem is widely studied under the Unsupervised Domain Adaptation (UDA) setting [7], [8], [9], [10], [11], [12], [13], [14], where the detection model is adapted to transfer knowledge from the labeled source domain (e.g., TC) to the unlabeled target domain (e.g., NYC) to overcome this poor generalization.