1. Introduction
The distribution gap between training and testing data poses a major challenge to the generalization of modern deep learning methods [34], [4]. To improve a model’s generalization to testing data whose distribution may differ from that of the training data, domain adaptation has been extensively studied [44] as a way to learn domain-invariant features. However, the existing unsupervised domain adaptation paradigm requires simultaneous access to both source and target domain data during an offline training stage [12], [41]. In a realistic scenario, target domain data may not become available until the inference stage, and predictions on testing data must be made immediately. These requirements have given rise to a new paradigm of adaptation at test time, known as test-time training/adaptation (TTT/TTA) [40], [43].