1. Introduction
In recent years, deep neural networks have driven considerable progress on many machine learning problems [13], [22], [33], [38]. Unfortunately, the performance of deep models drops significantly when the training and testing data come from different distributions [59], which limits their utility in real-world applications. To reduce this distribution shift, a number of works focus on transfer learning [56], in particular domain adaptation (DA) [17], [42], [45], [48], [69], [72] and domain generalization (DG) [40], [41], [52], [71], [83], in which one or more different but related labeled datasets (a.k.a. source domains) are collected to help the model generalize to unlabeled or unseen samples in a new dataset (a.k.a. the target domain).
We consider the practical test-time adaptation (TTA) setup and compare it with related ones. First, fully TTA [70] adapts a model to a fixed test distribution from an independently sampled test stream. Building on this, continual TTA [73] takes continually changing distributions into account. Next, non-i.i.d. TTA [19] tackles correlatedly sampled test streams on a single test distribution, where the label distribution within a batch deviates from that of the overall test distribution. Practical TTA strives to connect both worlds: changing distributions and correlated sampling.
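To make the difference between these protocols concrete, the sketch below simulates a practical TTA test stream: the test distribution changes from one domain to the next, while within each domain samples arrive in label-correlated bursts produced by Dirichlet sampling, a common way such non-i.i.d. streams are simulated. This is an illustrative sketch only; the function names and the concentration parameter `delta` are our assumptions, not part of any cited method.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_correlated_order(labels, num_slots=10, delta=0.1):
    """Order sample indices so that classes arrive in correlated bursts
    (Dirichlet-based simulation of a non-i.i.d. test stream; a smaller
    `delta` yields stronger label correlation within the stream)."""
    classes = np.unique(labels)
    slots = [[] for _ in range(num_slots)]
    for c in classes:
        idx = rng.permutation(np.where(labels == c)[0])
        # Split this class's samples across time slots with
        # Dirichlet-distributed proportions.
        props = rng.dirichlet(delta * np.ones(num_slots))
        cuts = (np.cumsum(props) * len(idx)).astype(int)
        for s, chunk in enumerate(np.split(idx, cuts[:-1])):
            slots[s].extend(chunk)
    # Reading the slots in order gives a label-skewed stream.
    return np.concatenate([rng.permutation(s) for s in slots if s]).astype(int)

def practical_tta_stream(domains, batch_size=64, delta=0.1):
    """Yield (domain_id, batch_indices): the distribution changes across
    domains (as in continual TTA) while each domain's samples arrive in
    label-correlated order (as in non-i.i.d. TTA)."""
    for d, labels in enumerate(domains):
        order = make_correlated_order(labels, delta=delta)
        for start in range(0, len(order), batch_size):
            yield d, order[start:start + batch_size]

# Toy usage: three "domains", each with 1000 samples over 10 classes.
domains = [rng.integers(0, 10, size=1000) for _ in range(3)]
for d, batch in practical_tta_stream(domains):
    pass  # adapt the model on `batch` drawn from domain `d` here
```

Under this protocol, a batch is neither drawn from a fixed distribution (the domain index `d` advances over time) nor sampled independently (the per-batch label marginal is skewed), which is exactly the combination practical TTA targets.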