1. Introduction
Representation learning has become the backbone of most modern AI agents, and high-quality pretrained representations are essential to improving performance on downstream tasks [16], [22], [52], [29]. While conventional approaches rely on labeled data, self-supervised representation learning has recently surged [20], [15], [37], [46], [39], [8], [53], [32]. In fact, self-supervised representation learning has been closing the gap with, and in some cases even surpassing, its supervised counterpart [9], [24], [11], [10]. Notably, most state-of-the-art methods converge on, and are fueled by, the central concept of contrastive learning [45], [25], [26], [43], [35], [24], [9].