1. Introduction
Originating in metric learning, loss functions based on pairwise distances or similarities [18], [71], [46], [72], [5] are paramount in representation learning. Their power is most notable in category-level tasks where the classes at inference differ from the classes seen during training, for instance fine-grained classification [46], [72], few-shot learning [69], [62], local descriptor learning [19], and instance-level retrieval [16], [53]. They can also be used without supervision in several ways [31], [80], [6], and indeed they form the basis for modern unsupervised representation learning [44], [21], [8].