
Momentum Contrast for Unsupervised Visual Representation Learning


Abstract:

We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
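The abstract describes MoCo's mechanism: a query encoder trained by backpropagation, a key encoder maintained as a moving average (momentum) of the query encoder, and a FIFO queue of key features serving as a large dictionary of negatives for a contrastive loss. The following PyTorch sketch illustrates that mechanism under simplifying assumptions; the tiny MLP encoders, feature dimension, queue size, momentum coefficient, and temperature below are illustrative stand-ins, not the paper's settings (the paper uses a ResNet backbone).

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoCoSketch(nn.Module):
    def __init__(self, dim=128, queue_size=4096, momentum=0.999, temperature=0.07):
        super().__init__()
        self.m = momentum
        self.t = temperature
        # Tiny stand-in encoders; the paper uses a ResNet backbone.
        self.encoder_q = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, dim))
        self.encoder_k = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, dim))
        self.encoder_k.load_state_dict(self.encoder_q.state_dict())
        for p in self.encoder_k.parameters():
            p.requires_grad = False  # key encoder is updated by momentum, not backprop
        # The dictionary: a queue of L2-normalized key features (one column per key).
        self.register_buffer("queue", F.normalize(torch.randn(dim, queue_size), dim=0))
        self.register_buffer("ptr", torch.zeros(1, dtype=torch.long))

    @torch.no_grad()
    def _momentum_update(self):
        # k <- m * k + (1 - m) * q: the "moving-averaged encoder" of the abstract.
        for pq, pk in zip(self.encoder_q.parameters(), self.encoder_k.parameters()):
            pk.mul_(self.m).add_(pq, alpha=1.0 - self.m)

    @torch.no_grad()
    def _enqueue(self, keys):
        # Replace the oldest keys in the FIFO queue (assumes queue_size % batch == 0).
        b = keys.shape[0]
        p = int(self.ptr)
        self.queue[:, p:p + b] = keys.T
        self.ptr[0] = (p + b) % self.queue.shape[1]

    def forward(self, x_q, x_k):
        q = F.normalize(self.encoder_q(x_q), dim=1)
        with torch.no_grad():
            self._momentum_update()
            k = F.normalize(self.encoder_k(x_k), dim=1)
        # Positive logit: query vs. its own key; negative logits: query vs. the queue.
        l_pos = (q * k).sum(dim=1, keepdim=True)
        l_neg = q @ self.queue.clone().detach()
        logits = torch.cat([l_pos, l_neg], dim=1) / self.t
        labels = torch.zeros(logits.shape[0], dtype=torch.long)  # positive is index 0
        loss = F.cross_entropy(logits, labels)
        self._enqueue(k)
        return loss

# Illustrative usage: x_q and x_k would be two augmented views of the same images.
model = MoCoSketch()
x = torch.randn(32, 784)
loss = model(x + 0.1 * torch.randn_like(x), x + 0.1 * torch.randn_like(x))
loss.backward()  # gradients reach only encoder_q; encoder_k follows by momentum

Decoupling the dictionary size from the batch size is the point of the queue: the set of negatives can be much larger than a mini-batch, while the slowly evolving momentum encoder keeps the queued keys consistent with each other.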
Date of Conference: 13-19 June 2020
Date Added to IEEE Xplore: 05 August 2020
Conference Location: Seattle, WA, USA

1. Introduction

Unsupervised representation learning is highly successful in natural language processing, e.g., as shown by GPT [50], [51] and BERT [12]. But supervised pre-training is still dominant in computer vision, where unsupervised methods generally lag behind. The reason may stem from differences in their respective signal spaces. Language tasks have discrete signal spaces (words, sub-word units, etc.) for building tokenized dictionaries, on which unsupervised learning can be based. Computer vision, in contrast, further concerns dictionary building [54], [9], [5], as the raw signal is in a continuous, high-dimensional space and is not structured for human communication (e.g., unlike words).
