1. Introduction
Unsupervised representation learning is highly successful in natural language processing, e.g., as shown by GPT [50], [51] and BERT [12]. But supervised pre-training remains dominant in computer vision, where unsupervised methods generally lag behind. The reason may stem from differences in their respective signal spaces. Language tasks have discrete signal spaces (words, sub-word units, etc.) on which tokenized dictionaries can be built, and unsupervised learning can be based on these dictionaries. Computer vision, in contrast, must further concern itself with dictionary building [54], [9], [5], because the raw signal lies in a continuous, high-dimensional space and is not structured for human communication (unlike words, for example).
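To make this contrast concrete, consider the minimal sketch below (an illustrative aside, not part of the original argument; the toy vocabulary and image shape are assumptions): a sentence maps naturally onto discrete indices into a finite dictionary, whereas an image is a large tensor of continuous values with no ready-made tokenization, so any "dictionary" for vision must itself be learned.

```python
import numpy as np

# Language: the signal space is discrete, so a tokenized
# dictionary exists essentially by construction.
# (Toy vocabulary assumed for illustration.)
vocab = {"unsupervised": 0, "learning": 1, "works": 2}
sentence = "unsupervised learning works"
tokens = [vocab[word] for word in sentence.split()]
print(tokens)  # [0, 1, 2] -- discrete indices into a finite dictionary

# Vision: the raw signal is continuous and high-dimensional.
# A 224x224 RGB image is ~150k real values with no natural
# token boundaries, so a dictionary must be learned from data.
image = np.random.rand(224, 224, 3).astype(np.float32)
print(image.size)  # 150528 continuous values, no finite vocabulary
```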