Self-supervised Learning with Local Contrastive Loss for Detection and Semantic Segmentation | IEEE Conference Publication | IEEE Xplore

Abstract:

We present a self-supervised learning (SSL) method suitable for semi-global tasks such as object detection and semantic segmentation. We enforce local consistency between self-learned features that represent corresponding image locations of transformed versions of the same image, by minimizing a pixel-level local contrastive (LC) loss during training. LC-loss can be added to existing self-supervised learning methods with minimal overhead. We evaluate our SSL approach on two downstream tasks, object detection and semantic segmentation, using the COCO, PASCAL VOC, and CityScapes datasets. Our method outperforms the existing state-of-the-art SSL approaches by 1.9% on COCO object detection, 1.4% on PASCAL VOC detection, and 0.6% on CityScapes segmentation.
Date of Conference: 02-07 January 2023
Date Added to IEEE Xplore: 06 February 2023
Conference Location: Waikoloa, HI, USA

1. Introduction

Self-supervised learning (SSL) approaches learn generic feature representations from data in the absence of any external supervision. These approaches often solve an instance discrimination pretext task in which multiple transformations of the same image are required to generate similar learned features. Recent SSL methods have shown remarkable promise in global tasks such as classifying images by training simple classifiers on the features learned via instance discrimination [1], [2], [4], [17], [18]. However, global feature-learning SSL approaches do not explicitly retain spatial information, which renders them ill-suited for semi-global tasks such as object detection and instance and semantic segmentation [37], [43].
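To make the contrast between global and local objectives concrete, the following is a minimal sketch, not the paper's actual LC-loss formulation. It uses a standard InfoNCE-style contrastive loss, applied per image location, and shows that a globally pooled feature cannot tell whether the spatial layout of two views agrees. All names (`info_nce`, `global_pool`, `local_contrastive_loss`), the dictionary representation of a feature map, the choice of negatives (features at other locations of the second view), and the temperature of 0.2 are assumptions made for illustration.

```python
import math


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE-style loss for one query vector (plain Python lists).

    Low when the query is similar to its positive and dissimilar to the
    negatives. The temperature value is an illustrative assumption.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    q = normalize(query)
    keys = [normalize(positive)] + [normalize(n) for n in negatives]
    # Cosine similarity of the query to each key, scaled by the temperature.
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Numerically stable negative log-softmax of the positive (index 0).
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


def global_pool(feature_map):
    """Average a feature map {location: vector} into one global vector."""
    vectors = list(feature_map.values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def local_contrastive_loss(features_a, features_b, temperature=0.2):
    """Mean InfoNCE loss over locations present in both views.

    The same key in both maps is assumed to denote the same source image
    location. Negatives are view-B features at the other shared locations
    (a hypothetical choice, not taken from the paper).
    """
    shared = [loc for loc in features_a if loc in features_b]
    if not shared:
        raise ValueError("the two views share no locations")
    losses = []
    for loc in shared:
        negatives = [features_b[other] for other in shared if other != loc]
        losses.append(
            info_nce(features_a[loc], features_b[loc], negatives, temperature)
        )
    return sum(losses) / len(losses)


# Toy feature maps: two locations, two-dimensional features.
view_a = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0]}
view_b_aligned = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0]}
view_b_swapped = {(0, 0): [0.0, 1.0], (0, 1): [1.0, 0.0]}

# Global pooling gives the same vector for both layouts of view B.
pooled_equal = global_pool(view_b_aligned) == global_pool(view_b_swapped)

# The per-location loss separates them: small when aligned, large when swapped.
loss_aligned = local_contrastive_loss(view_a, view_b_aligned)
loss_swapped = local_contrastive_loss(view_a, view_b_swapped)
```

In this toy setup the pooled features of the aligned and swapped views are identical, so a loss defined only on global features would treat them the same, whereas the per-location loss penalizes the swapped layout. A real implementation would operate on batched tensors from a backbone and would need to map locations between views through the applied transformations.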

References

1. Mathilde Caron, Ishan Misra, Julien Mairal et al., "Unsupervised learning of visual features by contrasting cluster assignments", 2020.
2. Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou et al., "Emerging properties in self-supervised vision transformers", 2021.
3. Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li et al., "MMDetection: Open mmlab detection toolbox and benchmark", 2019.
4. Ting Chen, Simon Kornblith, Mohammad Norouzi and Geoffrey Hinton, "A simple framework for contrastive learning of visual representations", ICML, 2020.
5. Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi and Geoffrey Hinton, "Big self-supervised models are strong semi-supervised learners", 2020.
6. Xinlei Chen, Haoqi Fan, Ross Girshick and Kaiming He, "Improved baselines with momentum contrastive learning", 2020.
7. Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed and Andrea Vedaldi, "Describing textures in the wild", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3606-3613, 2014.
8. Noel Codella, Veronica Rotemberg, Philipp Tschandl, M Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti et al., "Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic)", 2019.
9. MMSegmentation: Openmmlab semantic segmentation toolbox and benchmark, 2020, [online] Available: https://github.com/open-mmlab/mmsegmentation.
10. Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson et al., "The cityscapes dataset for semantic urban scene understanding", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213-3223, 2016.
11. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li and Li Fei-Fei, "Imagenet: A large-scale hierarchical image database", 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255, 2009.
12. Carl Doersch, Abhinav Gupta and Alexei A Efros, "Unsupervised Visual Representation Learning by Context Prediction", ICCV, 2015.
13. Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn and Andrew Zisserman, "The pascal visual object classes (voc) challenge", International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.
14. Hugo Germain, Vincent Lepetit and Guillaume Bourmaud, "Visual correspondence hallucination: Towards geometric reasoning", 2021.
15. Spyros Gidaris, Praveer Singh and Nikos Komodakis, "Unsupervised Representation Learning by Predicting Image Rotations", ICLR, 2018.
16. Priya Goyal, Dhruv Mahajan, Abhinav Gupta and Ishan Misra, "Scaling and benchmarking self-supervised visual representation learning", Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6391-6400, 2019.
17. Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar et al., "Bootstrap your own latent: A new approach to self-supervised learning", 2020.
18. Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie and Ross Girshick, "Momentum contrast for unsupervised visual representation learning", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729-9738, 2020.
19. Kaiming He, Ross Girshick and Piotr Dollár, "Rethinking imagenet pre-training", Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4918-4927, 2019.
20. Kaiming He, Georgia Gkioxari, Piotr Dollár and Ross Girshick, "Mask R-CNN", Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969, 2017.
21. Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, "Deep residual learning for image recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
22. Patrick Helber, Benjamin Bischke, Andreas Dengel and Damian Borth, "Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification", IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 7, pp. 2217-2226, 2019.
23. Olivier J Hénaff, Skanda Koppula, Jean-Baptiste Alayrac, Aaron van den Oord, Oriol Vinyals and João Carreira, "Efficient visual pretraining with contrastive detection", Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10086-10096, 2021.
24. Olivier J Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, SM Eslami et al., "Data-efficient image recognition with contrastive predictive coding", 2019.
25. Ashraful Islam, Chun-Fu Richard Chen, Rameswar Panda, Leonid Karlinsky, Richard Radke and Rogerio Feris, "A broad study on the transferability of visual representations with contrastive learning", Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8845-8855, 2021.
26. Brenden M Lake, Ruslan Salakhutdinov and Joshua B Tenenbaum, "Human-level concept learning through probabilistic program induction", Science, vol. 350, no. 6266, pp. 1332-1338, 2015.
27. Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár, "Focal loss for dense object detection", Proceedings of the IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.
28. Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang et al., "Swin transformer: Hierarchical vision transformer using shifted windows", 2021.
29. Jonathan Long, Evan Shelhamer and Trevor Darrell, "Fully convolutional networks for semantic segmentation", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440, 2015.
30. Sharada P Mohanty, David P Hughes and Marcel Salathé, "Using deep learning for image-based plant disease detection", Frontiers in Plant Science, vol. 7, pp. 1419, 2016.