WB-DETR: Transformer-Based Detector without Backbone

Abstract:

Transformer-based detectors are a new paradigm in object detection that aims to achieve competitive performance while eliminating components driven by prior knowledge, e.g., anchors, proposals, and NMS. DETR, the state-of-the-art model among them, is composed of three sub-modules, i.e., a CNN-based backbone and a paired transformer encoder-decoder. The CNN is applied to extract local features and the transformer is used to capture global context. This pipeline, however, is not concise enough. In this paper, we propose WB-DETR (DETR-based detector Without Backbone) to show that reliance on CNN feature extraction is not necessary for a transformer-based detector. Unlike the original DETR, WB-DETR is composed of only an encoder and a decoder, without a CNN backbone. For an input image, WB-DETR serializes it directly, encoding the local features into each individual token. To compensate for the transformer's deficiency in modeling local information, we design an LIE-T2T (local information enhancement tokens-to-token) module to enhance the internal information of tokens after unfolding. Experimental results demonstrate that WB-DETR, to our knowledge the first pure-transformer detector without a CNN, achieves accuracy on par with the DETR baseline and faster inference, with only half the number of parameters.
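The serialization step mentioned in the abstract can be pictured with a small sketch. The code below is not the authors' implementation and does not cover the LIE-T2T module; it only illustrates the general idea of unfolding an image into a sequence of tokens, where each token is a flattened local patch. The function name `unfold_to_tokens` and the `patch` and `stride` parameters are hypothetical, since the abstract does not state the sizes actually used.

```python
from typing import List

# Assumption: an image is represented as nested lists of shape H x W x C.
Image = List[List[List[float]]]


def unfold_to_tokens(image: Image, patch: int, stride: int) -> List[List[float]]:
    """Slide a patch x patch window over the image and flatten each window into one token."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            token = []
            # Concatenate the channel values of every pixel inside the window.
            for row in range(top, top + patch):
                for col in range(left, left + patch):
                    token.extend(image[row][col])
            tokens.append(token)
    return tokens


# Toy 4x4 single-channel image with pixel values 0..15.
image = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
tokens = unfold_to_tokens(image, patch=2, stride=2)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of length 2 * 2 * 1 = 4
print(tokens[0])                    # [0.0, 1.0, 4.0, 5.0]
```

With a stride smaller than the patch size, neighboring windows overlap, so adjacent tokens share pixels. In a real detector the resulting token sequence would then be projected and passed to the transformer encoder.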

Published in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Date of Conference: 10-17 October 2021

Date Added to IEEE Xplore: 28 February 2022

DOI: 10.1109/ICCV48922.2021.00297

Conference Location: Montreal, QC, Canada

1. Introduction

CNN-based approaches [18] have dominated object detection tasks [20], [32] for years. A common component of these methods is the backbone network [12], [13], [14], [35], which extracts image features through a series of convolution and pooling layers. Modern CNN-based detectors [9], [11], [27], [21], [36], [29], [23], [25], [26], [22] treat detector design as a process of combining modules: a detector is always composed of a backbone, a neck [21], and multiple detection heads [3]. Among these, the backbone has become a de facto standard for improving performance, and the design of backbones is itself a focus of research in object detection. A backbone is thus essential to existing CNN-based detectors.
