WB-DETR: Transformer-Based Detector without Backbone | IEEE Conference Publication | IEEE Xplore

WB-DETR: Transformer-Based Detector without Backbone


Abstract:

Transformer-based detectors are a new paradigm in object detection that aim to achieve strong performance while eliminating prior-knowledge-driven components such as anchors, proposals, and NMS. DETR, the state-of-the-art model among them, is composed of three sub-modules: a CNN-based backbone and a paired transformer encoder and decoder. The CNN extracts local features, and the transformer captures global context. This pipeline, however, is not concise enough. In this paper, we propose WB-DETR (DETR-based detector Without Backbone) to show that a transformer-based detector does not need to rely on CNN feature extraction. Unlike the original DETR, WB-DETR is composed of only an encoder and a decoder, with no CNN backbone. WB-DETR serializes the input image directly, encoding local features into each individual token. To compensate for the transformer's weakness in modeling local information, we design an LIE-T2T (local information enhancement tokens-to-token) module that enhances the internal information of tokens after unfolding. Experimental results demonstrate that WB-DETR, to our knowledge the first pure-transformer detector without a CNN, achieves accuracy on par with the DETR baseline and faster inference speed with only half the number of parameters.
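As a rough illustration of what serializing an image into tokens by unfolding can look like, the sketch below slides a square window over an image and flattens each window into one token. It is not the LIE-T2T module, whose details the abstract does not give. The function name `unfold_image`, the `kernel` and `stride` parameters, the absence of padding, and the toy image are all assumptions made for this example.

```python
from typing import List

# Hypothetical image layout for this sketch: height x width x channels.
Image = List[List[List[float]]]


def unfold_image(image: Image, kernel: int, stride: int) -> List[List[float]]:
    """Flatten every kernel x kernel window of the image into one token.

    Windows are visited row by row; no padding is applied (an assumption).
    A stride smaller than the kernel makes neighbouring tokens overlap.
    """
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - kernel + 1, stride):
        for left in range(0, width - kernel + 1, stride):
            token = []
            for row in range(top, top + kernel):
                for col in range(left, left + kernel):
                    # Append all channel values of this pixel to the token.
                    token.extend(image[row][col])
            tokens.append(token)
    return tokens


# Toy 4x4 single-channel image whose pixel value encodes its position.
toy = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]

tokens = unfold_image(toy, kernel=2, stride=2)             # non-overlapping windows
overlapping_tokens = unfold_image(toy, kernel=2, stride=1)  # overlapping windows
```

With `kernel=2` and `stride=2`, the toy image yields four tokens of four values each. With `stride=1`, the windows overlap and nine tokens are produced, so each pixel's neighbourhood is shared between adjacent tokens.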
Date of Conference: 10-17 October 2021
Date Added to IEEE Xplore: 28 February 2022
Conference Location: Montreal, QC, Canada

1. Introduction

CNN-based approaches [18] have dominated object detection tasks [20], [32] for years. A common component of these methods is the backbone network [12], [13], [14], [35], which extracts image features through a series of convolution and pooling layers. Modern CNN-based detectors [9], [11], [27], [21], [36], [29], [23], [25], [26], [22] treat detector design as a process of combining modules, typically a backbone, a neck [21], and multiple detection heads [3]. Among these, the backbone has become a de facto standard for improving performance, and backbone design is itself a focus of research in object detection. A backbone is thus widely regarded as essential for existing CNN-based detectors.

Cites in Papers - IEEE (14)

1.
Hai Lin, Jin Liu, Xingye Li, Lai Wei, Yuxin Liu, Bing Han, Zhongdai Wu, "DCEA: DETR With Concentrated Deformable Attention for End-to-End Ship Detection in SAR Images", IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol.17, pp.17292-17307, 2024.
2.
Muyi Yan, Shaopeng Wang, Zeyu Lu, "Improving End-to-End Object Detection by Enhanced Attention", 2024 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2024.
3.
Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal, "Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection", 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.5840-5850, 2024.
4.
Haoxuan Ding, Junyu Gao, Yuan Yuan, Qi Wang, "FF-LPD: A Real-Time Frame-by-Frame License Plate Detector With Knowledge Distillation and Feature Propagation", IEEE Transactions on Image Processing, vol.33, pp.3893-3906, 2024.
5.
Jyoti Madake, Tejas Lokhande, Atharv Mali, Nachiket Mahale, Shripad Bhatlawande, "TransVOD: Transformer-Based Visual Object Detection for Self-Driving Cars", 2024 International Conference on Current Trends in Advanced Computing (ICCTAC), pp.1-8, 2024.
6.
Yue Wu, Zhixi Shen, Peng Fu, "Efficient Shunted Transformer via DPnP module", 2024 7th International Symposium on Autonomous Systems (ISAS), pp.1-5, 2024.
7.
Jie Du, Chuyang Chen, Yuanman Li, Yaolin Zhu, Peng Liu, Tianfu Wang, "SMOD: An Accurate and Efficient Segmentation-Based Medical Object Detector", IEEE Transactions on Emerging Topics in Computational Intelligence, vol.8, no.6, pp.4106-4118, 2024.
8.
Shanliang Yao, Runwei Guan, Xiaoyu Huang, Zhuoxiao Li, Xiangyu Sha, Yong Yue, Eng Gee Lim, Hyungjoon Seo, Ka Lok Man, Xiaohui Zhu, Yutao Yue, "Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review", IEEE Transactions on Intelligent Vehicles, vol.9, no.1, pp.2094-2128, 2024.
9.
Mingqiao Ye, Lei Ke, Siyuan Li, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu, "Cascade-DETR: Delving into High-Quality Universal Object Detection", 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp.6681-6691, 2023.
10.
Hyeong Kyu Choi, Chong Keun Paik, Hyun Woo Ko, Min-Chul Park, Hyunwoo J. Kim, "Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes", IEEE Access, vol.11, pp.78623-78643, 2023.
11.
Dongshuo Yin, Yiran Yang, Zhechao Wang, Hongfeng Yu, Kaiwen Wei, Xian Sun, "1% VS 100%: Parameter-Efficient Low Rank Adapter for Dense Predictions", 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.20116-20126, 2023.
12.
Yaoyao Liu, Bernt Schiele, Andrea Vedaldi, Christian Rupprecht, "Continual Detection Transformer for Incremental Object Detection", 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.23799-23808, 2023.
13.
Yahu Yang, Xiangzhou Gao, Yu Wang, Shenmin Song, "VAMYOLOX: An Accurate and Efficient Object Detection Algorithm Based on Visual Attention Mechanism for UAV Optical Sensors", IEEE Sensors Journal, vol.23, no.11, pp.11139-11155, 2023.
14.
Ya-Li Li, Shengjin Wang, "R(Det)^2: Randomized Decision Routing for Object Detection", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.4815-4824, 2022.

Cites in Papers - Other Publishers (7)

1.
Pilhyeon Lee, Hyeran Byun, "BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos", Computer Vision – ECCV 2024, vol.15060, pp.220, 2025.
2.
Meiting Jin, Junxing Zhang, "Research on Microscale Vehicle Logo Detection Based on Real-Time DEtection TRansformer (RT-DETR)", Sensors, vol.24, no.21, pp.6987, 2024.
3.
Yuan Cao, You Zhou, Zhiwen Zhang, Enyi Yao, "Representation Learning Method for Circular Seal Based on Modified MLP-Mixer", Entropy, vol.25, no.11, pp.1521, 2023.
4.
Zhou Lijuan, Mao Jianing, "Vision Transformer-based recognition tasks: a critical review", Journal of Image and Graphics, vol.28, no.10, pp.2969, 2023.
5.
Murat Tasyurek, Ertugrul Gul, "A new deep learning approach based on grayscale conversion and DWT for object detection on adversarial attacked images", The Journal of Supercomputing, 2023.
6.
Wenjie Li, Xiangpeng Liu, Kang An, Chengjin Qin, Yuhua Cheng, "Table Tennis Track Detection Based on Temporal Feature Multiplexing Network", Sensors, vol.23, no.3, pp.1726, 2023.
7.
Bing Leng, Chunqing Wang, Min Leng, Mingfeng Ge, Wenfei Dong, "Deep learning detection network for peripheral blood leukocytes based on improved detection transformer", Biomedical Signal Processing and Control, vol.82, pp.104518, 2023.