RecursiveDet: End-to-End Region-based Recursive Object Detection

Abstract:

End-to-end region-based object detectors like Sparse R-CNN usually have multiple cascaded bounding-box decoding stages, which refine the current predictions according to their previous results. Each stage has its own independent parameters, which incurs a huge cost. In this paper, we find that the general setting of decoding stages is actually redundant. By simply sharing parameters and making the decoder recursive, the detector already obtains a significant improvement. The recursive decoder can be further enhanced by a positional encoding (PE) of the proposal box, which makes it aware of the exact locations and sizes of the input bounding boxes, so that it becomes adaptive to proposals from different stages during the recursion. Moreover, we design a centerness-based PE to distinguish RoI feature elements and dynamic convolution kernels at different positions within the bounding box. To validate the effectiveness of the proposed method, we conduct extensive ablations and build the full model on three recent mainstream region-based detectors. RecursiveDet achieves clear performance gains with even fewer model parameters and only slightly increased computation cost. Code is available at https://github.com/bravezzzzzz/RecursiveDet.
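The contrast between a cascade and a recursive decoder can be illustrated with a toy sketch. The code below is hypothetical and is not the RecursiveDet implementation: a "stage" is just a linear map from an encoding of a normalized (cx, cy, w, h) box to four box deltas, and a DETR-style sine/cosine encoding stands in for the paper's proposal-box PE, whose exact form this excerpt does not specify. The cascade gives each of six stages its own parameters (six is an illustrative choice); the recursive decoder reuses one parameter set for the same number of steps.

```python
import math
import random


def box_positional_encoding(box, num_feats=8, temperature=10000.0):
    """Encode a normalized (cx, cy, w, h) box as sine/cosine features.

    Assumption: a DETR-style sinusoidal encoding; RecursiveDet's exact PE may differ.
    """
    enc = []
    for coord in box:
        for i in range(num_feats // 2):
            scale = temperature ** (2 * i / num_feats)
            enc.append(math.sin(2 * math.pi * coord / scale))
            enc.append(math.cos(2 * math.pi * coord / scale))
    return enc  # length = 4 * num_feats


def make_stage(dim, seed):
    """Hypothetical toy decoder stage: a linear map from the box PE to 4 box deltas."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(4)]
    bias = [0.0] * 4
    return weights, bias


def apply_stage(params, box):
    """Refine one box; this toy stage sees the box only through its positional encoding."""
    weights, bias = params
    pe = box_positional_encoding(box)
    dx, dy, dw, dh = (sum(w * x for w, x in zip(row, pe)) + b for row, b in zip(weights, bias))
    cx, cy, w, h = box
    # Delta parameterization: center shifts scale with box size,
    # width/height are updated in log space so they stay positive.
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def count_params(stages):
    return sum(len(w) * len(w[0]) + len(b) for w, b in stages)


def cascade_decode(box, stages):
    # Cascade: every stage owns an independent parameter set.
    for params in stages:
        box = apply_stage(params, box)
    return box


def recursive_decode(box, shared, num_steps):
    # Recursive decoder: one parameter set reused at every step; the box PE
    # makes the shared stage aware of the location and size of its current input.
    for _ in range(num_steps):
        box = apply_stage(shared, box)
    return box


DIM = 4 * 8  # matches the default num_feats=8
NUM_STAGES = 6
cascade = [make_stage(DIM, seed=s) for s in range(NUM_STAGES)]
shared = make_stage(DIM, seed=0)
proposal = (0.5, 0.5, 0.2, 0.3)

print(count_params(cascade))   # 792 = 6 stages x (4 x 32 weights + 4 biases)
print(count_params([shared]))  # 132
refined = recursive_decode(proposal, shared, NUM_STAGES)
```

In this toy, the recursive decoder's parameter count stays constant as the number of refinement steps grows, while the cascade's grows linearly with the number of stages. Feeding the encoding of the current box into the shared stage is what lets a single set of weights respond differently to coarse early proposals and to refined later ones.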

Published in: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

Date of Conference: 01-06 October 2023

Date Added to IEEE Xplore: 15 January 2024

DOI: 10.1109/ICCV51070.2023.00580

Conference Location: Paris, France

1. Introduction

Object detection has been intensively investigated by the computer vision community for decades. Traditional detectors built on deep convolutional neural networks (CNNs) are either anchor-based [13], [32], [26] or anchor-free [30], [37], [46]. The former performs classification and regression based on pre-defined, densely tiled bounding boxes, while the latter only assumes grid points in the 2D image plane. Detection can also be completed in a single stage, in two stages, or even in multiple cascaded stages. Single-stage methods directly give predictions without further modification, which is usually simple and efficient. Two- or multi-stage methods repeatedly make corrections based on previous results; they offer better results but cost more model parameters and computation. Except for the first stage, later stages are usually region-based, focusing on the local region within the bounding box, which is often realized by RoI Align [15].
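RoI Align [15] pools a fixed-size feature grid from an arbitrary box by bilinearly sampling points inside each bin instead of rounding coordinates. Below is a minimal pure-Python sketch for a single-channel feature map, assuming the box is given as (x1, y1, x2, y2) already in feature-map units and using no half-pixel offset; the function and parameter names (`roi_align`, `out_size`, `sampling_ratio`) are illustrative, not a specific library's API. It also computes an FCOS-style centerness value for each output cell, a hypothetical stand-in for the centerness-based PE mentioned in the abstract, whose exact form this excerpt does not give.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at continuous (y, x)."""
    h, w = len(feature), len(feature[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0  # sample point falls outside the map
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feature[y0][x0] + (1 - ly) * lx * feature[y0][x1]
            + ly * (1 - lx) * feature[y1][x0] + ly * lx * feature[y1][x1])


def roi_align(feature, box, out_size=7, sampling_ratio=2):
    """Pool box = (x1, y1, x2, y2), in feature-map units, into an out_size x out_size grid."""
    x1, y1, x2, y2 = box
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            # Average a regular sampling_ratio x sampling_ratio grid of points inside the bin.
            for sy in range(sampling_ratio):
                for sx in range(sampling_ratio):
                    y = y1 + (i + (sy + 0.5) / sampling_ratio) * bin_h
                    x = x1 + (j + (sx + 0.5) / sampling_ratio) * bin_w
                    total += bilinear(feature, y, x)
            row.append(total / sampling_ratio ** 2)
        output.append(row)
    return output


def centerness_map(out_size=7):
    """FCOS-style centerness of each RoI cell center, in normalized box coordinates."""
    cmap = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            x = (j + 0.5) / out_size
            y = (i + 0.5) / out_size
            l, r, t, b = x, 1.0 - x, y, 1.0 - y  # distances to the four box sides
            row.append(math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b))))
        cmap.append(row)
    return cmap


# A feature map whose value equals the column index makes the sampling easy to check.
feature = [[float(x) for x in range(10)] for _ in range(10)]
pooled = roi_align(feature, (0.0, 0.0, 7.0, 7.0))
cmap = centerness_map()
```

With the column-index feature map, each pooled cell recovers the center column of its bin, which is a quick check that the sampling grid is placed correctly. The centerness map peaks at 1.0 in the central cell and decays toward the box border, so it gives every RoI position a distinct, box-relative value.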
