1. Introduction
Convolutional neural networks (CNNs) [19], [25], [67], [49], [40], [34] and vision Transformers [11], [54], [72], [39], [59], [41] have driven substantial advances in visual recognition. Despite concerted efforts to scale up vision models for higher accuracy [75], [38], [51], their high computational cost hinders deployment in resource-constrained scenarios. Research on improving the inference efficiency of deep networks spans multiple directions, including lightweight architecture design [23], [77], [22], pruning [15], [20], [70], and quantization [30], [73]. In contrast to traditional models, which follow a static computational graph at test time, dynamic networks [16], [3], [35], [61], [18], [64], [65], [79], [78] can adapt their computation to the complexity of each input, yielding promising results in efficient visual recognition.