Multi-View Active Fine-Grained Visual Recognition | IEEE Conference Publication | IEEE Xplore

Multi-View Active Fine-Grained Visual Recognition



Abstract:

Despite the remarkable progress of fine-grained visual classification (FGVC) over its years of history, it is still limited to recognizing 2D images. Recognizing objects in the physical world (i.e., a 3D environment) poses a unique challenge: discriminative information is present not only in visible local regions but also in other, unseen views. Therefore, in addition to finding the distinguishable part in the current view, efficient and accurate recognition requires inferring the critical perspective with minimal glances. For example, a person might recognize a "Ford sedan" with a glance at its side and then know that looking at the front can help tell which model it is. In this paper, working towards FGVC in the real physical world, we put forward the problem of multi-view active fine-grained visual recognition (MAFR) and complete this study in three steps: (i) a multi-view, fine-grained vehicle dataset is collected as the testbed, (ii) a pilot experiment is designed to validate the need for and research value of MAFR, and (iii) a policy-gradient-based framework along with a dynamic exiting strategy is proposed to achieve efficient recognition with active view selection. Our comprehensive experiments demonstrate that the proposed method outperforms previous multi-view recognition works and can extend existing state-of-the-art FGVC methods and advanced neural networks to become "FGVC experts" in the 3D environment. Our code is available at https://github.com/PRIS-CV/MAFR.
Date of Conference: 01-06 October 2023
Date Added to IEEE Xplore: 15 January 2024
Conference Location: Paris, France


1. Introduction

In the past two decades, fine-grained visual classification (FGVC) has made significant progress in recognizing sub-categories of objects belonging to the same class. This progress has been demonstrated in various domains, such as recognizing cars [32], [57], aircraft [36], birds [50], [48], and foods [39], with many outstanding works surpassing human experts in a range of application scenarios [34], [56], [19], [53], [13], [4], [5], [16], [14]. However, previous efforts on FGVC have remained largely limited to a single-view paradigm, in which only the visual content within a single static image is considered. This paradigm may be sufficient for coarse-grained classification, where inter-class differences are easily captured, such as distinguishing a coupe from other vehicles by its streamlined body, distinctive engine, or headlamps. However, fine-grained classification presents a different challenge: discriminative clues are rare and often lie in subtle structural differences that are not easily captured by a single static view. For instance, to distinguish between different Ford sedans, one may only be able to rely on subtle differences in the design of the headlights. Predictably, for single-view approaches, an image/view without discriminative clues is completely indistinguishable at the fine-grained level, fundamentally limiting the model's theoretical performance.
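To make the active-recognition setting concrete, the following toy sketch illustrates the kind of policy-gradient view selection with a dynamic exit that the abstract describes. Everything here is an assumption for illustration, not the paper's implementation: the stand-in per-view classifier, the confidence threshold, the per-glance cost, and all names are hypothetical, and a plain REINFORCE update with a running-mean baseline stands in for the actual framework.

```python
import math
import random

random.seed(0)

N_VIEWS = 8            # glances available around the object (assumed)
CONF_THRESHOLD = 0.9   # dynamic-exit confidence threshold (assumed)
STEP_COST = 0.05       # penalty per extra glance (assumed)

def view_confidence(view):
    """Toy stand-in for a per-view classifier: only view 3 is
    discriminative; every other view gives near-chance confidence."""
    return 0.95 if view == 3 else 0.5

# Softmax policy over views, parameterized by one logit per view.
theta = [0.0] * N_VIEWS

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def run_episode():
    """Glance at sampled views (with replacement, for simplicity)
    until confident enough to exit early or the budget is spent."""
    visited = []
    for _ in range(N_VIEWS):
        probs = softmax(theta)
        view = random.choices(range(N_VIEWS), weights=probs)[0]
        visited.append(view)
        if view_confidence(view) >= CONF_THRESHOLD:
            break  # dynamic exit: confident enough, stop glancing
    # Reward trades off final confidence against the number of glances.
    reward = view_confidence(visited[-1]) - STEP_COST * len(visited)
    return visited, reward

def reinforce_update(visited, advantage, lr=0.2):
    """REINFORCE: raise the log-probability of the chosen views,
    scaled by the baseline-centered reward."""
    probs = softmax(theta)
    for view in visited:
        for i in range(N_VIEWS):
            grad = (1.0 if i == view else 0.0) - probs[i]
            theta[i] += lr * advantage * grad

baseline = 0.0  # running-mean reward baseline to reduce variance
for _ in range(500):
    visited, reward = run_episode()
    reinforce_update(visited, reward - baseline)
    baseline = 0.9 * baseline + 0.1 * reward

# A trained policy should learn to glance at the discriminative view first.
best_view = theta.index(max(theta))
print(best_view)
```

Because uninformative glances are penalized and confident glances end the episode early, the policy is pushed toward requesting the discriminative view with as few glances as possible, which is the efficiency goal the paper's active view selection targets.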

References

1. John Aloimonos, Isaac Weiss and Amit Bandyopadhyay, "Active vision", International Journal of Computer Vision, 1988.
2. Yasuhiro Aoki, Hunter Goforth, Rangaprasad Arun Srivatsan and Simon Lucey, "PointNetLK: Robust & efficient point cloud registration using PointNet", CVPR, 2019.
3. Samy Bengio, "Sharing representations for long tail computer vision problems", ACM ICMI, 2015.
4. Dongliang Chang, Kaiyue Pang, Ruoyi Du, Yujun Tong, Yi-Zhe Song, Zhanyu Ma, et al., "Making a bird AI expert work for you and me", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
5. Dongliang Chang, Kaiyue Pang, Yixiao Zheng, Zhanyu Ma, Yi-Zhe Song and Jun Guo, "Your “flamingo” is my bird: Fine-grained or not", CVPR, 2021.
6. Dongliang Chang, Yujun Tong, Ruoyi Du, Timothy Hospedales, Yi-Zhe Song and Zhanyu Ma, "An erudite fine-grained visual classification model", CVPR, 2023.
7. Shuo Chen, Tan Yu and Ping Li, "MVT: Multi-view vision transformer for 3D object recognition", BMVC, 2021.
8. Songle Chen, Lintao Zheng, Yan Zhang, Zhixin Sun and Kai Xu, "VERAM: View-enhanced recurrent attention model for 3D shape classification", IEEE Transactions on Visualization and Computer Graphics, 2018.
9. Yue Chen, Yalong Bai, Wei Zhang and Tao Mei, "Destruction and construction learning for fine-grained image recognition", CVPR, 2019.
10. Han-Pang Chiu, Leslie Pack Kaelbling and Tomás Lozano-Pérez, "Virtual training for multi-view object class recognition", CVPR, 2007.
11. Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho and Yoshua Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling", NeurIPS Workshops, 2014.
12. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al., "An image is worth 16x16 words: Transformers for image recognition at scale", ICLR, 2020.
13. Ruoyi Du, Dongliang Chang, Ayan Kumar Bhunia, Jiyang Xie, Zhanyu Ma, Yi-Zhe Song, et al., "Fine-grained visual classification via progressive multi-granularity training of jigsaw patches", ECCV, 2020.
14. Ruoyi Du, Dongliang Chang, Kongming Liang, Timothy Hospedales, Yi-Zhe Song and Zhanyu Ma, "On-the-fly category discovery", CVPR, 2023.
15. Ruoyi Du, Dongliang Chang, Zhanyu Ma, Yi-Zhe Song and Jun Guo, "Clue me in: Semi-supervised FGVC with out-of-distribution data", 2021.
16. Ruoyi Du, Jiyang Xie, Zhanyu Ma, Dongliang Chang, Yi-Zhe Song and Jun Guo, "Progressive learning of category-consistent multi-granularity features for fine-grained visual classification", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
17. Yifan Feng, Zizhao Zhang, Xibin Zhao, Rongrong Ji and Yue Gao, "GVCNN: Group-view convolutional neural networks for 3D shape recognition", CVPR, 2018.
18. Stan Franklin, "Autonomous agents as embodied AI", Cybernetics & Systems, 1997.
19. Jianlong Fu, Heliang Zheng and Tao Mei, "Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition", CVPR, 2017.
20. Yang Gao, Oscar Beijbom, Ning Zhang and Trevor Darrell, "Compact bilinear pooling", CVPR, 2016.
21. Yurong Guo, Ruoyi Du, Xiaoxu Li, Jiyang Xie, Zhanyu Ma and Yuan Dong, "Learning calibrated class centers for few-shot classification by pair-wise similarity", IEEE Transactions on Image Processing, 2022.
22. Zhizhong Han, Honglei Lu, Zhenbao Liu, Chi-Man Vong, Yu-Shen Liu, Matthias Zwicker, et al., "3D2SeqViews: Aggregating sequential views for 3D global feature learning by CNN with hierarchical attention aggregation", IEEE Transactions on Image Processing, 2019.
23. Zhizhong Han, Mingyang Shang, Zhenbao Liu, Chi-Man Vong, Yu-Shen Liu, Matthias Zwicker, et al., "SeqViews2SeqLabels: Learning 3D global features via aggregating sequential views by RNN with attention", IEEE Transactions on Image Processing, 2018.
24. Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, et al., "TransFG: A transformer architecture for fine-grained recognition", AAAI, 2022.
25. Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, "Deep residual learning for image recognition", CVPR, 2016.
26. Sepp Hochreiter and Jürgen Schmidhuber, "Long short-term memory", Neural Computation, 1997.
27. Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten and Kilian Weinberger, "Multi-scale dense networks for resource efficient image classification", ICLR, 2018.
28. Dinesh Jayaraman and Kristen Grauman, "Look-ahead before you leap: End-to-end active recognition by forecasting the effect of motion", ECCV, 2016.
29. Edward Johns, Stefan Leutenegger and Andrew J. Davison, "Pairwise decomposition of image sequences for active multi-view recognition", CVPR, 2016.
30. Asako Kanezaki, Yasuyuki Matsushita and Yoshifumi Nishida, "RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints", CVPR, 2018.