Multi-View Active Fine-Grained Visual Recognition | IEEE Conference Publication | IEEE Xplore

Multi-View Active Fine-Grained Visual Recognition



Abstract:

Despite the remarkable progress of fine-grained visual classification (FGVC) over its years of history, it is still limited to recognizing 2D images. Recognizing objects in the physical world (i.e., a 3D environment) poses a unique challenge: discriminative information is present not only in visible local regions but also in other, unseen views. Therefore, in addition to finding the distinguishable part in the current view, efficient and accurate recognition requires inferring the critical perspective with minimal glances. For example, a person might recognize a "Ford sedan" with a glance at its side and then know that looking at the front can help tell which model it is. In this paper, working towards FGVC in the real physical world, we put forward the problem of multi-view active fine-grained visual recognition (MAFR) and complete this study in three steps: (i) a multi-view, fine-grained vehicle dataset is collected as the testbed, (ii) a pilot experiment is designed to validate the need for and research value of MAFR, and (iii) a policy-gradient-based framework along with a dynamic exiting strategy is proposed to achieve efficient recognition with active view selection. Our comprehensive experiments demonstrate that the proposed method outperforms previous multi-view recognition works and can extend existing state-of-the-art FGVC methods and advanced neural networks to become "FGVC experts" in the 3D environment. Our code is available at https://github.com/PRIS-CV/MAFR.
Date of Conference: 01-06 October 2023
Date Added to IEEE Xplore: 15 January 2024
Conference Location: Paris, France


1. Introduction

In the past two decades, fine-grained visual classification (FGVC) has made significant progress in recognizing sub-categories of objects belonging to the same class. This progress has been demonstrated in various domains, such as recognizing cars [32], [57], aircraft [36], birds [50], [48], and foods [39], with many outstanding works surpassing human experts in a range of application scenarios [34], [56], [19], [53], [13], [4], [5], [16], [14]. However, previous efforts on FGVC have remained largely limited to a single-view paradigm, in which only the visual content within a single static image is considered. This paradigm may be sufficient for coarse-grained classification, where inter-class differences are easily captured, such as distinguishing a coupe from other vehicles by its streamlined body, distinctive engine, or headlamps. However, fine-grained classification presents a different challenge: discriminative clues are rare and often lie in subtle structural differences that are not easily captured by a single static view. For instance, to distinguish between different Ford sedans, one may only be able to rely on subtle differences in the design of the headlights. Predictably, for single-view approaches, an image/view without discriminative clues is completely indistinguishable at the fine-grained level, fundamentally limiting the model's theoretical performance.
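To make the active-recognition setting concrete, the following toy sketch illustrates the kind of policy-gradient view selection with a dynamic exit that the abstract describes. Everything here is an assumption for illustration, not the paper's implementation: the stand-in per-view classifier, the confidence threshold, the per-glance cost, and all names are hypothetical, and a plain REINFORCE update with a running-mean baseline stands in for the actual framework.

```python
import math
import random

random.seed(0)

N_VIEWS = 8            # glances available around the object (assumed)
CONF_THRESHOLD = 0.9   # dynamic-exit confidence threshold (assumed)
STEP_COST = 0.05       # penalty per extra glance (assumed)

def view_confidence(view):
    """Toy stand-in for a per-view classifier: only view 3 is
    discriminative; every other view gives near-chance confidence."""
    return 0.95 if view == 3 else 0.5

# Softmax policy over views, parameterized by one logit per view.
theta = [0.0] * N_VIEWS

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def run_episode():
    """Glance at sampled views (with replacement, for simplicity)
    until confident enough to exit early or the budget is spent."""
    visited = []
    for _ in range(N_VIEWS):
        probs = softmax(theta)
        view = random.choices(range(N_VIEWS), weights=probs)[0]
        visited.append(view)
        if view_confidence(view) >= CONF_THRESHOLD:
            break  # dynamic exit: confident enough, stop glancing
    # Reward trades off final confidence against the number of glances.
    reward = view_confidence(visited[-1]) - STEP_COST * len(visited)
    return visited, reward

def reinforce_update(visited, advantage, lr=0.2):
    """REINFORCE: raise the log-probability of the chosen views,
    scaled by the baseline-centered reward."""
    probs = softmax(theta)
    for view in visited:
        for i in range(N_VIEWS):
            grad = (1.0 if i == view else 0.0) - probs[i]
            theta[i] += lr * advantage * grad

baseline = 0.0  # running-mean reward baseline to reduce variance
for _ in range(500):
    visited, reward = run_episode()
    reinforce_update(visited, reward - baseline)
    baseline = 0.9 * baseline + 0.1 * reward

# A trained policy should learn to glance at the discriminative view first.
best_view = theta.index(max(theta))
print(best_view)
```

Because uninformative glances are penalized and confident glances end the episode early, the policy is pushed toward requesting the discriminative view with as few glances as possible, which is the efficiency goal the paper's active view selection targets.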

References

1. John Aloimonos, Isaac Weiss and Amit Bandyopadhyay, "Active vision", International Journal of Computer Vision, 1988.
2. Yasuhiro Aoki, Hunter Goforth, Rangaprasad Arun Srivatsan and Simon Lucey, "PointNetLK: Robust & efficient point cloud registration using PointNet", CVPR, 2019.
3. Samy Bengio, "Sharing representations for long tail computer vision problems", ACM ICMI, 2015.
4. Dongliang Chang, Kaiyue Pang, Ruoyi Du, Yujun Tong, Yi-Zhe Song, Zhanyu Ma, et al., "Making a bird AI expert work for you and me", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
5. Dongliang Chang, Kaiyue Pang, Yixiao Zheng, Zhanyu Ma, Yi-Zhe Song and Jun Guo, "Your “flamingo” is my bird: Fine-grained or not", CVPR, 2021.
6. Dongliang Chang, Yujun Tong, Ruoyi Du, Timothy Hospedales, Yi-Zhe Song and Zhanyu Ma, "An erudite fine-grained visual classification model", CVPR, 2023.
7. Shuo Chen, Tan Yu and Ping Li, "MVT: Multi-view vision transformer for 3D object recognition", BMVC, 2021.
8. Songle Chen, Lintao Zheng, Yan Zhang, Zhixin Sun and Kai Xu, "VERAM: View-enhanced recurrent attention model for 3D shape classification", IEEE Transactions on Visualization and Computer Graphics, 2018.
9. Yue Chen, Yalong Bai, Wei Zhang and Tao Mei, "Destruction and construction learning for fine-grained image recognition", CVPR, 2019.
10. Han-Pang Chiu, Leslie Pack Kaelbling and Tomás Lozano-Pérez, "Virtual training for multi-view object class recognition", CVPR, 2007.
11. Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho and Yoshua Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling", NeurIPS Workshops, 2014.
12. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al., "An image is worth 16x16 words: Transformers for image recognition at scale", ICLR, 2020.
13. Ruoyi Du, Dongliang Chang, Ayan Kumar Bhunia, Jiyang Xie, Zhanyu Ma, Yi-Zhe Song, et al., "Fine-grained visual classification via progressive multi-granularity training of jigsaw patches", ECCV, 2020.
14. Ruoyi Du, Dongliang Chang, Kongming Liang, Timothy Hospedales, Yi-Zhe Song and Zhanyu Ma, "On-the-fly category discovery", CVPR, 2023.
15. Ruoyi Du, Dongliang Chang, Zhanyu Ma, Yi-Zhe Song and Jun Guo, "Clue me in: Semi-supervised FGVC with out-of-distribution data", 2021.
16. Ruoyi Du, Jiyang Xie, Zhanyu Ma, Dongliang Chang, Yi-Zhe Song and Jun Guo, "Progressive learning of category-consistent multi-granularity features for fine-grained visual classification", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
17. Yifan Feng, Zizhao Zhang, Xibin Zhao, Rongrong Ji and Yue Gao, "GVCNN: Group-view convolutional neural networks for 3D shape recognition", CVPR, 2018.
18. Stan Franklin, "Autonomous agents as embodied AI", Cybernetics & Systems, 1997.
19. Jianlong Fu, Heliang Zheng and Tao Mei, "Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition", CVPR, 2017.
20. Yang Gao, Oscar Beijbom, Ning Zhang and Trevor Darrell, "Compact bilinear pooling", CVPR, 2016.
21. Yurong Guo, Ruoyi Du, Xiaoxu Li, Jiyang Xie, Zhanyu Ma and Yuan Dong, "Learning calibrated class centers for few-shot classification by pair-wise similarity", IEEE Transactions on Image Processing, 2022.
22. Zhizhong Han, Honglei Lu, Zhenbao Liu, Chi-Man Vong, Yu-Shen Liu, Matthias Zwicker, et al., "3D2SeqViews: Aggregating sequential views for 3D global feature learning by CNN with hierarchical attention aggregation", IEEE Transactions on Image Processing, 2019.
23. Zhizhong Han, Mingyang Shang, Zhenbao Liu, Chi-Man Vong, Yu-Shen Liu, Matthias Zwicker, et al., "SeqViews2SeqLabels: Learning 3D global features via aggregating sequential views by RNN with attention", IEEE Transactions on Image Processing, 2018.
24. Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, et al., "TransFG: A transformer architecture for fine-grained recognition", AAAI, 2022.
25. Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, "Deep residual learning for image recognition", CVPR, 2016.
26. Sepp Hochreiter and Jürgen Schmidhuber, "Long short-term memory", Neural Computation, 1997.
27. Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten and Kilian Weinberger, "Multi-scale dense networks for resource efficient image classification", ICLR, 2018.
28. Dinesh Jayaraman and Kristen Grauman, "Look-ahead before you leap: End-to-end active recognition by forecasting the effect of motion", ECCV, 2016.
29. Edward Johns, Stefan Leutenegger and Andrew J. Davison, "Pairwise decomposition of image sequences for active multi-view recognition", CVPR, 2016.
30. Asako Kanezaki, Yasuyuki Matsushita and Yoshifumi Nishida, "RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints", CVPR, 2018.