Fully Convolutional Network and Region Proposal for Instance Identification with Egocentric Vision | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper presents a novel approach for egocentric image retrieval and object detection. The approach uses fully convolutional networks (FCN) to obtain region proposals without requiring an additional component in the network or in training. It is particularly suited to small datasets with low object variability. The proposed network can be trained end-to-end and produces an effective global descriptor as an image representation. Additionally, it can be built upon any type of CNN pre-trained for classification. Through multiple experiments on two egocentric image datasets taken from museum visits, we show that the descriptor obtained using our proposed network outperforms those from previous state-of-the-art approaches. It is also just as memory-efficient, making it well suited to mobile devices such as an augmented museum audio guide.
Date of Conference: 22-29 October 2017
Date Added to IEEE Xplore: 22 January 2018
ISBN Information:
Electronic ISSN: 2473-9944
Conference Location: Venice, Italy

1. Introduction

We propose to enhance a museum audio-tour guide with a camera in order to help users orient themselves, enable automatic guidance, and facilitate explanations of museum artifacts: when the visitor is close enough to an object, an explanation is launched automatically. The on-board camera is not used for augmented reality; it only lets the system localize the user without any extra hardware having to be installed in the museum (a very strict constraint in some museums). Hence, egocentric image analysis and object instance recognition are the only means available to localize the user on the museum map. The camera is mounted on a small device worn on the user's chest. The system therefore needs to recognize object instances (not classes), such as paintings, sculptures, or any other exhibited heritage items. This requires the entire museum to be photographed (or video recorded) beforehand, and each object image then has to be manually localized on the digital museum map.
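
The localization step described above could be sketched as a nearest-neighbour lookup over global image descriptors. This is only an illustrative sketch, not the implementation used in the paper. The function names, the toy three-dimensional descriptors, the object identifiers, the map coordinates, and the `min_similarity` threshold (standing in for "close enough") are all assumptions made for the example.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def localize(query_descriptor, database, min_similarity=0.8):
    """Return (object_id, map_position) of the best match, or None.

    `database` maps a hypothetical object id to (descriptor, map_position).
    `min_similarity` is an assumed threshold, not a value from the paper.
    """
    best_id, best_score = None, -1.0
    for object_id, (descriptor, _position) in database.items():
        score = cosine_similarity(query_descriptor, descriptor)
        if score > best_score:
            best_id, best_score = object_id, score
    # Reject weak matches so that no explanation is launched by mistake.
    if best_id is None or best_score < min_similarity:
        return None
    return best_id, database[best_id][1]

# Toy data: 3-D vectors stand in for the network's global descriptors,
# and each object carries a manually assigned (x, y) position on the map.
database = {
    "painting_12": ([0.9, 0.1, 0.0], (4.0, 7.5)),
    "sculpture_03": ([0.0, 0.2, 0.95], (10.0, 2.0)),
}
match = localize([0.85, 0.2, 0.05], database)
no_match = localize([0.0, 1.0, 0.0], database)
```

Here `match` identifies the painting and its map position, which the guide could use to start the corresponding explanation, while `no_match` is `None` because no stored descriptor is similar enough to the query.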
