Weakly-Supervised Domain Adaptation of Deep Regression Trackers via Reinforced Knowledge Distillation



Abstract:

Deep regression trackers are among the fastest tracking algorithms available and are therefore suitable for real-time robotic applications. However, their accuracy is inadequate in many domains due to distribution shift and overfitting. In this letter we overcome such limitations by presenting the first methodology for domain adaptation of this class of trackers. To reduce the labeling effort, we propose a weakly-supervised adaptation strategy in which reinforcement learning is used to express weak supervision as a scalar, application-dependent, and temporally-delayed feedback signal. At the same time, knowledge distillation is employed to guarantee learning stability and to compress and transfer knowledge from more powerful but slower trackers. Extensive experiments on five different robotic vision domains demonstrate the relevance of our methodology. Real-time speed is achieved on embedded devices and on machines without GPUs, while accuracy improves significantly.
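
To make the interaction between the two learning signals concrete, below is a minimal sketch (not the authors' exact algorithm) of how a knowledge-distillation term and a REINFORCE-style term driven by a scalar, temporally-delayed reward can be combined into one objective. PyTorch is assumed, and all function and variable names are hypothetical.

import torch
import torch.nn.functional as F

def adaptation_loss(student_boxes, teacher_boxes, log_probs, episode_reward,
                    kd_weight=1.0, rl_weight=1.0):
    """student_boxes, teacher_boxes: (T, 4) boxes predicted over T frames.
    log_probs: (T,) log-probabilities of the student's sampled predictions.
    episode_reward: scalar feedback available only after the episode ends.
    """
    # Knowledge distillation: pull the fast student toward the slower,
    # more accurate teacher tracker.
    kd_loss = F.l1_loss(student_boxes, teacher_boxes)
    # Weak supervision via reinforcement: plain REINFORCE, scaling each
    # step's log-probability by the single delayed scalar reward (the
    # paper's estimator may differ).
    rl_loss = -(episode_reward * log_probs).mean()
    return kd_weight * kd_loss + rl_weight * rl_loss

# Toy usage on random tensors (T = 16 frames).
T = 16
loss = adaptation_loss(torch.rand(T, 4, requires_grad=True),
                       torch.rand(T, 4),
                       torch.randn(T, requires_grad=True),
                       episode_reward=0.7)
loss.backward()

The single scalar reward is what makes the supervision weak: no per-frame bounding-box labels are needed, only an application-dependent judgment of how well the whole episode went.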
Published in: IEEE Robotics and Automation Letters (Volume 6, Issue 3, July 2021)
Page(s): 5016-5023
Date of Publication: 02 April 2021


I. Introduction

Real-time visual object tracking is a key module in many robotic perception systems [1]–[6]. Recently, deep regression trackers (DRTs) [7]–[9] have been proposed in the robotics community [7] because of their efficiency and generality. Thanks to their simple architecture, DRTs achieve processing speeds above 100 FPS, making them suitable even for low-resource robots. Moreover, with the availability of large-scale computer vision datasets [10], these trackers can learn to track a large variety of targets without relying on particular assumptions, thus simplifying the development of tracking pipelines. However, acquiring thousands of videos to train these systems is not realistic in many real-world robotic application domains. Additionally, many domains present scenarios that differ considerably from the examples on which DRTs are trained. For example, drone [11] and driving [3], [12] applications require tracking objects from unusual camera views. Underwater robots encounter uncommon targets and settings [4], [13]. Other robotic systems rely on different imaging modalities [2]. Robotic manipulation setups require the tracking of atypical objects [14]. As shown in Fig. 1, these situations cause DRTs' accuracy to drop sharply: their deep learning architecture overfits when trained directly on small application-specific datasets, and suffers from the shift between training and test data distributions when trained for large-scale generic object tracking. A sketch of the DRT architecture in question follows.
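For concreteness, here is a minimal sketch of a GOTURN-style [8] deep regression tracker: crops of the previous-frame target and the current-frame search region pass through a shared CNN, and fully connected layers regress the new box in a single forward pass. PyTorch is assumed, and the layer sizes are illustrative, not taken from [7]–[9].

import torch
import torch.nn as nn

class RegressionTracker(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared convolutional encoder applied to both crops.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        # Fully connected head regressing the target box directly.
        self.head = nn.Sequential(
            nn.Linear(2 * 64 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, 4),  # (x1, y1, x2, y2)
        )

    def forward(self, prev_crop, curr_crop):
        # One cheap forward pass per frame, with no online fine-tuning
        # and no candidate scoring: this is what enables 100+ FPS.
        feats = torch.cat([self.backbone(prev_crop),
                           self.backbone(curr_crop)], dim=1)
        return self.head(feats)

tracker = RegressionTracker()
box = tracker(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))  # shape (1, 4)

Because the box is produced by a single feed-forward regression, there is no per-domain mechanism to fall back on, which is why the architecture is fast but sensitive to distribution shift.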

References
1. N. Papanikolopoulos, P. K. Khosla and T. Kanade, "Vision and control techniques for robotic visual tracking", Proc. IEEE Int. Conf. Robot. Automat., vol. 1, pp. 857-864, 1991.
2. J. Portmann, S. Lynen, M. Chli and R. Siegwart, "People detection and tracking from aerial thermal views", Proc. IEEE Int. Conf. Robot. Automat., pp. 1794-1800, 2014.
3. A. Geiger, P. Lenz, C. Stiller and R. Urtasun, "Vision meets robotics: The KITTI dataset", Int. J. Robot. Res., vol. 32, no. 11, pp. 1231-1237, 2013.
4. F. Shkurti et al., "Underwater multi-robot convoying using visual tracking by detection", Proc. IEEE Int. Conf. Intell. Robots Syst., pp. 4189-4196, 2017.
5. J. Luiten, T. Fischer and B. Leibe, "Track to reconstruct and reconstruct to track", IEEE Robot. Automat. Lett., vol. 5, no. 2, pp. 1803-1810, Apr. 2020.
6. M. Dunnhofer et al., "Siam-U-Net: Encoder-decoder siamese network for knee cartilage tracking in ultrasound images", Med. Image Anal., vol. 60, Feb. 2020.
7. D. Gordon, A. Farhadi and D. Fox, "Re3: Real-time recurrent regression networks for visual tracking of generic objects", IEEE Robot. Automat. Lett., vol. 3, no. 2, pp. 788-795, Apr. 2018.
8. D. Held, S. Thrun and S. Savarese, "Learning to track at 100 FPS with deep regression networks", Proc. Eur. Conf. Comput. Vis., pp. 749-765, 2016.
9. M. Dunnhofer, N. Martinel and C. Micheloni, "Tracking-by-trackers with a distilled and reinforced model", Proc. Asian Conf. Comput. Vis., pp. 631-650, 2020.
10. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 248-255, 2009.
11. K. Chaudhary, M. Zhao, F. Shi, X. Chen, K. Okada and M. Inaba, "Robust real-time visual tracking using dual-frame deep comparison network integrated with correlation filters", Proc. IEEE Int. Conf. Intell. Robots Syst., pp. 6837-6842, 2017.
12. S. Reddy, M. Mathew, L. Gomez, M. Rusinol, D. Karatzas and C. V. Jawahar, "RoadText-1K: Text detection & recognition dataset for driving videos", Proc. IEEE Int. Conf. Robot. Automat., pp. 11074-11080, 2020.
13. K. De Langis and J. Sattar, "Realtime multi-diver tracking and re-identification for underwater human-robot collaboration", Proc. IEEE Int. Conf. Robot. Automat., pp. 11140-11146, 2020.
14. A. Roy, X. Zhang, N. Wolleb, C. P. Quintero and M. Jagersand, "Tracking benchmark and evaluation for manipulation tasks", Proc. IEEE Int. Conf. Robot. Automat., pp. 2448-2453, 2015.
15. B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing and J. Yan, "SiamRPN++: Evolution of siamese visual tracking with very deep networks", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 4277-4286, 2019.
16. H. Nam and B. Han, "Learning multi-domain convolutional neural networks for visual tracking", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 4293-4302, 2016.
17. M. Danelljan, G. Bhat, F. S. Khan and M. Felsberg, "ATOM: Accurate tracking by overlap maximization", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 4660-4669, 2019.
18. G. Bhat, M. Danelljan, L. Van Gool and R. Timofte, "Learning discriminative model prediction for tracking", Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 6182-6191, 2019.
19. S. J. Pan and Q. Yang, "A survey on transfer learning", IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345-1359, 2010.
20. G. Csurka, "A comprehensive survey on domain adaptation for visual applications", Adv. Comput. Vis. Pattern Recognit., pp. 1-35, 2017.
21. G. Angeletti, B. Caputo and T. Tommasi, "Adaptive deep learning through visual domain localization", Proc. IEEE Int. Conf. Robot. Automat., pp. 7135-7142, 2018.
22. J. Zhang et al., "VR-Goggles for robots: Real-to-sim domain adaptation for visual control", IEEE Robot. Automat. Lett., vol. 4, no. 2, pp. 1148-1155, Apr. 2019.
23. A. Carlson, K. A. Skinner, R. Vasudevan and M. Johnson-Roberson, "Sensor transfer: Learning optimal sensor effect image augmentation for sim-to-real domain adaptation", IEEE Robot. Automat. Lett., vol. 4, no. 3, pp. 2431-2438, Jul. 2019.
24. M. Wulfmeier, A. Bewley and I. Posner, "Addressing appearance change in outdoor robotics with adversarial domain adaptation", Proc. IEEE Int. Conf. Intell. Robots Syst., pp. 1551-1558, 2017.
25. M. Mancini, H. Karaoguz, E. Ricci, P. Jensfelt and B. Caputo, "Kitting in the wild through online domain adaptation", Proc. IEEE Int. Conf. Intell. Robots Syst., pp. 1103-1109, 2018.
26. K. Fang, Y. Bai, S. Hinterstoisser, S. Savarese and M. Kalakrishnan, "Multi-task domain adaptation for deep learning of instance grasping from simulation", Proc. IEEE Int. Conf. Robot. Automat., pp. 3516-3523, 2018.
27. M. R. Loghmani, L. Robbiano, M. Planamente, K. Park, B. Caputo and M. Vincze, "Unsupervised domain adaptation through inter-modal rotation for RGB-D object recognition", IEEE Robot. Automat. Lett., vol. 5, no. 4, pp. 6631-6638, Oct. 2020.
28. E. Bellocchio, G. Costante, S. Cascianelli, M. L. Fravolini and P. Valigi, "Combining domain adaptation and spatial consistency for unseen fruits counting: A quasi-unsupervised approach", IEEE Robot. Automat. Lett., vol. 5, no. 2, pp. 1079-1086, Apr. 2020.
29. G. Hinton, O. Vinyals and J. Dean, "Distilling the knowledge in a neural network", Proc. NIPS Deep Learn. Workshop, 2015.
30. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge, MA, USA: MIT Press, 2018.