Weakly-Supervised Domain Adaptation of Deep Regression Trackers via Reinforced Knowledge Distillation


Abstract:

Deep regression trackers are among the fastest tracking algorithms available and are therefore suitable for real-time robotic applications. However, their accuracy is inadequate in many domains due to distribution shift and overfitting. In this letter, we overcome such limitations by presenting the first methodology for domain adaptation of this class of trackers. To reduce the labeling effort, we propose a weakly-supervised adaptation strategy in which reinforcement learning is used to express weak supervision as a scalar, application-dependent, and temporally-delayed feedback. At the same time, knowledge distillation is employed to guarantee learning stability and to compress and transfer knowledge from more powerful but slower trackers. Extensive experiments on five different robotic vision domains demonstrate the relevance of our methodology. Real-time speed is achieved on embedded devices and on machines without GPUs, while accuracy reaches significant results.
Published in: IEEE Robotics and Automation Letters (Volume: 6, Issue: 3, July 2021)
Page(s): 5016 - 5023
Date of Publication: 02 April 2021

I. Introduction

Real-time visual object tracking is a key module in many robotic perception systems [1]–[6]. Recently, deep regression trackers (DRTs) [7]–[9] have been proposed in the robotics community [7] because of their efficiency and generality. Thanks to their simple architecture, DRTs achieve processing speeds that surpass 100 FPS, making them suitable even for low-resource robots. Moreover, with the availability of large-scale computer vision datasets [10], these trackers can learn to track a large variety of targets without relying on particular assumptions, thus simplifying the development of tracking pipelines. However, acquiring thousands of videos to train these systems is not realistic in many real-world robotic application domains. Additionally, many domains present scenarios that differ substantially from the examples on which DRTs are trained. For example, drone [11] and driving [3], [12] applications require tracking objects from particular camera views. Underwater robots encounter uncommon targets and settings [4], [13]. Other robotic systems can use different imaging modalities [2]. Robotic manipulation setups require tracking atypical objects [14]. As shown in Fig. 1, these situations cause DRTs’ accuracy to be very low. This is because their deep learning architecture overfits when trained directly on small application datasets, and suffers from the shift between training and test data distributions when trained for large-scale generic object tracking.
