Few-Shot Object Detection via Knowledge Transfer


Abstract:

Conventional methods for object detection usually require substantial amounts of training data with annotated bounding boxes. Given only a few training examples and annotations, object detectors easily overfit and fail to generalize, which exposes a practical weakness of current detectors. Humans, by contrast, can easily master new reasoning rules from only a few demonstrations by drawing on previously learned knowledge. In this paper, we introduce few-shot object detection via knowledge transfer, which aims to detect objects from a few training examples. Central to our method is prototypical knowledge transfer with an attached meta-learner. The meta-learner takes support set images that include the few examples of the novel categories together with the base categories, and predicts a prototype vector representing each category. The prototypes then reweight each RoI (Region-of-Interest) feature vector from a query image to remodel the R-CNN predictor heads. To facilitate this remodeling process, we predict the prototypes under a graph structure, which propagates information from correlated base categories to the novel categories with the explicit guidance of prior knowledge representing inter-category correlations. Extensive experiments on the PASCAL VOC dataset verify the effectiveness of the proposed method.
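The graph-structured prototype prediction described above can be illustrated with a minimal sketch. Assuming prototypes are plain vectors and the prior knowledge is given as a nonnegative category-correlation matrix (both hypothetical simplifications; the paper's meta-learner predicts prototypes from support images), one propagation step averages each prototype over its graph neighborhood so that a data-scarce novel category borrows information from correlated base categories:

```python
import numpy as np

def propagate_prototypes(prototypes, adjacency):
    """One step of prototype propagation over a category graph.

    prototypes: (C, D) array, one D-dimensional prototype per category.
    adjacency:  (C, C) nonnegative correlation matrix (prior knowledge).
    Rows are normalized so each updated prototype is a convex
    combination of the prototypes of correlated categories.
    """
    weights = adjacency / adjacency.sum(axis=1, keepdims=True)
    return weights @ prototypes

# Toy example: 3 categories with 4-dim prototypes; category 2 plays the
# role of a novel category whose prototype starts uninformative.
protos = np.array([[1., 0., 0., 0.],
                   [0., 1., 0., 0.],
                   [0., 0., 0., 0.]])
adj = np.array([[1.0, 0.2, 0.5],
                [0.2, 1.0, 0.5],
                [0.5, 0.5, 1.0]])
updated = propagate_prototypes(protos, adj)
# The novel category's prototype is now a mixture of the two base
# prototypes it correlates with.
```

This is only a sketch of the propagation idea; the actual method learns the propagation jointly with the detector rather than applying a fixed averaging step.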
Date of Conference: 11-14 October 2020
Date Added to IEEE Xplore: 14 December 2020
Conference Location: Toronto, ON, Canada


I. Introduction

Learning to localize and classify objects in an image is a fundamental problem in computer vision, with a wide range of applications [1], [2] including robotics, autonomous vehicles, and video surveillance. With the success of convolutional neural networks (CNNs), great leaps have been achieved in object detection through remarkable works including Faster R-CNN [3], Mask R-CNN [4], YOLO [5], and SSD [6]. Despite these achievements, most object detectors suffer from an important limitation: they rely on huge amounts of training data and heavily annotated labels. For object detection, annotating data is very expensive, as it requires not only identifying the categorical labels of all objects in the image but also providing accurate localization through bounding box coordinates. This warrants a demand for effective object detectors that can generalize well from small amounts of annotated data.

Recently, several approaches [7], [8] have attempted to resolve few-shot object detection, which aims to detect data-scarce novel categories as well as data-sufficient base categories. These methods attach a meta-learner to an existing object detector. The meta-learner takes support set images that include the few examples of the novel categories and a subset of examples from the base categories. Given the support images, the meta-learner is expected to predict categorical prototypes, which are used to reweight the feature maps of a query image to build category-discriminative feature maps that remodel a prediction layer. However, in these methods, remodeling the prediction layers suffers from a poor embedding space of the prototypes, since each prototype is predicted independently without considering the others.
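The reweighting step used by these meta-learning detectors can be sketched as follows. Assuming, for illustration, that each RoI has already been pooled into a single feature vector and that the meta-learner has produced one prototype per category (shapes and names here are hypothetical), channel-wise reweighting produces one category-specific feature per prototype, which a per-category predictor head would then score:

```python
import numpy as np

def reweight_roi(roi_feature, prototypes):
    """Channel-wise reweighting of a single RoI feature vector.

    roi_feature: (D,) vector pooled from a query-image region proposal.
    prototypes:  (C, D) per-category prototypes from the meta-learner.
    Returns a (C, D) array: one category-specific feature vector per
    prototype, obtained by elementwise multiplication.
    """
    return prototypes * roi_feature[None, :]  # broadcast over categories

# Toy example: one 3-dim RoI feature and prototypes for 2 categories.
roi = np.array([0.5, 2.0, 1.0])
protos = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 2.0]])
specific = reweight_roi(roi, protos)  # shape (2, 3)
```

Because each prototype gates the RoI feature independently, a poorly placed prototype directly degrades the corresponding category-specific feature, which is the weakness the graph-structured prototype prediction is meant to address.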
