Feature Mixture on Pre-Trained Model for Few-Shot Learning

Abstract:

Few-shot learning (FSL) aims to recognize novel objects from limited training samples. A robust feature extractor (backbone) can significantly improve the recognition performance of an FSL model. However, training an effective backbone is challenging because 1) designing and validating backbone structures is time-consuming and expensive, and 2) a backbone trained on the known (base) categories is more inclined to focus on the textures of the objects it has learned, which makes it hard to describe novel samples. To solve these problems, we propose a feature mixture operation on the pre-trained (fixed) features: 1) We replace a part of the values of the feature map from a novel category with the content of other feature maps to increase the generalizability and diversity of training samples, which avoids retraining a complex backbone at high computational cost. 2) We use the similarities between the features to constrain the mixture operation, which helps the classifier focus on the representations of the novel object that are hidden in the features produced by the pre-trained backbone with biased training. Experimental studies on five benchmark datasets in both inductive and transductive settings demonstrate the effectiveness of our feature mixture (FM). Specifically, compared with the baseline on the Mini-ImageNet dataset, it achieves 3.8% and 4.2% accuracy improvements for 1 and 5 training samples, respectively. Additionally, the proposed mixture operation can be used to improve other existing FSL methods based on backbone training.
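
The mixture operation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the replaced region is chosen or how the similarity constraint is applied, so the rectangular region, the `ratio` parameter, the cosine-similarity donor selection, and all function names below are assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two feature maps, flattened to vectors."""
    a = a.ravel()
    b = b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def mix_features(target, donor, ratio, rng):
    """Replace one rectangular spatial region of `target` with `donor`.

    Both inputs are fixed (pre-trained) feature maps of shape (C, H, W).
    The region covers roughly `ratio` of the spatial area (assumption:
    a CutMix-style rectangle; the source does not state the region shape).
    """
    if target.shape != donor.shape:
        raise ValueError("feature maps must have the same shape")
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    _, h, w = target.shape
    rh = max(1, int(round(h * np.sqrt(ratio))))
    rw = max(1, int(round(w * np.sqrt(ratio))))
    top = int(rng.integers(0, h - rh + 1))   # upper bound is exclusive
    left = int(rng.integers(0, w - rw + 1))
    mixed = target.copy()                    # leave the input untouched
    mixed[:, top:top + rh, left:left + rw] = donor[:, top:top + rh, left:left + rw]
    return mixed


def similarity_guided_mixture(novel, candidates, ratio=0.25, seed=0):
    """Mix `novel` with the candidate feature map most similar to it.

    Assumption: the similarity constraint is modeled here as picking the
    donor with the highest cosine similarity to the novel feature map.
    Returns the mixed feature map and the index of the chosen donor.
    """
    rng = np.random.default_rng(seed)
    sims = [cosine_similarity(novel, c) for c in candidates]
    donor_idx = int(np.argmax(sims))
    return mix_features(novel, candidates[donor_idx], ratio, rng), donor_idx
```

Under these assumptions, calling `similarity_guided_mixture(novel, candidates)` on a novel-category feature map and a list of other feature maps yields an augmented feature map that could be fed to a classifier alongside the original, without touching the backbone.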

Published in: IEEE Transactions on Image Processing (Volume: 33)

Page(s): 4104 - 4115

Date of Publication: 02 July 2024

PubMed ID: 38954579

DOI: 10.1109/TIP.2024.3411452

I. Introduction

Few-shot learning (FSL) aims to recognize novel categories containing only a few labeled examples. The prerequisite of this task is to make full use of the base categories, which contain abundant labeled training samples [2], [3], [4]. Recent FSL work has achieved promising results by obtaining a robust feature extractor (backbone) through good structural design or model training [2], [3], [5]. Such work trains a backbone on the known (base) categories and aims to yield a transferable feature representation (textures and structures) to describe the novel categories. However, training and validating backbones from scratch are time-consuming and expensive processes. Meanwhile, a backbone trained on the base categories is more inclined to focus on the textures of the objects it has learned [4], [6], [7], [8]. As shown in Fig. 1, a backbone trained on the base categories responds to different regions (provided by Grad-CAM [1]) on samples of different categories. Given a base sample in the “Unicycle” category, the responsive regions on the image focus on the body of the cycle, since the backbone is trained with many unicycle images. However, the responses of this backbone may drift away from novel objects and overlook them. For example, in the image labeled “Retriever”, it concentrates on the baby carriage, which contains wheels, rather than on the dog.

To enlarge or correct the response regions of novel objects, many methods have been proposed. For example, Liu et al. mix image patches randomly and use the mixed image as the input of the backbone [5]. Wang et al. split an image into three parts and represent it by three views [4]. However, these methods need to be trained from scratch. Moreover, the latter method requires training backbones that are three times larger.

Fig. 1. The responsive regions of the backbone, visualized by Grad-CAM [1] on several samples from Mini-ImageNet; the backbone is trained on the base categories.
