Abstract:
Data hallucination, or augmentation, is a straightforward solution for few-shot learning (FSL), which aims to classify novel objects from limited training samples. Common hallucination strategies use visual or textual knowledge to simulate the distribution of a given novel category and generate additional samples for training. However, when the knowledge domain of the novel category is narrow, the diversity and capacity of the samples generated by these techniques can be insufficient, so the improvement to the classifier is limited. To address this issue, we propose a Symmetric data hallucination strategy with Knowledge Transfer (SHKT) that interacts with multi-modal knowledge in both the visual and textual spaces. Specifically, we first compute relations based on semantic knowledge and select the categories most related to a given novel category for hallucination. Second, we design two parameter-free data hallucination strategies that enrich the training samples by mixing the given and selected samples in both the visual and textual spaces; the generated visual and textual samples improve the visual representation and enrich the textual supervision, respectively. Finally, we connect the visual and textual knowledge through a transfer calculation, which not only exchanges content between modalities but also constrains the distribution of the generated samples during training. We apply our method to four benchmark datasets and achieve state-of-the-art performance in all experiments. In particular, compared to the baseline on the Mini-ImageNet dataset, SHKT achieves 12.84% and 3.46% accuracy improvements with 1 and 5 support training samples, respectively.
Published in: IEEE Transactions on Multimedia (Early Access)
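
To make the abstract's pipeline concrete, the following is a minimal sketch (not the authors' code) of the two steps it describes: selecting semantically related base categories and hallucinating extra samples by a parameter-free mix of the novel sample with samples of those categories, applied symmetrically in the visual and textual feature spaces. The function names, embedding dimensions, and mixup-style interpolation scheme are illustrative assumptions, not details taken from the paper.

```python
# Sketch of SHKT-style hallucination under assumed details:
#   1) pick the base categories most related to a novel category via
#      cosine similarity of semantic (text) embeddings, and
#   2) generate new samples as convex blends of the novel feature with
#      features of the selected categories, in visual and textual spaces.
import numpy as np

def select_related_categories(novel_text_emb, base_text_embs, k=3):
    """Return indices of the k base categories most similar to the novel one."""
    sims = base_text_embs @ novel_text_emb / (
        np.linalg.norm(base_text_embs, axis=1) * np.linalg.norm(novel_text_emb) + 1e-8
    )
    return np.argsort(-sims)[:k]

def hallucinate(novel_feat, related_feats, n_new=5, rng=None):
    """Parameter-free blend of the novel feature with related-category features."""
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    for _ in range(n_new):
        partner = related_feats[rng.integers(len(related_feats))]
        lam = rng.uniform(0.5, 1.0)  # keep the novel sample dominant (assumption)
        samples.append(lam * novel_feat + (1.0 - lam) * partner)
    return np.stack(samples)

# Toy usage: 512-d visual features, 300-d text embeddings (arbitrary sizes).
rng = np.random.default_rng(0)
base_text = rng.normal(size=(64, 300))    # base-class text embeddings
novel_text = rng.normal(size=300)         # novel-class text embedding
base_visual = rng.normal(size=(64, 512))  # one visual prototype per base class
novel_visual = rng.normal(size=512)       # single support sample (1-shot case)

idx = select_related_categories(novel_text, base_text, k=3)
extra_visual = hallucinate(novel_visual, base_visual[idx], n_new=5, rng=rng)
extra_text = hallucinate(novel_text, base_text[idx], n_new=5, rng=rng)
```

In this sketch the same mixing rule is reused for both modalities, which is one simple way to realize the "symmetric" hallucination the abstract mentions; the paper's actual transfer calculation between the visual and textual branches is not reproduced here.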