With the renaissance of deep neural networks (DNNs), enormous breakthroughs have been made in computer vision (CV), especially with the emergence of convolutional neural network (CNN)-based and transformer-based variants. On many tasks, deep models now far exceed human performance. However, they still have serious limitations. One of the most critical is that they require numerous labeled instances to optimize their parameters during training, so categories with few or no training samples cannot be recognized as flexibly as humans recognize them. To resolve this dilemma, researchers attempt to classify categories for which no training samples exist, a paradigm known as zero-shot learning (ZSL). Nonetheless, researchers have gradually discovered that the conventional ZSL setting does not match realistic scenarios, where the conditions are harsher: a model must not only recognize new unseen classes but also maintain its ability to classify the original seen classes. Based on this observation, Chao et al. proposed another experimental setup, named generalized ZSL (GZSL).
Abstract:
Owing to the rapid development of generative models, research on the generalized zero-shot learning (GZSL) task has achieved great success. Most generative GZSL methods use class attributes together with normally distributed noise to synthesize visual features, overlooking the question of whether a single normal distribution can adequately represent all categories. Therefore, in this article, we exploit variational auto-encoders (VAEs) and visual features to generate image-level noise that preserves class-level characteristics in more detail, and we propose a mechanism called the more factual generative network (MFGN) to achieve a more authentic generative process. In other words, MFGN transfers the seen feature distribution to the unseen domains and regulates this knowledge to correct the generation of unseen samples. Extensive experiments conducted on four popular datasets demonstrate the effectiveness of the proposed work.
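The generative step described above can be sketched as follows. This is a minimal, illustrative NumPy sketch, not the paper's implementation: all weight matrices, dimensions, and function names (`encode`, `reparameterize`, `generate`) are hypothetical stand-ins. It shows the core idea of replacing standard normal noise with image-level noise drawn from a VAE-encoded, class-conditional Gaussian, which is then combined with an attribute vector to synthesize a visual feature.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    """Toy linear VAE encoder: map a visual feature x to the mean and
    log-variance of a class-conditional Gaussian."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    """Sample image-level noise z ~ N(mu, sigma^2) with the
    reparameterization trick; unlike N(0, I) noise, z carries
    class-level statistics from the encoded feature."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def generate(attr, z, W_g):
    """Toy linear generator: synthesize a visual feature from a class
    attribute vector concatenated with the VAE-derived noise."""
    return np.concatenate([attr, z]) @ W_g

# Dimensions are illustrative, not taken from the paper.
d_vis, d_z, d_attr = 8, 4, 6
W_mu = rng.standard_normal((d_vis, d_z))
W_logvar = rng.standard_normal((d_vis, d_z)) * 0.1
W_g = rng.standard_normal((d_attr + d_z, d_vis))

x_seen = rng.standard_normal(d_vis)        # a seen-class visual feature
attr_unseen = rng.standard_normal(d_attr)  # an unseen-class attribute vector

mu, logvar = encode(x_seen, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)        # image-level noise, not N(0, I)
x_fake = generate(attr_unseen, z, W_g)     # synthetic unseen-class feature
print(x_fake.shape)
```

In a full method the encoder and generator would be trained networks with VAE (reconstruction plus KL) and adversarial or classification losses; the sketch only isolates where the image-level noise replaces the usual standard normal sample.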
Published in: IEEE MultiMedia ( Volume: 29, Issue: 3, 01 July-Sept. 2022)