1. Introduction
Given an unseen concept such as a green tiger, which does not exist and which no one has ever seen, humans can immediately associate the known state green with the image of a tiger. Inspired by this ability, Compositional Zero-Shot Learning (CZSL) aims to equip models with the ability to recognize novel concepts as humans do. Specifically, CZSL trains on seen compositions of primitive concepts (states and objects) and recognizes unseen compositions at inference time.
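The seen/unseen distinction can be made concrete with a toy example. The following is a minimal sketch under stated assumptions: the vocabulary, the split, and the helper `primitives_covered` are hypothetical and are not taken from any benchmark. It shows that every state and every object may appear during training while a particular pairing, such as green tiger, never does.

```python
from itertools import product

# Hypothetical toy vocabulary (illustrative only, not a benchmark dataset).
states = ["green", "striped", "old"]
objects = ["tiger", "apple", "car"]

# The composition space is every (state, object) pair.
all_compositions = set(product(states, objects))

# Assumed training split: each primitive occurs at least once,
# but the pair ("green", "tiger") is never observed together.
seen = {
    ("striped", "tiger"),
    ("green", "apple"),
    ("old", "car"),
    ("old", "tiger"),
}

# Everything not seen in training must be recognized zero-shot.
unseen = all_compositions - seen


def primitives_covered(pairs):
    """Return the set of states and the set of objects occurring in pairs."""
    return {s for s, _ in pairs}, {o for _, o in pairs}


seen_states, seen_objects = primitives_covered(seen)
```

Here the model has encountered both "green" and "tiger" as primitives, so recognizing ("green", "tiger") at inference time depends on recombining them rather than on having seen the pair.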
Overview of DFSP. Our method aims to narrow the domain gap between seen and unseen compositions by fusing the decomposed state and object features with the image feature, while learning the joint representation of state and object in the language branch. By being fused with the state and object features, the image feature learns its response to each of them separately and becomes more sensitive to unseen compositions.
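One way to picture the fusion step is as attention from the image feature over the decomposed text features. The sketch below is an assumption for illustration, not the authors' implementation: it uses single-head dot-product attention with a residual connection, random vectors in place of real encoder outputs, and hypothetical names (`fuse`, `rand_vec`, `state_feats`, `object_feats`).

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse(image_feat, text_feats):
    """Assumed fusion: the image feature attends over the text features.

    Returns the image feature plus the attention-weighted text context
    (a residual connection), together with the attention weights.
    """
    d = len(image_feat)
    scores = [dot(image_feat, t) / math.sqrt(d) for t in text_feats]
    weights = softmax(scores)
    context = [sum(w * t[i] for w, t in zip(weights, text_feats))
               for i in range(d)]
    fused = [x + c for x, c in zip(image_feat, context)]
    return fused, weights


random.seed(0)
d = 8  # hypothetical feature dimension


def rand_vec():
    """Random stand-in for an encoder output."""
    return [random.gauss(0.0, 1.0) for _ in range(d)]


image_feat = rand_vec()
state_feats = [rand_vec() for _ in range(3)]   # one vector per state
object_feats = [rand_vec() for _ in range(4)]  # one vector per object

# Fuse the image feature with the state and object features separately,
# so it carries a distinct response to each primitive type.
fused_state, state_w = fuse(image_feat, state_feats)
fused_object, object_w = fuse(image_feat, object_feats)
```

Fusing with states and objects separately mirrors the caption's point that the image feature learns a response to each primitive on its own; how the two fused features are combined afterwards is left open here.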