1. Introduction
Composed image retrieval (CIR) is a challenging vision-language (VL) task in which a composed query, consisting of an image and text, is used to retrieve images that are relevant to both conditions [29]. Because language is the most natural medium for human interaction, CIR offers a higher degree of freedom and a better user experience for image-based search applications such as e-commerce.