1. Introduction
Deep models are well known for beating humans in visual recognition [13]. However, this is just a victory of specialist models over generalist humans - existing vision recognition models are mostly closed-set experts. Given a defined category set, huge datasets are gathered and annotated, and then, deep models trained with the annotated data can easily handle such an in-category recognition due to their great fitting ability. However, these models are arguably only learning to memorize in that they are restricted to the defined category set and are incapable of modeling novel categories. Although paradigms like open set recognition [9] aim to filter out the out-of-category samples, simply rejecting them is not satisfactory. For humans, visual recognition is far beyond a closed-set problem - instead of learning to memorize, we learn to cognize. In particular, given samples containing novel categories, we can not only tell which are novel but we can also tell which may share the same novel category. e.g., even you have never seen “hedgehogs”, you can easily realize that they differ from other creatures you have seen before and realise that multiple hedgehog images belong to the same category, even if you don't know the name.
Comparison of the conventional ncd setting and the proposed ocd setting. (a) Ncd adopts transductive learning and offline inference. (b) Ocd removes the predefined query set assumption and conducts inductive learning and instant inference.