I. Introduction
Despite their great success, the decision-making processes of current deep models lack interpretability, which hinders their adoption in high-stakes domains such as healthcare and finance. Many post-hoc explainability methods [1], [2], [3] have been proposed to decipher the predictions of deep models. However, these methods cannot provide sufficient detail to explain the complicated decision pathway of a black-box model. Instead of explaining a black-box model, many works [4], [5], [6] aim to construct a self-explainable model. Chen et al. [6] proposed a transparent model that replaces the conventional extractive reasoning process with a case-based reasoning process, which compares the input features with learned visual feature vectors, called "prototypes", and makes predictions based on their similarity. Owing to the transparency of the case-based reasoning architecture, prototypes have also been extended to other problems, including hierarchical classification and zero-shot classification [7], [8].

However, humans tend to recognize objects by hierarchically comparing features of different granularities [9], [10], as shown at the top of Figure 1, a process that most current deep models fail to imitate. Instead, current deep models typically make predictions over all classes in a single pass, as shown at the bottom of Figure 1. Predicting all classes in a single layer, without distinguishing levels of granularity, prevents the model from extracting discriminative features and makes its decision-making process harder for humans to understand.
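To make the case-based reasoning step concrete, the following is a minimal sketch of a prototype classifier in PyTorch, assuming a generic CNN backbone; the class name `PrototypeClassifier`, the distance-to-similarity mapping, and all dimensions are illustrative choices rather than the exact architecture of [6]. Spatial feature patches are compared against learned prototype vectors, the best match per prototype is kept, and a linear layer combines the resulting similarity scores into class logits.

```python
import torch
import torch.nn as nn


class PrototypeClassifier(nn.Module):
    """Minimal, hypothetical sketch of prototype-based case-based reasoning."""

    def __init__(self, backbone: nn.Module, feat_dim: int,
                 num_prototypes: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                          # any CNN feature extractor
        # Learned prototype vectors, one similarity score per prototype.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, feat_dim))
        # Linear layer turning prototype similarities into class logits.
        self.classifier = nn.Linear(num_prototypes, num_classes, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                          # (B, feat_dim, H, W)
        B, D, H, W = feats.shape
        patches = feats.permute(0, 2, 3, 1).reshape(B, H * W, D)
        # Squared L2 distance between every spatial patch and every prototype.
        dists = ((patches.unsqueeze(2) - self.prototypes) ** 2).sum(dim=-1)
        # Map distance to similarity (small distance -> large similarity) and
        # keep the best-matching patch for each prototype.
        sims = torch.log((dists + 1.0) / (dists + 1e-4)).max(dim=1).values
        return self.classifier(sims)                      # (B, num_classes)


# Toy usage: a single conv layer stands in for a real backbone.
backbone = nn.Conv2d(3, 64, kernel_size=3, padding=1)
model = PrototypeClassifier(backbone, feat_dim=64, num_prototypes=30, num_classes=10)
logits = model(torch.randn(2, 3, 32, 32))                 # shape (2, 10)
```

Because each class logit is a weighted sum of similarities to named prototypes, the prediction can be traced back to the training cases those prototypes resemble, which is the source of the architecture's transparency.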