I. Introduction
During the last decade, deep convolutional neural networks have shown extraordinary performance in image classification, segmentation, and object detection. More recently, they have also been deployed in end-user products and industrial settings, e.g., autonomous driving [1], intelligent video surveillance [2], [3], and even human medicine [4]. Due to this variety of applications, including safety-critical environments, providers and users of such systems have a high demand for explainability and interpretability methods for these black-box models. A large number of methods [5]–[13] have already been developed that allow deep learning-based image classifiers to give a visual explanation of their classification results. Such explanations take various forms, including but not limited to feature visualization [14] and saliency maps [15] (see Figure 1). Feature visualization provides insight into what maximizes the activation of a specific neuron or channel of the network. A saliency map highlights the regions of the input image that are crucial for a given network output.
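For concreteness, the sketch below illustrates one simple way such a saliency map can be obtained: the gradient of the predicted class score with respect to the input pixels. It is only a minimal example, assuming PyTorch and a pretrained torchvision ResNet-18; the image path and model choice are illustrative and not part of the methods cited above.

```python
# Minimal sketch of a gradient-based saliency map (illustrative, not the
# specific method of any cited reference).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("input.jpg").convert("RGB")      # any RGB test image (illustrative path)
x = preprocess(img).unsqueeze(0)                  # shape: (1, 3, 224, 224)
x.requires_grad_(True)                            # track gradients w.r.t. the input

logits = model(x)
target_class = logits.argmax(dim=1).item()        # explain the predicted class
logits[0, target_class].backward()                # d(class score)/d(input pixels)

# Saliency: maximum absolute gradient over the color channels per pixel.
saliency = x.grad.abs().max(dim=1)[0].squeeze(0)  # shape: (224, 224)
```

The resulting map can be overlaid on the input image to highlight the pixels whose perturbation most strongly affects the class score, which is the basic intuition behind the saliency-map explanations discussed in this paper.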