
Deep Saliency Map Generators for Multispectral Video Classification


Abstract:

Despite their black-box nature, deep neural networks have lately been used successfully in practical applications. In areas where the results of these applications can lead to safety hazards or decisions of ethical relevance, the application provider is accountable for the resulting decisions and should therefore be able to explain how and why a specific decision was made. For image processing networks, saliency map generators are a possible solution. A saliency map gives a visual hint on what is of special importance for the network's decision, can reveal possible dataset biases, and gives a more profound insight into the decision process of the black box. This paper investigates how 2D saliency map generators need to be adapted for 3D input data and, additionally, how the methods behave when applied not only to ordinary video input but also to multispectral 3D input data. This is demonstrated on 3D video input data for human action recognition in the infrared and visual spectrum and evaluated using the insertion and deletion metrics. The dataset used in this work is the Multispectral Action Dataset, in which each scene is available in the long-wave infrared as well as the visual spectrum. To be able to draw a more general conclusion, the two investigated networks, 3D-ResNet 18 and the Persistent Appearance Network (PAN), follow different design philosophies. It could be shown that the saliency methods can also be applied to 3D input data with remarkable results. The results show that combined training with both infrared and RGB 3D input data leads to more focused saliency maps and outperforms training with only RGB or infrared data.
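The insertion and deletion metrics mentioned above score a saliency map by progressively revealing or removing the most salient input voxels and tracking the model's class probability. As a point of reference, a minimal sketch of the deletion variant for video input in PyTorch could look as follows; the function name, the step count, and the zero baseline for removed voxels are illustrative assumptions, not the paper's exact protocol.

import torch

def deletion_metric(model, clip, saliency, target_class, steps=32):
    # Deletion metric: remove the most salient voxels first and track how
    # fast the class probability drops (a lower area under the curve
    # indicates a more faithful saliency map).
    # clip: (1, C, T, H, W) video tensor; saliency: (T, H, W) volume.
    model.eval()
    order = saliency.flatten().argsort(descending=True)
    voxels = order.numel()
    mask = torch.ones_like(saliency).flatten()
    probs = []
    with torch.no_grad():
        for i in range(steps + 1):
            masked = clip * mask.view(1, 1, *saliency.shape)
            p = torch.softmax(model(masked), dim=1)[0, target_class]
            probs.append(p.item())
            # Zero out the next chunk of most-salient voxels
            # (zero baseline assumed here for simplicity).
            chunk = order[i * voxels // steps:(i + 1) * voxels // steps]
            mask[chunk] = 0.0
    # Mean of the sampled curve approximates the area under it.
    return sum(probs) / len(probs)

The insertion metric works the same way in reverse, starting from a fully masked clip and restoring the most salient voxels first.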
Date of Conference: 21-25 August 2022
Date Added to IEEE Xplore: 29 November 2022
Conference Location: Montreal, QC, Canada

I. Introduction

During the last decade, convolutional deep neural networks have shown extraordinary performance in image classification, segmentation, and object detection. More recently, they have also been used in end-user products and industrial settings, e.g., autonomous driving [1], intelligent video surveillance [2], [3], and even human medicine [4]. Because of this variety of applications, including in safety-critical environments, providers and users of such systems have a high demand for explainability and interpretability methods for these black-box models. A vast number of methods [5]–[13] have already been developed that allow deep learning-based image classifiers to give a visual explanation of their classification results. Such explanations can take various forms, including but not limited to feature visualization [14] and saliency maps [15] (see Figure 1). Feature visualization gives an insight into what maximizes the activation of a specific neuron or channel of the network. A saliency map highlights the image areas in the input data that are crucial for a given network output; a minimal sketch of how such a map can be computed for 3D video input is given below.
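As a concrete illustration of the latter, a plain gradient-based saliency map for 3D video input in PyTorch might be computed as follows. The input layout (batch, channels, frames, height, width) matches what a 3D-ResNet-style classifier expects; the function name and the channel-wise maximum used for aggregation are assumptions for illustration, not the specific generators investigated in this paper.

import torch

def gradient_saliency_3d(model, clip, target_class):
    # clip: video tensor of shape (1, C, T, H, W)
    clip = clip.clone().requires_grad_(True)
    model.eval()
    score = model(clip)[0, target_class]  # logit of the class of interest
    score.backward()
    # Collapse the channel axis so the result is a single
    # spatio-temporal saliency volume of shape (T, H, W).
    return clip.grad.detach().abs().amax(dim=1).squeeze(0)

Per-frame slices of the returned volume can then be overlaid on the corresponding video frames to visualize which regions drive the classification.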

