I. Introduction
Deep neural networks (DNNs) achieve state-of-the-art performance in many tasks [1]–[5]. Since DNNs are driven by data, biased data inevitably lead to biased models, resulting in poor generalization to test domains [6]. To confront the bias, unbiased training has been proposed to directly compensate for the bias effect, e.g., jittering or flipping images for data augmentation [7], [8], batch normalization for stable mean and variance [9], neuron dropout for robust features [10], and re-weighting for balanced sample loss [11]–[13], just to name a few. Meanwhile, some types of bias, such as visual context, are essentially good for various tasks [14], [15], e.g., an image containing a highway greatly increases the probability of a car or truck and decreases that of a lion or fish. In fact, there is evidence showing that removing such “good” bias indeed hurts model performance [16], as the “good” bias is likely to also appear in test cases. However, how to distinguish the “good” bias from the “bad” at the training stage remains an open problem.
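As a concrete illustration of the re-weighting idea mentioned above, the following sketch shows inverse-frequency loss re-weighting in PyTorch; the function name, the toy class counts, and the normalization scheme are our own illustrative assumptions rather than the specific methods of [11]–[13].

```python
import torch
import torch.nn.functional as F

def reweighted_cross_entropy(logits, targets, class_counts):
    """Cross-entropy with inverse-frequency class weights.

    Rare classes receive larger weights so the loss is not dominated
    by the majority (biased) classes.
    """
    # Inverse-frequency weights, rescaled to keep the loss magnitude stable.
    weights = 1.0 / class_counts.float()
    weights = weights * (len(class_counts) / weights.sum())
    return F.cross_entropy(logits, targets, weight=weights)

# Toy usage: 3 classes with a heavy imbalance (900 : 90 : 10 samples).
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
class_counts = torch.tensor([900, 90, 10])
loss = reweighted_cross_entropy(logits, targets, class_counts)
print(loss.item())
```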