
Learning to Double-Check Model Prediction From a Causal Perspective



Abstract:

The present machine learning schema typically uses one-pass model inference (e.g., forward propagation) to make predictions in the testing phase. It is inherently different from human students, who double-check their answers during examinations, especially when their confidence is low. To bridge this gap, we propose a learning to double-check (L2D) framework, which formulates double-checking as a learnable procedure with two core operations: recognizing unreliable predictions and revising predictions. To judge the correctness of a prediction, we resort to counterfactual faithfulness in causal theory and design a contrastive faithfulness measure. In particular, L2D generates counterfactual features by imagining: “what would the sample features be if its label was the predicted class” and judges the prediction by the faithfulness of the counterfactual features. Furthermore, we design a simple and effective revision module to revise the original model prediction according to the faithfulness. We apply the L2D framework to three classification models and conduct experiments on two public datasets for image classification, validating the effectiveness of L2D in prediction correctness judgment and revision.
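
As a rough illustration of the two operations described in the abstract, the sketch below outlines an L2D-style inference pass. The classifier, counterfactual_gen, and reviser modules, and the cosine-similarity stand-in for the contrastive faithfulness measure, are hypothetical placeholders for illustration rather than the authors' implementation.

    import torch.nn.functional as F

    def l2d_predict(classifier, counterfactual_gen, reviser, x):
        """classifier: x -> logits; counterfactual_gen: (x, y_hat) -> imagined
        features for the predicted class; reviser: (logits, faithfulness) ->
        revised logits. All three interfaces are assumed for illustration."""
        logits = classifier(x)               # one-pass prediction
        y_hat = logits.argmax(dim=-1)        # predicted class
        x_cf = counterfactual_gen(x, y_hat)  # "what would the features be if the label were y_hat?"
        # Stand-in faithfulness score: similarity between observed and counterfactual features.
        faithfulness = F.cosine_similarity(x.flatten(1), x_cf.flatten(1), dim=-1)
        revised_logits = reviser(logits, faithfulness)  # revise unreliable predictions
        return revised_logits, faithfulness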
Page(s): 5054 - 5063
Date of Publication: 13 April 2023


PubMed ID: 37053061



I. Introduction

Machine learning models are widely used in various real-world applications, such as machine translation [1], image recognition [2], and recommender systems [3]. In practice, a model is typically trained offline and deployed to serve the samples that arrive during the testing period. That is, the model makes predictions for all testing samples indiscriminately, even though those samples can differ substantially. For instance, it can be hard to make confident predictions for some samples [see Fig. 1(b)]. This clearly differs from the behavior of human students in a testing period (e.g., an examination), who would double-check their answers to hard questions. Lacking such a double check, current models suffer sharp performance drops on low-confidence samples [4], [5].

Fig. 1. Examples of (a) a normal sample and (b) a hard sample from the dog class, and the corresponding model predictions.
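
As an informal illustration of the gap described above, the following sketch flags test samples whose softmax confidence falls below a threshold, i.e., the samples a one-pass model would still predict on but a student would want to double-check. The thresholding criterion and the 0.7 value are illustrative assumptions, not part of the paper.

    import torch

    def flag_low_confidence(logits: torch.Tensor, threshold: float = 0.7):
        """Return predicted classes and a mask of samples whose softmax
        confidence is below `threshold` (threshold chosen for illustration)."""
        probs = logits.softmax(dim=-1)         # class probabilities
        confidence, preds = probs.max(dim=-1)  # top-1 confidence and prediction
        needs_double_check = confidence < threshold
        return preds, needs_double_check

    # Example: a batch of 4 samples with 10 classes.
    preds, mask = flag_low_confidence(torch.randn(4, 10))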
