1. Introduction
Increased explainability in machine learning is traditionally associated with lower performance: a decision tree, for example, is more explainable but less accurate than a deep neural network. In this paper we argue that, in fact, increasing the explainability of a deep classifier can improve its generalization, especially to novel domains. End-to-end deep models often exploit biases unique to their training dataset, which leads to poor generalization on novel datasets or in novel environments. We develop a training strategy for deep neural network models that increases explainability, incurs no perceptible accuracy degradation on the training domain, and improves performance on unseen domains.