
Transferring Information Between Neural Networks


Abstract:

This paper investigates techniques to transfer information between deep neural networks. We demonstrate that a student network, which has access to information computed by a teacher network on the training data, learns faster, can be less deep, and requires fewer labeled examples to achieve a given performance level. To this end, we force the student to mimic the teacher by adding a penalty term to the student's objective. We evaluate different penalty terms: (1) the mean squared error between the cost gradients, (2) the Jacobian of the pre-softmax layer, (3) its row-summed version, (4) the cost gradient differences as in standard double backpropagation, and (5) a targeted double backpropagation via gradient-derived masks. The Jacobian method improves accuracy in proportion to the difference in training examples, in contrast to the cost-gradient method. If the difference in accuracy between teacher and student is large enough, we find an improvement from the Jacobian information even if both have seen the same training data. This indicates that information transfer has a regularization effect.
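The abstract sketches the general recipe: train the student on the usual classification cost plus a penalty that matches a gradient-based quantity computed by the teacher. The snippet below is a minimal sketch, not the authors' code, of penalty term (1), the mean squared error between the cost gradients with respect to the input, written in PyTorch. The names student, teacher, optimizer, and the weight lam are hypothetical placeholders; the double-backpropagation ingredient is the create_graph=True call, which keeps the student's input gradient differentiable so the penalty itself can be optimized.

import torch
import torch.nn.functional as F

def transfer_step(student, teacher, x, y, optimizer, lam=0.1):
    """One training step with penalty (1): MSE between the cost gradients
    of student and teacher with respect to the input batch x.
    Hypothetical sketch; assumes teacher is a fixed, pre-trained network."""
    x = x.clone().requires_grad_(True)

    # Teacher's cost gradient w.r.t. the input; detached because the teacher
    # only provides a fixed target.
    t_cost = F.cross_entropy(teacher(x), y)
    t_grad = torch.autograd.grad(t_cost, x)[0].detach()

    # Student's cost and its input gradient; create_graph=True makes the
    # gradient differentiable (double backpropagation).
    s_cost = F.cross_entropy(student(x), y)
    s_grad = torch.autograd.grad(s_cost, x, create_graph=True)[0]

    # Student objective = task cost + gradient-matching penalty.
    loss = s_cost + lam * F.mse_loss(s_grad, t_grad)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The other penalty terms listed in the abstract differ only in what is matched: for (2), for instance, one would match the Jacobians of the pre-softmax outputs with respect to the input rather than the gradient of the scalar cost.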
Date of Conference: 15-20 April 2018
Date Added to IEEE Xplore: 13 September 2018
Electronic ISSN: 2379-190X
Conference Location: Calgary, AB, Canada

1. Introduction

Due to increased computational capacity, the availability of open-source datasets, and advances in theoretical research, Deep Neural Networks (DNNs) currently achieve excellent performance in a wide range of applications, e.g., image classification [8] and quality assessment [3], natural language processing [4], genomics [16], or strategic game playing [19]. Although they perform well on their respective measures, DNNs suffer from a high computational cost during inference, as architectures may contain billions of trainable parameters [5], and from interpretability issues. This limits their usability for certain tasks, for example offline speech recognition on a mobile device, or transcriptomics, where one would like to know which DNA motif caused the protein to bind.

