1. Introduction
Deep learning has achieved many recent advances in predictive modeling in various tasks, but the community has nonetheless become alarmed by the unintuitive generalization behaviors of neural networks, such as the capacity in memorizing label shuffled data [65] and the vulnerability towards adversarial examples [54], [21]