I. Introduction
Despite their great success in many applications, modern deep learning models are vulnerable to adversarial attacks: small but carefully designed perturbations can cause state-of-the-art models to predict incorrect labels with very high confidence [16], [30], [42]. The existence of such adversarial examples reveals undesirable properties of these models' decision boundaries [20] and poses a threat to the reliability of safety-critical machine learning systems.
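To make the phenomenon concrete, the following is a minimal sketch of one classic way such perturbations are constructed, the fast gradient sign method (FGSM): a single step in the direction of the sign of the loss gradient with respect to the input. The model, input shape, and epsilon below are illustrative placeholders, not the setup studied in this paper.

```python
# Minimal FGSM sketch: perturb an input along the sign of the loss gradient.
# All names here (fgsm_perturb, the toy model, epsilon=0.03) are illustrative.
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 epsilon: float = 0.03) -> torch.Tensor:
    """Return x plus an epsilon-sized sign-of-gradient perturbation."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction that most increases the loss, then clamp so the
    # perturbed input remains in the valid pixel range [0, 1].
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Toy usage: a randomly initialized linear classifier on a flattened "image".
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)    # one 28x28 grayscale input in [0, 1]
y = model(x).argmax(dim=1)      # the label the model currently predicts
x_adv = fgsm_perturb(model, x, y)
print("clean prediction:", y.item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```

Even this one-step attack, with a perturbation small enough to be visually imperceptible on natural images, often suffices to flip a trained model's prediction.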