I. Introduction
Robustness in deep learning requires that a model's classification results remain stable in the presence of bounded input perturbations. This property is important because, in real-world applications, model inputs carry all kinds of noise arising from varying environmental conditions. A non-robust model may exhibit misbehaviors such as misclassification, some of which can have catastrophic consequences. For example, misclassification by a perception model (e.g., an object detection or depth estimation model) in an autonomous driving vehicle may endanger human lives.

There are many methods to improve model robustness, or to establish trust in classification results even when the model is not robust, such as adversarial input detection [6], [10], [21], [24], [36], model certification [16], model symbolic analysis [3], [9], [12], [19], [33], [37], and adversarial training [13], [18], [23], [26], [29], [30], [32], [38]. Among them, adversarial training is one of the most popular. It leverages adversarial attacks to generate input perturbations for given clean inputs; the perturbed inputs, called adversarial samples, are then used to train the model so that such misclassifications are prevented.
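To make the adversarial-training loop concrete, the following is a minimal sketch in PyTorch using a PGD-style attack; the choice of attack, the epsilon/alpha/step values, and the function names are illustrative assumptions, not a description of any specific method cited above.

```python
# Minimal adversarial-training sketch (PGD attack). The attack type and
# hyperparameters here are illustrative assumptions only.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Generate adversarial samples within an L-inf ball of radius epsilon."""
    # Random start inside the epsilon-ball, clipped to valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the epsilon-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_train_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of training on adversarial samples instead of clean inputs."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)          # perturb the clean inputs
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)  # train on the perturbed inputs
        loss.backward()
        optimizer.step()
```

The key design point the sketch illustrates is that the attack runs inside the training loop: each batch is perturbed against the current model parameters before the gradient step, so the model is continually trained on its own worst-case (bounded) inputs.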