1. Introduction
Although deep neural networks (DNNs) have achieved success in many research areas, they are vulnerable to attacks [1]. A backdoor attack, or Trojan, is an important type of attack in which a DNN classifier predicts the attacker's target class whenever a test sample from one or more source classes is embedded with the attacker's backdoor pattern [2]–[4]. A backdoor attack is typically launched by poisoning the classifier's training set with samples originally from the source classes, embedded with the same backdoor pattern that will be used during inference, and labeled to the target class [5]. Because a successful backdoor attack does not degrade the classifier's accuracy on clean test samples, it cannot be easily detected, e.g., by inspecting validation-set accuracy [6].
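The poisoning procedure described above can be sketched as follows. This is a minimal illustration, not a specific attack from the literature: the function names (`embed_backdoor`, `poison_training_set`) are hypothetical, and a simple patch-style backdoor pattern with a binary mask is assumed.

```python
import numpy as np

def embed_backdoor(image, pattern, mask):
    """Overlay a backdoor pattern onto an image where the mask is nonzero.

    image, pattern: float arrays in [0, 1] with shape (H, W, C).
    mask: binary array of shape (H, W, 1) selecting the patch location.
    """
    return image * (1.0 - mask) + pattern * mask

def poison_training_set(images, labels, source_class, target_class,
                        pattern, mask, poison_rate, rng):
    """Embed the backdoor pattern into a fraction of source-class samples
    and relabel them to the target class, as in training-set poisoning."""
    images = images.copy()
    labels = labels.copy()
    source_idx = np.flatnonzero(labels == source_class)
    n_poison = int(poison_rate * len(source_idx))
    chosen = rng.choice(source_idx, size=n_poison, replace=False)
    for i in chosen:
        images[i] = embed_backdoor(images[i], pattern, mask)
        labels[i] = target_class
    return images, labels

# Toy example: 20 images of two classes; poison half of class 0 -> class 1.
rng = np.random.default_rng(0)
images = rng.random((20, 8, 8, 3))
labels = np.array([0] * 10 + [1] * 10)
pattern = np.ones((8, 8, 3))          # all-white pattern
mask = np.zeros((8, 8, 1))
mask[:3, :3, 0] = 1.0                 # 3x3 patch in the top-left corner
poisoned_x, poisoned_y = poison_training_set(
    images, labels, source_class=0, target_class=1,
    pattern=pattern, mask=mask, poison_rate=0.5, rng=rng)
```

At inference time, the same `pattern` and `mask` would be applied to a clean source-class sample to trigger the target-class prediction; the classifier, trained on the poisoned set, behaves normally on samples without the patch.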