1. Introduction
With the rapid development of remote sensing technology, large numbers of remote sensing images have become available. Early research on remote sensing scene classification concentrated mainly on handcrafted features. With the advance of deep learning, recent research has focused mainly on deep features. Despite its great success, deep learning has been shown to be fragile when facing artificial perturbations of natural images (adversarial examples). Adversarial training (AT) [1] and its derivatives, i.e., training with adversarial examples, have been shown to be the most effective defense against adversarial attacks. Yet AT requires large amounts of labeled data, which are often difficult to acquire for remote sensing applications.
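To make the idea of training with adversarial examples concrete, the following is a minimal sketch on a toy logistic-regression classifier rather than a deep network. It is not the procedure of [1]: a single-step, sign-of-gradient perturbation (in the style of the fast gradient sign method) is swapped in purely for illustration, and the function names, the perturbation budget `epsilon`, and the learning rate `lr` are all assumptions introduced here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    # Probability of class 1 under a linear logistic model (toy stand-in for a network).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    # Binary cross-entropy for a single sample with label y in {0, 1}.
    p = predict_proba(w, b, x)
    tiny = 1e-12  # guards against log(0)
    return -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))

def sign(g):
    return (g > 0) - (g < 0)

def craft_adversarial(w, b, x, y, epsilon):
    # For this model the loss gradient w.r.t. the input is (p - y) * w.
    # Each input component moves by epsilon in the direction that increases the loss.
    p = predict_proba(w, b, x)
    grad_x = [(p - y) * wi for wi in w]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]

def adversarial_training_step(w, b, x, y, epsilon, lr):
    # Hypothetical single update: craft a perturbed sample, then take one
    # gradient-descent step on the loss evaluated at that perturbed sample.
    x_adv = craft_adversarial(w, b, x, y, epsilon)
    p = predict_proba(w, b, x_adv)
    w_new = [wi - lr * (p - y) * xi for wi, xi in zip(w, x_adv)]
    b_new = b - lr * (p - y)
    return w_new, b_new, x_adv

# Toy usage with made-up numbers (assumptions, not data from the paper).
w, b = [1.0, -2.0], 0.0
x, y = [0.5, -0.5], 1
w_new, b_new, x_adv = adversarial_training_step(w, b, x, y, epsilon=0.1, lr=0.5)
```

Note that every update in this sketch uses the label `y` twice, once to craft the perturbation and once to compute the training gradient, which illustrates why this style of training depends on labeled data.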