Vision Transformer: An Excellent Teacher for Guiding Small Networks in Remote Sensing Image Scene Classification

Abstract:

Scene classification is an active research topic in the remote sensing community, and complex spatial layouts containing various types of objects make classification highly challenging. Convolutional neural network (CNN)-based methods attempt to capture global features by gradually expanding the receptive field, but they ignore long-range contextual information. The vision transformer (ViT) can extract contextual features, but its ability to learn local information is limited and its computational complexity is high. In this article, an end-to-end method is developed that employs ViT as an excellent teacher for guiding small networks (ET-GSNet) in remote sensing image scene classification. In ET-GSNet, ResNet18 is selected as the student model; knowledge distillation (KD) lets it integrate the strengths of the two models without increasing computational complexity. In the KD process, the ViT and ResNet18 are optimized together without independent pretraining: the learning rate of the teacher model gradually decreases to zero, while the weight coefficient of the KD loss module is doubled. With this procedure, dark knowledge from the teacher model is transferred to the student model more smoothly. Experimental results on four public remote sensing datasets demonstrate that ET-GSNet achieves superior classification performance compared with several state-of-the-art (SOTA) methods. In addition, we evaluate ET-GSNet on a fine-grained ship recognition dataset, and the results show that the method generalizes well to different tasks in terms of several metrics.
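
The KD procedure summarized in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation. The abstract does not give the loss formula, the decay schedule, or the point at which the KD weight is doubled, so the following are all assumptions made for illustration: a temperature-softened KL divergence as the KD term, a linear decay of the teacher learning rate, a doubling of the KD weight once that learning rate reaches zero, and the hypothetical names `teacher_lr`, `kd_weight`, and `student_loss`. The teacher's own training loss, which the joint optimization would also require, is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Hard-label cross-entropy for a single sample."""
    return -math.log(softmax(logits)[label])


def kd_divergence(student_logits, teacher_logits, temperature):
    """KL(teacher || student) between temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def teacher_lr(step, total_steps, base_lr):
    """ASSUMPTION: the teacher learning rate decays linearly to zero."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


def kd_weight(step, total_steps, base_weight):
    """ASSUMPTION: the KD weight doubles once the teacher learning rate is zero."""
    return base_weight * (2.0 if step >= total_steps else 1.0)


def student_loss(student_logits, teacher_logits, label, step, total_steps,
                 base_weight=1.0, temperature=4.0):
    """Student objective: hard-label loss plus the weighted soft-label KD term."""
    ce = cross_entropy(student_logits, label)
    kd = kd_divergence(student_logits, teacher_logits, temperature)
    # The T^2 factor is a common KD convention, assumed here rather than taken
    # from the source; it keeps the soft-target term on a comparable scale.
    return ce + kd_weight(step, total_steps, base_weight) * temperature ** 2 * kd
```

Under these assumptions, the student sees the same hard-label term throughout training, while the soft-label term counts twice as much once the teacher has stopped updating.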

Published in: IEEE Transactions on Geoscience and Remote Sensing (Volume: 60)

Article Sequence Number: 5618715

Date of Publication: 17 February 2022

DOI: 10.1109/TGRS.2022.3152566

I. Introduction

In recent years, Earth observation (EO) technology has developed comprehensively, and humans can now perceive the state of the Earth’s surface through various modes of remote sensing data [1]–[4]. High spatial resolution (HSR) imagery is an important and widely used type of remote sensing data that clearly presents ground targets and spatial patterns [5]–[7]. HSR images have advanced a range of real-world applications, such as object detection [8], land planning [9], intelligent agriculture [10], traffic analysis [11], and disaster assessment [12].
