
Transfer Adaptation Learning for Target Recognition in SAR Images: A Survey



Abstract:

Synthetic aperture radar (SAR) target recognition is a fundamental task in SAR image interpretation, which has made tremendous progress with the advancement of artificial intelligence technology. However, SAR imaging is sensitive to the operating conditions of platforms, resulting in large distribution discrepancies between data collected on different platforms. Moreover, SAR target images are difficult to annotate due to their blurry textures, resulting in insufficient labeled data to train a model. Subject to the data distribution discrepancy and insufficient labeled data, SAR target recognition therefore becomes a highly challenging task. Transfer adaptation learning (TAL) is a learning paradigm aimed at completing target tasks by transferring knowledge from relevant source domains, which is a promising technique for solving the aforementioned problems in SAR target recognition. However, there is currently no comprehensive survey on the application of transfer adaptation learning in SAR target recognition. To this end, we comprehensively summarize the development of transfer adaptation learning in SAR target recognition and provide systematic guidance for future research. In this article, we first summarize the electromagnetic features and visual features of SAR images used for target recognition, which can potentially be used for knowledge transfer. Then, we systematically review the related literature according to the homogeneity of the transfer domains, the modality of the data in the source domain, and the category of the TAL methods. The available datasets that can be used to validate TAL methods for SAR target recognition are also summarized for researchers' convenience. We also conduct comparative experiments on these datasets to demonstrate the performance of TAL methods. Finally, we analyze the main challenges of the current methods and point out several directions worth studying in the future.
Page(s): 13577 - 13601
Date of Publication: 26 July 2024

SECTION I.

Introduction

In recent years, with the development of machine learning algorithms, the visual understanding of synthetic aperture radar (SAR) images has made tremendous progress [1], [2], [3]. Target recognition, a fundamental problem that holds a pivotal position in the visual understanding of SAR images, aims to accurately recognize the fine-grained category or type information of a target [4], [5]. As shown in Fig. 1, SAR images always contain speckle noise compared with optical images, owing to coherent electromagnetic wave imaging [6]. It is noteworthy that, because SAR sensors rely on electromagnetic wave imaging, they remain unaffected by climatic variations and lighting conditions. Their extensive detection range and ability to penetrate vegetation further reinforce their advantages over optical target recognition [7], [8]. As a cutting-edge image interpretation method, SAR target recognition finds crucial applications in both military and civilian domains, ranging from disaster reporting and prevention to traffic management, urban planning, and military intelligence gathering [9], [10]. SAR imaging is notably sensitive to the working conditions of platforms and the states of the target, including factors such as depression angle, polarization mode, wave band, and target azimuth [11], [12], [13]. Variations in these parameters significantly affect aspects of SAR images such as brightness, texture, scatterer distribution, and noise level. Therefore, in SAR target recognition, if the training data and test data are collected from different platforms or under different working conditions, there will be significant differences between their data distributions. This violates the assumption made in traditional machine learning that the samples in the training and test sample spaces are independent and identically distributed (I.I.D.) [14], [15]. For example, the images to be identified may be collected after the working mode of the radar has changed, or from other platforms [16]. Traditional machine learning algorithms perform badly under these test conditions, as their generalization is poor.

Fig. 1. Difference between optical target recognition and SAR target recognition.

One solution is to obtain and annotate data with the same distribution as the test data, and then retrain the model with supervised learning. However, this is time-consuming and resource-intensive. Whenever new test data are collected from other platforms or working conditions, the entire process must be repeated, leading to unacceptable time and resource consumption. Meanwhile, annotating SAR target images is extremely difficult for several reasons.

  1. SAR imaging relies on a distinct mechanism that involves active emission and reception of electromagnetic waves, which encompasses intricate physical processes such as electromagnetic scattering, interference, and diffraction. Consequently, SAR images exhibit different textures compared to optical natural images, deviating from human visual perception.

  2. SAR imaging is affected by various factors, including speckle noise, geometric deformation, and structural loss, leading to diverse visual representations of the same target type [7].

  3. Moreover, SAR images of certain target models display a high degree of visual similarity, making them nearly indistinguishable to the human eye, even for experts with specialized knowledge in annotation.

A range of machine learning techniques has also been employed to improve the performance of SAR target recognition models in different scenarios, including metalearning [17], [18], knowledge distillation [19], semisupervised learning [20], and unsupervised learning [21]. Specifically, metalearning techniques aim to uncover the inherent patterns and rules within the data itself, addressing the challenge of SAR target recognition with small sample sizes [17], [18]. Knowledge distillation, on the other hand, focuses on transferring the knowledge of larger, more complex models (teacher models) to smaller, more efficient models (student models), thereby enhancing the performance of SAR target recognition models under limited resources [19]. Semisupervised and unsupervised learning techniques tackle SAR target recognition in scenarios with varying degrees of labeled data availability; they rely on unlabeled or partially labeled data to learn meaningful representations and patterns, which can then be leveraged for recognition tasks [20], [21].

An effective way to solve the mentioned problems is to extract knowledge from other relevant data and apply it to SAR target recognition, which is called transfer learning or domain adaptation learning; we adopt the general term transfer adaptation learning (TAL) to unify them [22]. TAL strives to acquire pertinent knowledge from source domain data that is associated with the target domain yet exhibits a distinct distribution. The ultimate goal is to develop a model that excels at target domain tasks. A crucial aspect to note is that it is difficult to craft an effective model directly in the target domain owing to the scarcity of annotated data. In contrast, obtaining annotated data in the source domain is comparatively easier, thereby bolstering the target domain task with pertinent knowledge. TAL imitates the human ability to draw analogies and apply knowledge learned from other data. Due to these characteristics, TAL has widespread applications in natural language processing (NLP), computer vision (CV), speech recognition, fault diagnosis, and beyond [22]. In CV, TAL can be used for detection, recognition, and segmentation tasks under various cross-imaging conditions, across different weather or lighting conditions, which are crucial for applications in autonomous driving, surveillance, and security [23]. TAL can also be applied to detection, recognition, and segmentation across different imaging modalities, such as pedestrian reidentification from optical to infrared and medical image analysis from CT to MRI [24], [25], which are key tasks in tracking, surveillance, and medical image analysis. TAL has also been widely utilized in the field of remote sensing [26], [27], [28], [29], including SAR target recognition [30], [31], [32], [33]. First, it leverages pretraining on extensive data closely related to the target domain, enabling the model to acquire and refine general features specific to SAR targets. Second, it minimizes distribution discrepancy metrics between source and target domain SAR data by employing distribution alignment techniques. Third, it employs adversarial learning and other strategies to create shared feature spaces, thereby blurring the distinction between source and target domains. In the subsequent sections, we delve deeper into current TAL methodology tailored specifically for SAR target recognition.

To our knowledge, research on SAR target recognition based on TAL began in 2018 [34]. With the development of artificial intelligence technology, this research field is receiving increasing attention [35]. To date, there have been several surveys on TAL [22], [36], [37], on SAR target recognition [35], [38], [39], [40], and on TAL in other fields [41]. As the details in Table I show, none of them focuses on SAR target recognition based on TAL. To the best of our knowledge, this paper is the first survey dedicated to SAR target recognition based on TAL. Over 70 methods are systematically reviewed. We hope it can serve as a valuable tool and guide for researchers interested in this field, assisting them in grasping its development directions.

TABLE I Summary of Related Surveys

For the SAR target recognition task, various types of data can be taken as source domains to provide transferable knowledge, each with its own characteristics, such as SAR images [30], simulated SAR images [31], optical images [32], and AIS data [33], as shown in Fig. 2. Even general methods require specific settings for different types of data, let alone methods designed for a particular data type. Hence, a literature categorization based only on method categories is not suitable for SAR target recognition based on TAL. At the same time, the acquisition principles of SAR images and simulated SAR images are the same as those of the target domain, indicating that they are homogeneous data. Conversely, the acquisition principles of optical images and AIS data differ significantly from those of the target domain, indicating that they are heterogeneous data. Therefore, in this article, we first divide the literature into two top categories (homogeneous TAL and heterogeneous TAL) based on whether the modality of the source data is the same as that of the target data, then into four middle categories (from SAR to SAR, from simulated-SAR to SAR, from optical to SAR, from AIS to SAR) based on the type of data in the source domain, and finally into bottom categories (pretraining, distribution alignment, shared feature space) according to the category of TAL method. The detailed category information can be seen in Table II. In addition, the knowledge transferred and the datasets used in these methods are presented in columns 3 and 4 of this table. We believe that this multilevel literature categorization enables readers to quickly and clearly find the relevant content they are interested in.

TABLE II Summary of Reviewed Literature

Fig. 2. Data types that can provide knowledge for SAR target recognition.

Contributions: Our article contributes to TAL in SAR target recognition by reviewing almost all relevant literature, analyzing the transferable features of SAR images, providing a detailed literature categorization, representing detailed information about available datasets, and offering suggestions and recommendations for future research. The key contributions are as follows.

  1. We investigated the effective electromagnetic and visual features for SAR target recognition and analyzed their potential applications for TAL in SAR target recognition, aiming to provide researchers with a comprehensive understanding of what kinds of knowledge can be transferred for SAR target recognition.

  2. We have studied over 70 methods that apply TAL to SAR target recognition, and categorized these papers systematically from top to bottom based on the homogeneity of the transfer domains, the modality of the data in the source domain, and the category of the TAL methods. Therefore, this article fills the gap in the current literature, which lacks a survey on the application of TAL in SAR target recognition.

  3. We have investigated the datasets used in the reviewed literature, which could provide transferable knowledge for SAR target recognition. The detailed information and properties of these datasets are summarized. The experiments are also conducted on them to demonstrate the superiority of the TAL methods tailored specifically for SAR target recognition.

  4. We have thoroughly analyzed the performance limitations of current methods and outlined potential directions for improvement. In addition, we have delved into unexplored scenarios in TAL for SAR target recognition, offering insights into promising future research directions.

Organization: The rest of this article is organized as follows. Section II introduces the preliminary knowledge of TAL and the categories of TAL methodology. Section III discusses transferable features of SAR images. Section IV reviews the homogeneous TAL methodologies, including transferring knowledge from SAR to SAR, from simulated-SAR to SAR. Section V reviews the heterogeneous TAL methodologies, including transferring knowledge from optical to SAR, from AIS to SAR, and from AIS and optical to SAR. Section VI gives the information of available data providing knowledge for SAR target recognition. Section VII provides comparative experiments to demonstrate the performance of TAL methods. Section VIII presents the discussions for existing methods and the recommendation for future work. Finally, Section IX draws the conclusions.

SECTION II.

Preliminaries

A. Notations and Problem Definition

1) Notations

A summary of notations used in this article is given as follows to simplify understanding for readers.

$ D_{S}$: source domain.
$ D_{T}$: target domain.
$ d_{S}$: feature size of samples in the source domain.
$ d_{T}$: feature size of samples in the target domain.
$ n_{S}$: number of samples in the source domain.
$ n_{T}$: number of samples in the target domain.
$ n_{TL}$: number of labeled samples in the target domain.
$ n_{TU}$: number of unlabeled samples in the target domain.
$ x_{S} \in \mathbb {R}^{d_{S}}$: instance in the source domain.
$ \mathcal {X}_{S}$: feature space of instances in the source domain.
$ y_{S}$: label of an instance in the source domain.
$ \mathcal {Y}_{S}$: label space of instances in the source domain.
$ x_{T} \in \mathbb {R}^{d_{T}}$: instance in the target domain.
$ x_{TL} \in \mathbb {R}^{d_{T}}$: labeled instance in the target domain.
$ x_{TU} \in \mathbb {R}^{d_{T}}$: unlabeled instance in the target domain.
$ \mathcal {X}_{T}$: feature space of instances in the target domain.
$ y_{T}$: label of a labeled instance in the target domain.
$ \mathcal {Y}_{T}$: label space of instances in the target domain.

2) Problem Definition

According to the definition in the previous work [37], [120], [121], the domain consists of feature space $ \mathcal {X}$ and marginal probability distribution $ P(x)$, where $ x \in \mathcal {X}$ and follows $ P(x)$. The task $ \mathcal {T}$ based on the domain consists of the label space $ \mathcal {Y}$ and conditional probability distribution $ P(y|x)$, where $ y \in \mathcal {Y}$ and follows $ P(y|x)$. Therefore, the data in source domain are denoted as $ D_{S} = \lbrace (x_{S}^{i}, y_{S}^{i})\rbrace _{i=1}^{n_{S}}$ where $ x_{S} \in \mathcal {X}_{S}$, and $ y_{S} \in \mathcal {Y}_{S}$. Different from the source domain, the data in target domain consists of two parts, $ D_{T} = D_{TL} \bigcup D_{TU}$, which are respectively denoted as $ D_{TL} = \lbrace (x_{TL}^{i}, y_{TL}^{i})\rbrace _{i=1}^{n_{TL}}$ and $ D_{TU} = \lbrace x_{TU}^{i}\rbrace _{i=1}^{n_{TU}}$, where $ x_{TL}, x_{TU} \in \mathcal {X}_{T}$, and $ y_{T} \in \mathcal {Y}_{T}$. Under normal conditions, $ 0 < n_{TL} \leq n_{TU} \ll n_{S}$.

Due to the discrepancy between the source domain $ D_{S}$ and target domain $ D_{T}$, which in the SAR field is caused by different working conditions, platforms, depression angles, polarization modes, wave bands, azimuths, and so on, the marginal and conditional probability distributions of the source and target domains differ: $ P_{S}(x) \ne P_{T}(x)$ and $ P_{S}(y|x) \ne P_{T}(y|x)$. That is also the root reason why methods designed for the source domain cannot generalize well on the target domain. TAL aims to solve this problem by transferring knowledge from the task $ \mathcal {T}_{S}$ on the source domain $ D_{S}$ to the task $ \mathcal {T}_{T}$ on the target domain $ D_{T}$. Specifically, given the data of the source domain $ D_{S} = \lbrace (x_{S}^{i}, y_{S}^{i})\rbrace _{i=1}^{n_{S}}$, the task $ \mathcal {T}_{S}$, and the data of the target domain $ D_{T} = \lbrace (x_{TL}^{i}, y_{TL}^{i})\rbrace _{i=1}^{n_{TL}} \bigcup \lbrace x_{TU}^{i}\rbrace _{i=1}^{n_{TU}}$, that is, given the marginal probability distributions $ P_{S}(x)$ and $ P_{T}(x)$ and the conditional probability distributions $ P_{S}(y|x)$ and $ P_{TL}(y|x)$, the goal of TAL is to learn the objective conditional probability distribution $ P_{TU}(y|x)$. There are some special circumstances: when $ n_{TL} > 0$, the setting is referred to as semisupervised TAL; when $ n_{TL} = 0$, it is referred to as unsupervised TAL.

B. Categories of TAL Methodology

Regardless of the modality of data used as the source domain for knowledge transfer, the literature reviewed in this article can be divided into three categories based on the techniques used, namely pretraining, distribution alignment, and shared feature space. In this section, we introduce the general forms of these methods; the methods designed specifically around data characteristics will be reviewed in detail in the following sections.

1) Pretraining

Datasets with large volumes contain rich and diverse data, which can provide general knowledge for target recognition and other vision tasks. Pretraining the model on these datasets enables it to extract general features; the model can then be customized for a downstream task, such as target recognition, detection, or segmentation, by fine-tuning, i.e., continuing training in the specific application scenario [34]. The paradigm of pretraining and fine-tuning can be seen in Fig. 3. This method is simple to implement but relies on datasets with a large amount of labeled data.

Fig. 3. Paradigm of pretraining.
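To make the paradigm concrete, here is a minimal PyTorch sketch of pretraining followed by fine-tuning. It is illustrative only: the backbone, the data loaders, the `out_dim` attribute, and all hyperparameters are placeholder assumptions, not the setup of any reviewed method.

```python
import torch
import torch.nn as nn

def pretrain(backbone, head, source_loader, epochs=10, lr=1e-3):
    # Stage 1: learn general features on a large labeled source dataset.
    model = nn.Sequential(backbone, head)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in source_loader:
            opt.zero_grad()
            ce(model(x), y).backward()
            opt.step()
    return backbone  # keep the feature extractor, discard the source head

def finetune(backbone, num_classes, target_loader, epochs=5, lr=1e-4):
    # Stage 2: customize the model for the downstream SAR recognition task.
    # `backbone.out_dim` is an assumed attribute giving the feature dimension.
    head = nn.Linear(backbone.out_dim, num_classes)
    model = nn.Sequential(backbone, head)
    # A smaller learning rate helps preserve the pretrained general features.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in target_loader:  # scarce labeled SAR target chips
            opt.zero_grad()
            ce(model(x), y).backward()
            opt.step()
    return model
```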

2) Distribution Alignment

Due to differences between imaging environments, platforms, lighting conditions, and so on, the source training data and the target test data show different probability distributions, as shown in Fig. 4. Distribution differences manifest in image data as targets having different appearances [56]. Distribution alignment methods decrease the distribution differences by minimizing distribution discrepancy metrics between source and target features, such as the maximum mean discrepancy (MMD) [56], local MMD (LMMD) [122], and Kullback–Leibler divergence (KLD) [110]; the paradigm of distribution alignment can be seen in Fig. 5. This method is effective but relies on predefined distribution discrepancy metrics.

Fig. 4. Distribution discrepancy between domains.

Fig. 5. Paradigm of distribution alignment.

3) Shared Feature Space

Different from pretraining methods, which rely on sufficient data, and distribution alignment methods, which rely on distribution discrepancy metrics, building a feature space in which features from different domains cannot be distinguished is also an effective paradigm, called the shared feature space method. Adversarial learning, the method most commonly used to build a shared feature space, endeavors to confuse the feature representations of data from distinct domains through a competitive game between a feature extractor and a domain discriminator, as diagrammed in Fig. 6. The shared feature space method confuses the features from the source and target domains so that the classifier performs well regardless of the domain.

Fig. 6. Paradigm of adversarial learning.
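To illustrate the paradigm, the sketch below implements domain confusion with a gradient reversal layer, one common realization of the feature extractor versus domain discriminator game; the discriminator module and the tradeoff weight `lamb` are assumed components, not the design of any specific reviewed method.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the
    backward pass, so minimizing the discriminator loss trains the
    discriminator while the feature extractor is pushed to confuse it."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None

def domain_adversarial_loss(feat_s, feat_t, discriminator, lamb=1.0):
    # Domain labels: 1 for source features, 0 for target features.
    bce = nn.BCEWithLogitsLoss()
    logits_s = discriminator(GradReverse.apply(feat_s, lamb))
    logits_t = discriminator(GradReverse.apply(feat_t, lamb))
    return (bce(logits_s, torch.ones_like(logits_s))
            + bce(logits_t, torch.zeros_like(logits_t)))
```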

SECTION III.

Transferable Features of SAR Images

Relatively speaking, the deep visual features of SAR images have been extensively studied in TAL for SAR target recognition, as outlined in the reviewed methods. However, the special property features of SAR images, such as electromagnetic scattering features and traditional features, have not been fully utilized in TAL for SAR target recognition [54], [79]. During the review process, it was found that some works had fused these features with deep features in SAR recognition tasks for TAL to improve performance [60], [79], [84], [117]. This indicates that introducing such additional information can improve the effectiveness of TAL. Consequently, these features are highly valuable for future endeavors and can be effectively utilized in TAL for SAR target recognition, improving both model performance and generalization capability; they are therefore potential sources of knowledge transfer in SAR target recognition. In this section, we discuss the traditional features that may be useful for knowledge transfer in SAR target recognition, namely the electromagnetic scattering features extracted based on the attribute scattering center (ASC) model [123] and visual features such as SAR-SIFT [124], the histogram of oriented gradients (HOG) [125], wavelet transformation (WT) [126], sparse representation (SR) [127], and naive geometric features (NGFs) [128]. Electromagnetic features stem directly from the principles of SAR imaging. They predominantly capture intrinsic physical attributes related to the target's surface material composition, such as scattering patterns, reflectivity, and polarization characteristics. These features are inherently linked to the electromagnetic interactions between the radar wave and the target, providing insights into the target's material composition and geometric shape. In contrast, visual features are extracted directly from the SAR images. They encompass attributes related to the image's texture, pixel statistics, pixel variation trends, target size, and other transformations of the image data. Visual features capture the visual appearance and spatial relationships of objects within the SAR image. Both categories of features possess unique value in SAR target recognition, offering complementary information that can be integrated to optimize the performance of recognition models. Deep features, by contrast, raise questions about network structure modification, optimization tricks, loss function design, and so on, bringing challenges for further performance improvement. Therefore, traditional features should not be abandoned [129].

A. Attribute Scattering Center

The ASC model was first proposed by Gerry et al. [123], extending physical optics theory and the geometrical theory of diffraction [130]. It states that, under high-frequency conditions, the electromagnetic scattering of a target is approximately the superposition of a series of individual scattering centers [5]. The ASC model can be described by a set of functions of the operating frequency and aspect angle of the radar, formulated as
\begin{equation*} E\left({f,\phi; \Theta } \right) = \sum \limits _{i = 1}^{n} {{E_{i}}\left({f,\phi; {\Theta _{i}}} \right)} \tag{1} \end{equation*}
where $ E({f,\phi; \Theta })$ denotes the total electromagnetic scattering, $ f$ and $ \phi$ denote the operating frequency and aspect angle of the radar, $ \Theta$ denotes the physical parameters, and $ E_{i}({f,\phi; {\Theta _{i}}})$ denotes the $ i\text{th}$ electromagnetic scattering center. In detail, the $ i\text{th}$ electromagnetic scattering center can be described as
\begin{align*} {E_{i}}\left({f,\phi; {\Theta _{i}}} \right) = & {A_{i}} \cdot {\left({j\frac{f}{{{f_{c}}}}} \right)^{{\alpha _{i}}}} \\ &\cdot \exp \left({\frac{{ - j4\pi f}}{c}\left({{x_{i}}\cos \phi + {y_{i}}\sin \phi } \right)} \right) \\ &\cdot \operatorname{sinc} \left({\frac{{2\pi f}}{c}{L_{i}}\sin \left({\phi - {{\bar{\phi } }_{i}}} \right)} \right) \\ &\cdot \exp \left({ - 2\pi f{\gamma _{i}}\sin \phi } \right) \tag{2} \end{align*}
where $ c$ denotes the speed of the electromagnetic wave, $ f_{c}$ denotes the radar central frequency, and $ {\Theta _{i}} = [ {{A_{i}},{x_{i}},{y_{i}},{\alpha _{i}},{L_{i}},{{\bar{\phi } }_{i}},{\gamma _{i}}} ]$ contains the physical parameters of the $ i\text{th}$ scattering center, in which $ A_{i}$ represents its amplitude, $ (x_{i}, y_{i})$ the geometrical location, $ \alpha _{i}$ the frequency dependence, $ L_{i}$ the length, $ {{\bar{\phi } }_{i}}$ the azimuth, and $ {\gamma _{i}}$ the azimuth dependence. The ASC model contains intrinsic property information of the radar target, including electromagnetic diffraction resulting from sharp or protruding parts of the target. This diffraction reflects the physical structure of the target and facilitates target recognition; the most common scattering structures are listed in Table III. Based on the ASC model and optimization algorithms, the parameters representing the property information of the radar target can be estimated. Electromagnetic features can then be obtained through feature extraction based on the estimated parameters, and are widely utilized in SAR target recognition [5], [131], [132], [133]. Some researchers have also applied the target-intrinsic information obtained from the ASC model to the TAL task of SAR target recognition. They extract electromagnetic scattering features and fuse them with the visual features of SAR images to enhance the generalization of the model [54], [79], or integrate this information as prior knowledge into the training process to improve recognition performance [84]. It can be seen that integrating SAR target electromagnetic scattering knowledge into the recognition model can effectively improve its generalization. Therefore, how to transfer electromagnetic scattering knowledge based on the ASC model is a question worth exploring.
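To make (1) and (2) concrete, the NumPy sketch below evaluates the superposition of a few scattering centers over a frequency/aspect grid. All parameter values are illustrative assumptions, not estimates from real radar data.

```python
import numpy as np

def asc_response(f, phi, A, x, y, alpha, L, phi_bar, gamma, fc=1e10):
    # Single attribute scattering center, following (2).
    c = 3e8  # propagation speed of the electromagnetic wave (m/s)
    freq_term = A * (1j * f / fc) ** alpha
    pos_term = np.exp(-1j * 4 * np.pi * f / c * (x * np.cos(phi) + y * np.sin(phi)))
    arg = 2 * np.pi * f / c * L * np.sin(phi - phi_bar)
    length_term = np.sinc(arg / np.pi)  # np.sinc(t) = sin(pi*t)/(pi*t)
    azim_term = np.exp(-2 * np.pi * f * gamma * np.sin(phi))
    return freq_term * pos_term * length_term * azim_term

# Total field (1): superposition of centers over a frequency/aspect grid.
f = np.linspace(9.5e9, 10.5e9, 64)[:, None]   # X-band frequency sweep (Hz)
phi = np.linspace(-0.1, 0.1, 64)[None, :]     # aspect angles (rad)
centers = [  # illustrative point-like and distributed scatterers
    dict(A=1.0, x=1.5, y=0.3, alpha=1.0, L=0.0, phi_bar=0.0, gamma=0.0),
    dict(A=0.6, x=-0.8, y=1.1, alpha=0.5, L=2.0, phi_bar=0.05, gamma=0.0),
]
E = sum(asc_response(f, phi, **p) for p in centers)
```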

TABLE III Detailed Information of Geometric Scattering Types

B. SAR-SIFT

SAR-SIFT is one of the most common keypoint detection and scale-invariant feature extraction algorithms used for SAR image interpretation [124]. SAR-SIFT avoids the influence of speckle noise to a considerable extent, extracting scale-, rotation-, and translation-invariant features. It obtains keypoints and descriptors that encapsulate gradient information from surrounding pixels, with the ultimate goal of reflecting edge information through gradient computation. SAR-SIFT has also been used to build features for SAR target detection [134] and recognition [135], [136]. Therefore, SAR-SIFT features are promising for TAL in SAR target recognition.

C. SAR-HOG

The HOG feature was first proposed for person detection and is not sensitive to local geometric transformations such as translations or rotations [125]. Considering that SAR images are sensitive to the depression angle and the azimuth of targets, Song et al. [137] proposed SAR-HOG, which pays more attention to stable SAR image pixels reflected by strong backscatter returns from structures exhibiting aspect insensitivity. Since then, the HOG feature has been widely used for SAR image interpretation [138], [139], including SAR target recognition [129], [140], [141]. Therefore, HOG features have the potential to be fused with deep features in TAL to improve SAR target recognition performance.
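As a simple illustration, the snippet below extracts a HOG descriptor from a SAR chip with scikit-image. Note that it uses the standard HOG of [125], not the SAR-HOG variant of Song et al. [137], and the cell and block sizes are arbitrary choices.

```python
import numpy as np
from skimage.feature import hog

chip = np.random.rand(128, 128)  # stand-in for a normalized SAR target chip
feat = hog(chip, orientations=9, pixels_per_cell=(8, 8),
           cells_per_block=(2, 2), block_norm='L2-Hys')
# `feat` is a 1-D descriptor that could be concatenated with deep features.
```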

D. Wavelet Transformation

WT was developed from the Fourier transformation. The Fourier transform modulus is invariant to translation but unstable to deformations of high-frequency components [126]. Generally speaking, image features ought to be invariant to geometric transformations, encompassing translation, rotation, scale transformation, and minor deformations, while also being robust against perturbations. WT features possess these properties but are sensitive to translation. Therefore, some researchers have improved WT and applied it to SAR target recognition [142] or combined it with deep features [126], [143], [144]. WT features have shown good performance in SAR target recognition tasks and can be further fused with deep features. They are also worth considering for TAL in this field.
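For illustration, the following PyWavelets sketch computes a one-level 2-D wavelet decomposition of a SAR chip and derives simple subband-energy features; both the wavelet choice and the energy summary are illustrative options rather than the setup of the cited methods.

```python
import numpy as np
import pywt

chip = np.random.rand(128, 128)  # stand-in for a SAR target chip
# One-level 2-D DWT: approximation plus horizontal/vertical/diagonal details.
cA, (cH, cV, cD) = pywt.dwt2(chip, 'db2')
# Subband energies as a compact feature vector for fusion with deep features.
feat = np.array([np.sum(b ** 2) for b in (cA, cH, cV, cD)])
```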

E. Sparse Representation

SR is widely used for face recognition [145], image denoising [146], and SAR target recognition [103], [147], [148]. The basic idea of SR classification is to represent the test sample with a linear combination of training samples; the category of the test sample can then be determined by the sparse reconstruction error [148]. SR has good noise robustness and is widely used for SAR target recognition [149], [150], [151] or in combination with deep features [60]. SR is complementary to deep features in SAR target recognition to some extent, and it should be considered in TAL for SAR target recognition [60].
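A minimal sketch of the SR classification idea follows: the test sample is reconstructed from each class's training samples and assigned to the class with the smallest reconstruction error. For brevity, per-class least squares stands in for a true sparse solver (e.g., orthogonal matching pursuit), so this is a deliberately simplified illustration.

```python
import numpy as np

def src_predict(dicts_per_class, x):
    """dicts_per_class: {class_label: (d, n_c) matrix of training samples};
    x: (d,) test sample. Returns the label with the minimum residual."""
    errors = {}
    for c, D in dicts_per_class.items():
        # Solve for coefficients representing x over class c's samples.
        coef, *_ = np.linalg.lstsq(D, x, rcond=None)
        errors[c] = np.linalg.norm(x - D @ coef)  # class-wise reconstruction error
    return min(errors, key=errors.get)
```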

F. Naive Geometric Features

NGFs are used for ship classification in SAR images [128], [152]; they represent the geometric properties of ships, such as length, width, perimeter, and area ratio. Detailed information about NGFs can be seen in Table IV. For SAR target images, these features can be obtained by computing the minimum bounding box surrounding the target, as sketched after Table IV [33], [116]. The experiments in [128], [152] on SAR images show that NGFs are effective features for fine-grained ship recognition. Notably, these features are shared across different types of data (AIS records, for instance, also provide ship dimensions). Therefore, they are also worth considering for TAL in SAR target recognition [128], [152].

TABLE IV Naive Geometric Features
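As an illustration of how such features can be computed from a segmented SAR target, the sketch below derives length, width, and ratio features from the minimum bounding rectangle of a binary mask using OpenCV; the exact feature set of Table IV may differ, and this selection is an assumption for illustration.

```python
import cv2
import numpy as np

def naive_geometric_features(mask):
    # Collect target pixel coordinates as (x, y) points for OpenCV.
    pts = np.column_stack(np.nonzero(mask))[:, ::-1].astype(np.float32)
    (cx, cy), (w, h), angle = cv2.minAreaRect(pts)  # minimum bounding rectangle
    length, width = max(w, h), min(w, h)
    target_area = float(mask.sum())
    return {
        "length": length,
        "width": width,
        "aspect_ratio": length / max(width, 1e-6),
        # fraction of the bounding box actually filled by the target
        "area_ratio": target_area / max(length * width, 1e-6),
    }
```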

In this section, we discussed the traditional features used in the reviewed literature and presented several potential features that could be applied in TAL for SAR target recognition. However, how best to utilize these features remains an open question in this field. The features usable in TAL for SAR target recognition are not limited to these; more potential features should be explored in the future, along with how to utilize them effectively.

SECTION IV.

Homogeneous Transfer Adaptation Learning

SAR and simulated-SAR images are two kinds of homogeneous data that can provide knowledge for SAR target recognition. The source domain SAR images and target domain SAR images are strictly homogeneous, with discrepancies arising only from operating conditions or platforms. In contrast, simulated-SAR images are obtained by a simulation system based on electromagnetic theory and target modeling, whose process is similar to SAR imaging [31]. Therefore, we refer to methods using homogeneous data in both the source and target domains as homogeneous TAL. Simulated-SAR images can be obtained more easily than SAR images, but SAR images are more in line with real scenarios. Therefore, how to transfer knowledge from these data to SAR target recognition has become a hot topic.

A. From SAR to SAR

Transferring knowledge from SAR images to SAR images is the most direct approach in homogeneous TAL for SAR target recognition. There are two kinds of SAR images: scene images and target images. The quantity of SAR scene images is always larger than that of SAR target images. However, annotating scene or target images is time-consuming and resource-consuming. How to transfer imaging-modality knowledge (information about SAR images regardless of content) from SAR scene images, or target-related knowledge from SAR target images, is a question worth exploring, and many kinds of methods have been proposed to solve it [14], [15], [16], [30], [43], [45], [46], [47], [48], [49], [51], [52], [53], [54], [55], [56], [57], [58], [59], [61], [63], [64], [153].

1) Pretraining Method

Pretraining and fine-tuning is a commonly used learning paradigm in this field. Based on the ability of the pretrained model to extract common features, the model can be easily adapted to downstream tasks, such as SAR target recognition [30], [43], [45], [46], [47], [48], [49], [51], [52], [53], [54], [55], [153].

The performance of directly utilizing a pretrained model is often unsatisfactory. Therefore, many technologies are combined to improve the performance of the SAR target recognition model. Considering that no well-annotated SAR target data may be available, Huang et al. [30] proposed to transfer knowledge from large volumes of unlabeled SAR scene data. A convolutional network was pretrained on an image reconstruction task to extract general features and then fine-tuned on the SAR target recognition task. Zhang et al. [153] pretrained a generative adversarial network (GAN) on unlabeled SAR scene images as the initial model for the SAR target recognition task. Inspired by this, Zhang et al. [46] pretrained a combination of a GAN and a reconstruction network on unlabeled SAR images to improve performance. Compared with pretraining on unlabeled SAR scene images, pretraining on SAR target images is more directly related to the recognition task. Zhang et al. [43] pretrained the network on SAR vehicle images and then fine-tuned it on SAR ship images, transferring knowledge across tasks. Zhang et al. [49] noted that the number of ships in the SAR ship detection dataset is much larger than in the SAR ship recognition dataset. They pretrained the network on the SAR ship detection dataset to extract general ship features and then fine-tuned it on the ship recognition dataset to achieve ship target recognition.

To improve the performance of pretraining methods, some researchers adopted data augmentation to enrich the diversity of data. Zhai et al. [47] employed geometric image transformations to generate new SAR vehicle images and enlarge the pretraining dataset. Liu et al. [53] adopted style transfer technology to extend MSTAR [42], then pretrained and fine-tuned the network for ship recognition. Pei et al. [55] combined geometric transformation augmentation and contrastive learning to enable the model to extract discriminative features, and then fine-tuned it for ship recognition.

Some well-designed modules have been plugged into networks to improve performance. Shang et al. [45] added an information recorder module and improved the generalization of the network with a hinge loss function. Zhai et al. [48] designed a multilevel feature fusion attention module to improve the discriminability of features. Wang et al. [51] pretrained a Siamese network to iteratively predict pseudolabels for unlabeled target images and fine-tuned the network based on them.

To utilize the characteristics of the SAR imaging modality, Liu et al. [54] proposed a complex convolutional network with electromagnetic property transfer. It calculated new convolutional kernels based on a modified ASC model, thereby obtaining more discriminative features through pretraining.

Varieties of pretraining methods have been proven to be effective in the homogeneous TAL for SAR target recognition, but there are some questions worth considering:

  1. The volume of the SAR datasets used for pretraining is too small compared with common datasets in computer vision, for example, ImageNet [85].

  2. These methods ignored the essential discrepancy between probability distributions of the source domain and target domain.

2) Distribution Alignment Method

It is assumed that the training and test data come from the same distribution in traditional machine learning. However, the data distribution discrepancy caused by imaging conditions of SAR platforms has broken this assumption, resulting in poor generalization of the model trained in the source domain. Therefore, aligning distributions is a kind of effective method to improve the generalization of the network for SAR target recognition [56], [57], [58], [59].

Huang et al. [56] were the first to comprehensively explore the differences between networks, source tasks, network layers, and TAL methods for SAR target recognition. The multikernel maximum mean discrepancy (MK-MMD) was adopted to measure the discrepancy between the marginal distributions of the source and target data, and transitive TAL was proposed to deliver knowledge across multiple sources (optical image $ \rightarrow$ SAR scene image $ \rightarrow$ SAR target image). The MMD and the whole loss function can be formulated as
\begin{align*} & \text{MMD}(x_{S},x_{T})=\left\Vert {E_{S}\left[ {\phi \left({{x_{S}}} \right)} \right] - E_{T}\left[ {\phi \left({{x_{T}}} \right)} \right]} \right\Vert _\mathcal {H}\tag{3a}\\ &\mathcal {L} = \mathcal {L}_{C}(x_{TL}, y_{TL}) + \lambda \sum \limits _{l=k+1}^{L} {{\alpha _{l}}\text{MMD}_{l}({x_{S}},{x_{T}})} \tag{3b} \end{align*}
where $ \phi$ denotes a function in the unit ball of the reproducing kernel Hilbert space (RKHS) $ \mathcal {H}$, $ E[ \cdot ]$ denotes the mathematical expectation, $ \lambda$ denotes a tradeoff parameter, and $ \alpha _{l}$ denotes the weight of the MMD loss for layer $ l$. Their experiments proved that images containing targets of similar categories are more suitable for transferring knowledge, that shallow layers are easier to transfer, and that the distribution alignment method is more effective than pretraining methods. Considering that directly performing distribution alignment can cause domain-specific knowledge to be lost, Zhang et al. [58] designed a shared subnetwork and a special subnetwork to handle the domain-common and domain-private properties, and then performed MK-MMD to align the distributions. The unlabeled target domain data contain important information too, and the most direct way to make use of them is the pseudolabel strategy (training the model on unlabeled data with predicted labels). Chen et al. [59] performed weak image augmentation for pseudolabel prediction and strong image augmentation for performance improvement. They also aligned the distributions with MK-MMD and decreased the negative impact of incorrect pseudolabels with a top-k loss [154]. However, the discrepancy between different categories is not explored in the design of these methods. Zhao et al. [57] predicted pseudolabels and calculated a class confusion loss [155] to align the conditional distributions for each category, which is formulated as
\begin{equation*} {{\mathcal L}_{CC}} = \frac{1}{c}\sum \limits _{i = 1}^{c} {\sum \limits _{j \ne i}^{c} {\left| {\frac{{{C_{ij}}}}{{\sum \limits _{k = 1}^{c} {{C_{ik}}} }}} \right|} } \tag{4} \end{equation*}
where $ c$ denotes the number of classes and $ C_{ij}$ denotes the weighted class correlation between class $ i$ and class $ j$. In addition, Zhang et al. [60] conducted a preliminary exploration of online domain adaptation. They achieved good performance by aligning the distribution of target data with maximum a posteriori estimation, using a combination of deep features and SR features.
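For reference, a minimal PyTorch sketch of a (biased) multikernel MMD estimate in the spirit of (3a) is given below; the Gaussian bandwidths and the way kernels are summed are illustrative assumptions, and practical implementations tune them differently.

```python
import torch

def mk_mmd(xs, xt, bandwidths=(1.0, 2.0, 4.0, 8.0)):
    # xs: (n_S, d) source features; xt: (n_T, d) target features.
    x = torch.cat([xs, xt], dim=0)
    d2 = torch.cdist(x, x) ** 2  # pairwise squared Euclidean distances
    k = sum(torch.exp(-d2 / (2 * b ** 2)) for b in bandwidths)  # kernel sum
    ns = xs.size(0)
    k_ss, k_tt, k_st = k[:ns, :ns], k[ns:, ns:], k[:ns, ns:]
    # Biased empirical estimate of the squared MMD.
    return k_ss.mean() + k_tt.mean() - 2 * k_st.mean()

# In the spirit of (3b): task loss plus weighted MMD penalties on deep layers.
# loss = ce(logits_tl, y_tl) + lam * sum(a[l] * mk_mmd(f_s[l], f_t[l]) for l in layers)
```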

Although distribution alignment methods have achieved good performance, there are still some details that lack exploration.

  1. The metrics used to measure discrepancy between probability distributions often overlook the conditional distribution and covariance. Incorporating marginal and conditional probability distributions, as well as covariance, could lead to a more accurate alignment.

  2. The current distribution alignment methods fail to leverage properties specific to SAR targets, such as electromagnetic scattering characteristics. By utilizing these properties, the alignment process and performance could be potentially enhanced.

3) Shared Feature Space Method

The shared feature space methods concentrate on building a feature space in which features from different domains cannot be distinguished, or in which the target characteristics have a general representation regardless of domain. Many interesting methods have emerged to achieve this [14], [15], [16], [61], [63], [64].

The adversarial learning strategy is often used to build the shared feature space. Zhao et al. [14] confused the features of data from different platforms by adversarial learning to decrease the domain gap. Considering that learning from hard samples is key to improving performance, Zhao et al. [16] proposed a dynamic hard sample selection method to increase the importance of hard samples in the adversarial learning process. To utilize the information in unlabeled target data, Zhao et al. [15] combined the pseudolabel strategy and adversarial learning, and corrected the pseudolabels with a class confusion matrix. The whole loss function can be formulated as
\begin{align*} {{\mathcal L}_{\text{adv}}} =& {E_{S}}\left[ \log ({{\mathcal F}_{D}\left({{x_{S}}} \right)}) \right] + {E_{T}}\left[ \log (1- {{\mathcal F}_{D}\left({{x_{T}}} \right)}) \right] \tag{5a}\\ {\mathcal L} =& \mathcal {L}_{C}(x_{S}, y_{S}) + \mathcal {L}_{C}(x_{T}, y_{TP}) + \lambda {\mathcal L}_{\text{adv}} \tag{5b} \end{align*}
where $ \mathcal {F}_{D}$ represents the domain discriminator and $ y_{TP}$ denotes the pseudolabels of the target samples. In addition, Zhang et al. [63] utilized adversarial learning to confuse the distributions of the support set and query set in a metalearning task, thereby improving the network's adaptability across different SAR target recognition tasks.

Different from building a shared feature space with adversarial learning, Gao et al. [64] proposed a method based on the support tensor machine. This method mapped SAR images into a shared feature space using a shared core tensor, thus confusing the two domains and decreasing the domain gap. Considering that the properties of the target were not taken into account when building the shared feature space, Sun et al. [61] observed that azimuth rotation is a common source of domain difference. They proposed an attribute-guided transfer learning method, which constructs a shared feature space where the original features can be transformed into features of any other azimuth, thereby decreasing the domain gap. It can be formulated as
\begin{align*} {{\mathcal L}_{r}} =& \left\Vert {{{\mathcal I}_{1}} - {\delta _{1}}\left({{x_{1}}} \right)} \right\Vert _{2}^{2} + \left\Vert {{{\mathcal I}_{2}} - {\delta _{1}}\left({{M_{g}}{x_{1}}} \right)} \right\Vert _{2}^{2} \tag{6a}\\ {M_{g}}{x_{1}} =& {x_{1}} + \gamma {\mathcal R}\left({x_{1}} \right) \tag{6b} \end{align*}
where $ \mathcal {I}$ denotes the original image, $ \delta$ denotes the reconstruction block, $ M_{g}$ denotes the feature transformation, and $ \gamma$ denotes the normalized angle difference.

The shared feature space methods have achieved good performance, but there are still some directions that need to be explored: 1) How to increase or maintain discrimination between different categories is not considered in these methods. 2) How to build a shared feature space according to the properties of SAR images and targets needs to be considered further.

B. From Simulated-SAR to SAR

Although many effective methods have been proposed for TAL from SAR to SAR, the insufficient-data problem remains significant in this field. To further improve data-driven methods for SAR target recognition, transferring knowledge from simulated SAR data is an alternative idea. With CAD models and the electromagnetic scattering mechanism, a large simulated SAR target image dataset can be obtained using software [65], [156], [157]. However, as the simulated data are essentially produced from an ideal model, they differ in detail from the electromagnetic scattering in real radar systems. How to transfer knowledge from simulated data is thus an issue worth researching [9], [12], [31], [66], [67], [68], [70], [71], [72], [73], [74], [75], [76], [77], [78], [79], [80], [81], [82], [83], [84].

1) Pretraining Method

The category of the target is exactly known in the simulation process, so it is easy to obtain a large amount of annotated simulated SAR target data. Pretraining methods are the most direct way to utilize these simulation data [31], [66], [67], [68], [70], [71], [72].

Some methods directly pretrained the model on the simulated data to obtain the optimal initialization weights [31], [67], [68], [70]. Ma et al. [67] produced the simulated data by LSGANs instead of the simulation software, and they specialized the network for recognition tasks by fine-tuning it on the real SAR data. Inkawhich et al. [68] also discussed the influence of data augmentation, model construction, loss function choices, and ensembling techniques in pretraining model training.

Considering that there is still a domain gap between simulated data and real data due to the difference between simulation and real radar environments, some methods process the simulated data further [66], [71], [72]. They generated intermediate data with greater similarity to real SAR data from simulated SAR data using CycleGAN or conditional GAN, thus decreasing the discrepancy between simulated and real data. They all followed the learning paradigm of pretraining on the simulated SAR data and fine-tuning on the real SAR data.

Whether one chooses to directly pretrain or to further process the simulated SAR data, there are some questions that need to be further considered: 1) The discrepancy between probability distributions needs to be taken into account. 2) The electromagnetic scattering knowledge, which is crucial for simulated SAR data, requires discussion.

2) Distribution Alignment Method

Due to the fact that simulation data are obtained in an ideal environment, there is an unavoidable distribution discrepancy between simulated and real SAR data. Some distribution alignment methods are proposed to solve this problem [9], [73], [74], [75], [76], [77], [78], [79].

In these methods, MMD is still the most popular metric to measure the discrepancy between data distributions [73], [76], [77]. In particular, in addition to decreasing the domain gap by minimizing the MMD, Sun et al. [73] ensembled multiscale subclassifiers and obtained the final output using a voting strategy. Furthermore, Sun et al. [77] performed pseudolabel generation and denoising, and fine-tuned the classification network. Han et al. [9] presented evidential learning to estimate the confidence of the model, dynamic weighting to avoid the impact of inferior knowledge, and knowledge distillation from simulated to real SAR target recognition by minimizing the KLD.

To further align the conditional distributions for each category, Lv et al. [78] proposed to minimize the LMMD [122] combined with an image reconstruction task. The trained network could extract discriminative features from both simulated and real SAR images, thus further improving performance. The LMMD and total loss can be formulated as
\begin{align*} &\text{LMMD}({x_{S}},{x_{T}}) = E_{c}\left\Vert {{E_{{S^{c}}}}\left[ {\phi \left({{x_{S}}} \right)} \right] - {E_{{T^{c}}}}\left[ {\phi \left({{x_{T}}} \right)} \right]} \right\Vert _{\mathcal {H}}^{2} \tag{7a}\\ &\mathcal {L} = \mathcal {L}_{C} \left({{x_{S}},{y_{S}}} \right) + \mathcal {L}_{C} \left({{x_{TL}},{y_{TL}}} \right) + \text{LMMD}({x_{S}},{x_{T}}) \tag{7b} \end{align*}
where $ E_{S^{c}}[\cdot]$ and $ E_{T^{c}}[\cdot]$ denote the expectations over the source and target samples of class $ c$, respectively.
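The sketch below illustrates the class-conditional alignment idea of (7a), measuring a per-class discrepancy between source features (true labels) and target features (pseudolabels). For brevity it compares class means directly, a simplification of the kernel-embedding form used by LMMD.

```python
import torch

def lmmd(xs, ys, xt, yt_pseudo, num_classes):
    # Average over classes of the squared distance between class means.
    total, used = 0.0, 0
    for c in range(num_classes):
        s_c, t_c = xs[ys == c], xt[yt_pseudo == c]
        if len(s_c) == 0 or len(t_c) == 0:
            continue  # skip classes absent from the current batch
        total = total + (s_c.mean(0) - t_c.mean(0)).pow(2).sum()
        used += 1
    return total / max(used, 1)
```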

Intra- and interclass metric learning is a method to implicitly align the conditional probability distributions, minimizing the distance between samples of the same class and maximizing the distance between samples of different classes regardless of domain [74], [75], as shown in Fig. 7. Tai et al. [74], [75] utilized the d-SNE loss [158] to achieve cross-domain intra- and interclass metric learning. At the same time, they considered that samples from partial azimuths cannot comprehensively provide the discriminative information of targets. They proposed to extract angle-invariant properties by learning feature translation between samples with different azimuths [74] and to generate synthetic samples with different azimuths using a conditional GAN [75]. The d-SNE and whole loss can be formulated as
\begin{align*} {\mathcal L}_{\text{d-SNE}} =& \frac{1}{n_{T}}\sum \limits _{i = 1}^{{n_{T}}} {\log \left({\frac{{\sum \limits _{y_{S}^{k} \ne y_{T}^{i}} {\exp \left({ - d\left({x_{S}^{k},x_{T}^{i}} \right)} \right)} }}{{\sum \limits _{y_{S}^{k} = y_{T}^{i}} {\exp \left({ - d\left({x_{S}^{k},x_{T}^{i}} \right)} \right)} }}} \right)} \tag{8a}\\ {\mathcal L} =& {\mathcal L}_{\text{d-SNE}} + \lambda {\mathcal L}_{C}(x_{S},y_{S}) + \beta {\mathcal L}_{C}(x_{TL},y_{TL}) \tag{8b} \end{align*}
where $ d(\cdot,\cdot)$ denotes a distance in the feature space.

Fig. 7. Paradigm of metric learning across domains.
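A direct PyTorch transcription of (8a) is sketched below, using squared Euclidean distance for $ d$; note that the original d-SNE work [158] also proposes margin-based hardest-pair variants, so this follows only the form given above.

```python
import torch

def dsne_loss(fs, ys, ft, yt):
    # fs: (n_S, d) source features with labels ys; ft: (n_T, d) target features.
    d = torch.cdist(ft, fs) ** 2          # (n_T, n_S) squared distances
    sim = torch.exp(-d)
    same = (yt[:, None] == ys[None, :]).float()
    num = (sim * (1 - same)).sum(dim=1)   # similarities to different-class sources
    den = (sim * same).sum(dim=1) + 1e-8  # similarities to same-class sources
    return torch.log(num / den + 1e-8).mean()
```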

Considering that the electromagnetic scattering features of SAR images are beneficial for SAR target recognition, Zhang et al. [79] proposed to obtain scattering points with the ASC model and the SAR-SIFT algorithm, and to extract topological electromagnetic scattering features with two graph neural networks. More importantly, CORAL [159] was utilized to measure the distribution discrepancy with second-order statistics, which is formulated as
\begin{align*} & {\mathcal L}_{\text{CORAL}}({x_{S}},{x_{T}}) = \frac{1}{{4{d_{S}^{2}}}}\left\Vert {{C_{S}} - {C_{T}}} \right\Vert _{F}^{2}\tag{9a}\\ & {C_{S}} = \frac{1}{{{n_{S}} - 1}}\left({x_{S}^{T}{x_{S}}} \right), \quad {C_{T}} = \frac{1}{{{n_{T}} - 1}}\left({x_{T}^{T}{x_{T}}} \right) \tag{9b} \end{align*}
where $ C_{S}$ and $ C_{T}$ denote the covariance matrices of the source and target features.
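For completeness, a minimal PyTorch sketch of the CORAL loss (9) follows; it centers the features before computing covariances, as in the original CORAL formulation [159].

```python
import torch

def coral_loss(xs, xt):
    # Match second-order statistics of source and target feature batches.
    d = xs.size(1)
    xs_c = xs - xs.mean(0, keepdim=True)  # center the features
    xt_c = xt - xt.mean(0, keepdim=True)
    cs = xs_c.t() @ xs_c / (xs.size(0) - 1)
    ct = xt_c.t() @ xt_c / (xt.size(0) - 1)
    return ((cs - ct) ** 2).sum() / (4 * d * d)
```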

The distribution alignment methods minimize distribution discrepancy metrics to decrease the domain gap, so the metrics play a crucial role. 1) The predefined metrics may not be suitable for the scenario of TAL from simulated SAR data to real SAR data. 2) The special properties of SAR images and radar targets, such as scattering features, have not been explored enough.

3) Shared Feature Space Method

The core of shared feature space methods for TAL from simulated SAR data to real SAR data lies in creating a feature space, where simulation and real SAR images become indistinguishable based on their features. In this space, targets with similar attributes are represented by similar feature vectors [12], [80], [81], [82], [83], [84].

The idea of adversarial learning remains a classic way to build a shared feature space for this scenario [12], [80], [81], [82], [83]. Zhang et al. [80] proposed to decrease the gap between simulation images of different bands using adversarial learning to achieve SAR target recognition across frequency bands. Wang et al. [81] utilized simulated SAR data to make up for the insufficient-data problem in metalearning for SAR target recognition, confusing simulated and real data via adversarial learning. Du et al. [83] successfully transferred knowledge from simulated to real SAR data using adversarial learning, while also incorporating image reconstruction to preserve crucial information and enhance performance.

To further decrease the domain gap between simulated SAR data and real SAR data, some methods take a further step via distribution alignment or GANs [12], [82], [84]. Lv et al. [82] combined minimizing the MK-MMD with adversarial learning losses to transfer knowledge from simulated SAR vehicle images to real SAR vehicle images. Shi et al. [84] proposed a pseudolabel refinement strategy based on the ASC model. They also employed contrastive learning (learning representations by maximizing the similarity between similar samples and minimizing the similarity between dissimilar samples) to compact intraclass prototypes and separate interclass prototypes, further narrowing the domain gap. Chen et al. [12] used CycleGAN to generate intermediate-domain data bridging simulated and real data. Subsequently, they performed adversarial learning between the intermediate and real SAR domains, significantly reducing the complexity of the learning process and greatly enhancing performance.

The shared feature space methods aim to confuse the domains and thereby reduce the domain discrepancy. However, some questions need further consideration: 1) How can adversarial learning be performed effectively while simultaneously maintaining discriminative information? 2) The common electromagnetic scattering properties and target characteristics shared between simulated and real data have not received sufficient attention.

SECTION V.

Heterogeneous Transfer Adaptation Learning

Currently, there are two kinds of heterogeneous data that can provide transferable knowledge for SAR target recognition: optical images and AIS data. Optical imaging relies on optical sensors receiving reflected light, which is passive imaging compared to SAR. Optical images typically contain clearer textures and less noise, but are susceptible to lighting conditions [32]. AIS, designed specifically for vessels, is a real-time network combining transmitters and receivers that transports dynamic, static, and voyage-related text information, such as the speed, GPS location, width, height, and category of ships [160], [161], [162], [163]. In essence, these data are highly heterogeneous compared to SAR data, owing to the significantly different mechanisms by which they are obtained. Optical images and AIS can provide abundant target information, and how to transfer the knowledge they carry to SAR target recognition tasks is becoming a key research question.

A. From Optical to SAR

Optical images can be divided into two categories: natural optical images and remote sensing optical images. Typically, natural optical images possess a considerable data volume, thereby offering ample supervised information. Remote sensing optical images often contain targets of the same categories as those in SAR images, and thus have a stronger correlation with target recognition. Because optical images are more in line with the human visual system, it is easier to obtain annotations for them [32] than for SAR images, which contain relatively blurry textures and speckle noise. Therefore, how to transfer the rich knowledge provided by optical images to the SAR target recognition task, avoiding annotation of the SAR images, has attracted the interest of many researchers [32], [34], [86], [87], [88], [89], [90], [93], [94], [95], [97], [100], [101], [104], [106], [108], [110], [112], [113].

1) Pretraining Method

Compared to SAR images, optical images have a larger data volume and more diversity [37]. Pretraining on these data and then specializing the model for other downstream tasks in different domains, leveraging its ability to extract generalized features, is an efficient and effective approach [34], [86], [87], [88], [89], [90], [93], [94], [95], [97], [100], [101], [104], [106], [108].

Among these methods, pretraining on the ImageNet dataset or other natural optical image datasets is the most common approach [34], [86], [93], [97]. However, Ying et al. [89] conducted pretraining experiments on both the MSTAR and ImageNet datasets. Their results demonstrated that pretraining the model on SAR images containing the same target led to better performance; nevertheless, the performance obtained through pretraining on optical images was also noteworthy. To further improve recognition performance, many pretraining methods combined with other technologies have been proposed [87], [88], [94], [100]. Lu et al. [87] applied several data augmentation methods in the fine-tuning stage to increase the volume of SAR images. Considering that the pretrained model easily overfits the scarce SAR image data during fine-tuning, Zhong et al. [88] combined fine-tuning with model compression technology to regularize training and accelerate the model's inference speed. Because differing image resolutions complicate ship recognition in SAR images, Relekar et al. [94] proposed a module integrated into the pretrained model to extract scale-variant features and improve feature robustness. Wang et al. [100] argued that merely duplicating the single channel of SAR images three times to fit a pretrained model trained on three-channel optical images is not reasonable. Instead, a subaperture decomposition algorithm [164] was employed to generate pseudocolor SAR images, allowing better utilization of the general optical knowledge encoded in the pretrained model.
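As a concrete illustration of the common recipe, the following is a minimal PyTorch/torchvision fine-tuning sketch that repeats the single SAR channel three times to fit an ImageNet-pretrained RGB backbone (the naive practice that Wang et al. [100] improve upon). The class count and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and replace the classification head.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # e.g., 10 assumed SAR classes

optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def finetune_step(sar_batch, labels):
    # sar_batch: (N, 1, 224, 224) single-channel SAR chips.
    rgb_like = sar_batch.repeat(1, 3, 1, 1)  # duplicate the channel to mimic RGB
    loss = criterion(backbone(rgb_like), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```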

Although these methods have achieved good performance on SAR target recognition, the targets in natural optical and SAR images exhibit significant differences, which may harm the performance of the pretrained model. Several methods were therefore proposed to perform pretraining on optical land cover images [95] and remote sensing optical ship images [90], [101]. These methods demonstrated that pretraining on remote sensing optical images leads to better performance; moreover, the pretrained model learns to effectively extract target features from remote sensing optical images, thereby enhancing SAR target recognition capabilities.

Different from the general pretrained method, Tai et al. [104] calculated network-layer-level and feature-channel-level transfer weights by attention mechanism to utilize the pretrained model more reasonably, thus achieving better performance on few-shot SAR target recognition. After that, Tai et al. [106] improved this work [104] by adding a push-attention mechanism and adjusting the learning rate according to the quality of SAR samples to conduct a better optimization strategy in the fine-tuning stage.

The difference between the imaging mechanisms of optical and SAR sensors poses a significant challenge that constrains the improvement of models. Gao et al. [108] addressed this issue by introducing a SAR ship recognition method leveraging a cross-modality GAN framework. They integrated the GAN with the attention mechanism CBAM [165] to generate robust pseudo-SAR images from optical remote sensing ship images. This strategy expanded the SAR image dataset, thereby enhancing model training.

Although pretraining is an effective technology for TAL in SAR target recognition, some directions remain worth studying: 1) the discrepancy between pretraining data and target SAR data, including the probability distribution and modality discrepancies, is not fully considered; 2) the electromagnetic scattering features of SAR images are beneficial for target recognition, but they are often ignored in pretraining methods.

2) Distribution Alignment Method

Due to the difference between optical images and SAR images, there is a significant probability distribution discrepancy between them. Distribution alignment is an effective class of methods for addressing this issue and is also commonly used in TAL from optical to SAR [32], [110], [112].

Considering the long-tailed distributions of SAR image data, Chowdhury et al. [110] proposed to improve performance by combining knowledge distillation and class balancing. The target recognition knowledge learned from electro-optical images was transferred to SAR target recognition by minimizing the KLD. However, the commonly used discrepancy measures, KLD and JSD, suffer from the vanishing gradient problem, making them ill-suited to deep learning methods based on first-order gradient optimization [32]. Mohammad et al. [32] used the Wasserstein distance (WD) to measure the distribution discrepancy between optical remote sensing images and SAR images. To reduce the computational burden, they approximated the WD with the sliced WD (SWD) [166], which aggregates the WDs of 1-D projected distributions. It can be formulated by \begin{equation*} \text{SWD}\left({P_{S}},{P_{T}}\right) = \int_{\mathbb{S}^{d-1}} \mathcal{W}\left(\mathcal{R}_{P_{S}}\left(\cdot;\gamma\right), \mathcal{R}_{P_{T}}\left(\cdot;\gamma\right)\right) d\gamma \tag{10} \end{equation*} where $\mathcal{W}$ denotes the WD, $\mathbb{S}^{d-1}$ denotes the unit sphere in $d$ dimensions, and $\mathcal{R}_{P_{S}}(\cdot;\gamma)$ denotes the 1-D slice of the distribution along the direction $\gamma$. Therefore, as the feature distributions of remote sensing optical images and SAR images draw closer, a classifier trained on labeled remote sensing optical images and partially labeled SAR images generalizes better.
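The integral in (10) is usually approximated by Monte Carlo sampling over random directions. The following NumPy sketch (assuming equal sample counts in both domains, so that sorting yields the 1-D optimal coupling) illustrates the idea; it is not the implementation of [32].

```python
import numpy as np

def sliced_wasserstein(feats_src, feats_tgt, n_projections=128, seed=0):
    # feats_src, feats_tgt: (n, d) feature matrices with equal n (simplifying assumption).
    rng = np.random.default_rng(seed)
    d = feats_src.shape[1]
    # Random directions gamma, uniform on the (d-1)-sphere.
    dirs = rng.normal(size=(n_projections, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    swd = 0.0
    for gamma in dirs:
        p = np.sort(feats_src @ gamma)   # 1-D slice of P_S along gamma
        q = np.sort(feats_tgt @ gamma)   # 1-D slice of P_T along gamma
        swd += np.abs(p - q).mean()      # W_1 between the 1-D empirical measures
    return swd / n_projections
```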

Subdomain (subclass) distribution alignment is equally important, since irrelevant information shared between the two domains can cause negative transfer. Zhao et al. [112] proposed a deep subdomain adaptation SAR ship recognition method. It measured the distribution discrepancy between optical remote sensing ship images and SAR ship images using LMMD, and integrated the CBAM to heuristically focus the network on the "what" and "where" for knowledge transfer, which is formulated as \begin{equation*} \mathcal{L} = \mathcal{L}_{C}\left({x_{S}},{y_{S}}\right) + \text{LMMD}\left({x_{S}},{x_{T}}\right). \tag{11} \end{equation*} Distribution-alignment methods can achieve high performance, but some questions are still worth studying (a simplified sketch of the objective in (11) follows the list below):

  1. Predefined distribution metrics may not be suitable for TAL from optical images to SAR images.

  2. Not all features are transferable, and how to decouple the transferable ones is a question worth considering.

  3. The discrepancy between the two modalities, optical image and SAR image, is not considered in these methods.
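As referenced above, the following is a simplified illustration of the joint objective in (11). The true LMMD weights kernel terms by per-class (pseudo)label probabilities; for brevity, this sketch substitutes a plain Gaussian-kernel MMD for the alignment term and is not the authors' implementation.

```python
import torch
import torch.nn as nn

def gaussian_mmd(f_src, f_tgt, sigma=1.0):
    """Biased MMD estimate with a single Gaussian kernel (illustrative stand-in for LMMD)."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(f_src, f_src).mean() + k(f_tgt, f_tgt).mean() - 2 * k(f_src, f_tgt).mean()

def joint_loss(logits_src, y_src, f_src, f_tgt, lam=0.5):
    # L = L_C(x_S, y_S) + lam * alignment(x_S, x_T), cf. (11); lam is an assumed weight.
    return nn.functional.cross_entropy(logits_src, y_src) + lam * gaussian_mmd(f_src, f_tgt)
```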

3) Shared Feature Space Method

The difference between shared-feature-space methods for homogeneous data and heterogeneous data lies in the fact that the latter aim to create a feature space where features from different modalities cannot be distinguished.

Optical and SAR are two different modalities, so directly transferring knowledge from optical images to SAR images cannot achieve satisfactory target recognition performance. Song et al. [113] proposed a two-stage cross-modality transfer learning ship recognition method. It first generated intermediate-modality data from optical ship images using a GAN, then built a shared feature space blending the intermediate and target domains through adversarial learning, and aligned the distributions with LMMD, as shown in Fig. 8.

Fig. 8. Framework of cross-modality transfer learning.

This method was the first to combine shared-feature-space learning with GAN-based generation and distribution alignment. It is a novel integrated framework with enlightening significance for subsequent SAR target recognition work.

B. From AIS to SAR

AIS contains abundant information specific to ships. Compared to optical images, AIS data are easier to obtain and have a larger volume. The length and width of a ship are common and transferable pieces of information in AIS data [33], [116], [117], [118], [119]. How to design methods that exploit such naive geometric features (NGFs) from AIS for SAR target recognition has recently become a new research topic.

1) Pretraining Method

Due to the real-time nature of AIS and its mandatory installation on ships above a certain tonnage, a large amount of AIS data is available. Therefore, pretraining technology is also used in TAL from AIS to SAR.

Yan et al. [116] pretrained a multiple-classifier ensemble on NGFs calculated from the lengths and widths in AIS data [33]. They extracted the lengths and widths of ships in SAR images with the minimum bounding box algorithm to construct SAR NGFs, then applied the pretrained classifier directly to the SAR NGFs to obtain the categories of ships in SAR images.
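A sketch of constructing SAR-side NGFs in this spirit is shown below: segment the ship chip, take the minimum-area bounding rectangle with OpenCV, and use its side lengths as length/width features. The thresholding rule and pixel spacing are illustrative assumptions, not the procedure of [116].

```python
import cv2
import numpy as np

def sar_ngf(chip, pixel_spacing_m=10.0):
    # chip: 2-D float array (a SAR ship chip); assumes one ship is present.
    # Crude intensity threshold to segment bright ship pixels (an assumption).
    mask = (chip > chip.mean() + 2 * chip.std()).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    biggest = max(contours, key=cv2.contourArea)
    # Rotated minimum-area bounding box: ((cx, cy), (w, h), angle).
    (_, _), (w, h), _ = cv2.minAreaRect(biggest)
    length = max(w, h) * pixel_spacing_m
    width = min(w, h) * pixel_spacing_m
    return np.array([length, width])  # the SAR NGF vector
```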

There are still discrepancies between the NGFs from AIS data and those from SAR images, so directly using the pretrained model for SAR ship recognition is not reasonable. Lang et al. [33] proposed model parameter regularization to further optimize the model, formulated as \begin{equation*} \mathcal{L} = \frac{1}{2}\left\Vert {w_{T}} - \Gamma {w_{S}} \right\Vert + \lambda \sum_{i=1}^{n_{T}} \left( y_{S}^{i}\left({w_{T}}{x_{T}^{i}} + {b_{T}}\right) - 1 \right) \tag{12} \end{equation*} where $w_{T}, b_{T}$ denote the weights and bias of the target SVM and $w_{S}$ denotes the weights of the source SVM. They first trained a source SVM on the AIS NGFs, then froze it and trained the target SVM on SAR NGFs with L2 regularization between the parameters of the target and source SVMs. The target SVM performs better on SAR ship recognition than directly using the pretrained model.
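The following NumPy sketch illustrates the parameter-regularized transfer in (12): the target SVM weights are pulled toward the frozen source weights while fitting SAR NGFs. As a runnable stand-in it uses a squared regularizer and the standard hinge $\max(0, 1 - yf(x))$ in place of the margin term, with $\Gamma$ taken as the identity; these are assumptions for illustration, not the method of [33].

```python
import numpy as np

def transfer_svm(x_t, y_t, w_s, lam=1.0, lr=1e-2, epochs=200):
    # x_t: (n, d) SAR NGFs; y_t: (n,) labels in {-1, +1}; w_s: (d,) frozen source SVM weights.
    w, b = w_s.copy(), 0.0
    for _ in range(epochs):
        margins = y_t * (x_t @ w + b)
        active = margins < 1  # samples violating the margin
        # Gradient of 0.5*||w - w_s||^2 plus the hinge term over active samples.
        grad_w = (w - w_s) - lam * (y_t[active, None] * x_t[active]).sum(axis=0)
        grad_b = -lam * y_t[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```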

Although pretraining methods can achieve good performance, there are still some directions to be considered, such as the accuracy of NGFs, which is crucial but challenging to achieve in SAR images due to their blurry texture.

2) Distribution Alignment Method

Since NGFs can be obtained directly from AIS but only through image processing algorithms from SAR images, a distribution discrepancy also exists between the two types of NGFs. Some researchers have made early attempts to apply distribution alignment technology to achieve TAL from AIS data to SAR images.

Xu et al. [117] first proposed a framework, ARTL, which can be formulated as \begin{align*} \mathcal{L} =\;& \mathcal{L}_{C}\left({x_{S}},{y_{S}}\right) + \sigma \mathcal{R}\left(\mathcal{F}\right) + \lambda \mathcal{R}_{d}\left({x_{S}},{x_{T}}\right) \\ &+ \gamma \mathcal{R}_{m}\left({x_{S}},{x_{T}}\right) + \mu \mathcal{R}_{S}\left({x_{S}}\right) \tag{13} \end{align*} where $\sigma, \lambda, \gamma, \mu$ denote the weights of the regularization terms. By minimizing the discrepancy $\mathcal{R}_{d}({x_{S}},{x_{T}})$ between marginal and conditional distributions, the manifold consistency alignment term $\mathcal{R}_{m}({x_{S}},{x_{T}})$, and the source discriminative information preservation term $\mathcal{R}_{S}({x_{S}})$, ARTL performed well in semisupervised SAR ship recognition tasks. After that, Xu et al. [118] proposed a method combining metric learning, distribution alignment, and manifold regularization, extending the approach to semisupervised and unsupervised scenarios for TAL from AIS to SAR images. These earlier methods achieved TAL under the restriction of homogeneous features, which may hinder further improvement [119]. Yang et al. [119] first proposed semisupervised heterogeneous domain adaptation using a dynamic joint correlation alignment network, whose overall loss function can be formulated as \begin{align*} \mathcal{L} =\;& \mathcal{L}_{C}\left({x_{S}},{y_{S}}\right) + \mathcal{L}_{C}\left({x_{T}},{y_{T}}\right) \\ &+ \alpha \left( \left(1 - \mu\right) \mathcal{L}_{\text{CORAL}}\left({x_{S}},{x_{T}}\right) + \mu \mathcal{L}_{\text{CORAL}}\left(\left({x_{S}},{y_{S}}\right),\left({x_{T}},{y_{T}}\right)\right) \right). \tag{14} \end{align*}

It transformed the AIS NGFs and SAR image features into the same feature space to eliminate heterogeneity. In addition, it performed not only marginal and conditional distribution alignment but also pseudolabel refinement.
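For reference, the CORAL term used in (14) can be sketched as follows: align second-order statistics by penalizing the Frobenius distance between source and target feature covariances, following the Deep CORAL formulation [159]. The sketch is illustrative, not the network of [119].

```python
import torch

def coral_loss(f_src, f_tgt):
    # f_src, f_tgt: (n, d) feature batches from the two domains.
    def cov(f):
        f = f - f.mean(dim=0, keepdim=True)
        return (f.t() @ f) / (f.size(0) - 1)  # unbiased feature covariance
    d = f_src.size(1)
    # Squared Frobenius distance between covariances, with the usual 1/(4 d^2) scaling.
    return ((cov(f_src) - cov(f_tgt)) ** 2).sum() / (4 * d * d)
```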

Distribution-alignment methods have also achieved satisfactory performance for TAL from AIS data to SAR images, but some directions still need to be explored: 1) features of SAR images other than NGFs were not utilized; 2) these methods did not consider the modality discrepancy between AIS data and images.

C. From AIS and Optical to SAR

All previous references addressed TAL from a single heterogeneous source domain, such as optical images or AIS data. Optical images and AIS data are entirely different, providing texture-based visual features and NGFs, respectively, and they are complementary to some extent. Therefore, developing a multisource heterogeneous transfer learning framework is a valuable open problem.

Lang et al. [102] proposed a multisource heterogeneous transfer learning method for SAR ship recognition. It first padded each domain's feature vector with zero vectors matching the dimensions of the other domains. Then the multisource and target domain feature vectors were transformed into a common feature space to eliminate heterogeneity and exploit complementarity across the multisource domains, thus achieving knowledge transfer from multiple sources to SAR target recognition.
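A toy sketch of the zero-padding step described above, with illustrative feature dimensions: each domain's vector is extended with zeros in the slots reserved for the other domain so that all inputs share one common dimensionality. This is a simplification of the construction in [102].

```python
import numpy as np

def pad_to_common(ais_ngf, opt_feat):
    # ais_ngf: (d_ais,) AIS geometric features; opt_feat: (d_opt,) optical visual features.
    d_ais, d_opt = len(ais_ngf), len(opt_feat)
    ais_common = np.concatenate([ais_ngf, np.zeros(d_opt)])   # [NGF | 0] in the common space
    opt_common = np.concatenate([np.zeros(d_ais), opt_feat])  # [0 | visual] in the common space
    return ais_common, opt_common

# Example: a 2-D NGF (length, width) and an assumed 4-D optical feature.
ais_vec, opt_vec = pad_to_common(np.array([120.0, 20.0]), np.ones(4))
```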

TAL from heterogeneous multisource domains to SAR image domain for target recognition is a completely new research field. Many directions need to be further considered:

  1. How to exploit the complementarity and eliminate the redundancy of multisource data.

  2. How to overcome unbalanced amounts of multisource data.

  3. How to eliminate the heterogeneity between the multisource domains and between the source domains and the target domain.

SECTION VI.

Available Data

As analyzed in the previous sections, homogeneous and heterogeneous TAL aim to address distinct issues in specialized scenarios. For homogeneous TAL, SAR datasets or simulated SAR datasets serve as the source domain. For heterogeneous transfer learning, the source domain may encompass an AIS dataset, an optical image dataset, or a multisource combination of AIS and optical image datasets. With the development of artificial intelligence in the remote sensing field, many high-quality SAR image and optical image datasets are already available for target recognition. It is worth mentioning that there are relatively few high-quality datasets of simulated SAR data and AIS data; many of the datasets used in the reviewed literature are self-built. In this section, we focus on the publicly available datasets commonly used in the reviewed literature and introduce their configurations and usage scenarios. Detailed information on these common datasets can be found in Table V. The unfilled cells in the second column (Resolution) of Table V indicate that the image resolutions in those datasets span a wide range.

TABLE V Detailed Information of the Common Datasets

A. SAR Image Datasets

In this section, we focus on introducing the commonly used datasets, MSTAR [42], OpenSARShip [44], FUSAR [62], HR-SAR [103], and MR-SAR [128], which can serve as the source or target domain.

1) Vehicle Datasets

MSTAR [42] is the most commonly used vehicle target recognition dataset, collected by the Sandia National Laboratory, the U.S. Defense Advanced Research Projects Agency, and the U.S. Air Force Research Laboratory. The images in MSTAR are obtained by a 10-GHz X-band airborne SAR sensor under different depression angles, with aspect views covering $0^{\circ}\text{--}360^{\circ}$. The images have a resolution of $\text{0.3}\;\text{m} \times \text{0.3}\;\text{m}$ and a size of approximately $128\text{ pixels} \times 128\text{ pixels}$. The dataset provides $190\text{--}300$ images per target at each depression angle. MSTAR can be divided into subdatasets under the standard operating condition (SOC) and various extended operating conditions (EOCs). SOC denotes that the training and test images are obtained at depression angles with slight differences, such as 17$^{\circ}$ for training and 15$^{\circ}$ for testing. EOCs denote that the training and test images are obtained at depression angles with large differences, such as 17$^{\circ}$ for training and 30$^{\circ}$ and 45$^{\circ}$ for testing, or at different noise levels (1%, 5%, 10%, and 15% for the test images).

2) Ship Datasets

The number of ship datasets is relatively larger than that of vehicle datasets. The reviewed literature used many publicly available ship datasets [44], [62], [103], [128] as well as self-built ones. OpenSARShip [44] was collected by Shanghai Jiao Tong University; its images are obtained from the C-band Sentinel-1 satellite with VH and VV polarizations, and it provides 11 346 ship chips from 41 SAR sea scenes, covering 17 types of ships. FUSAR [62] was collected by Fudan University; its images are obtained from China's first civil C-band Gaofen-3 satellite with HH, HV, VH, and VV polarizations, and it provides 5243 ship chips from 126 SAR images across a variety of scenarios, including 15 primary ship categories and 98 subcategories. HR-SAR [103] was collected by the National University of Defense Technology; its images are obtained from TerraSAR-X with HH, VH, and VV polarizations, and it provides 450 ship chips from six scenes, encompassing three types of ships. MR-SAR [128] was collected by the Beijing University of Chemical Technology; its images are obtained from eight scenes captured by Radarsat-2 with VV polarization, and it provides 712 chips covering four types of ships. In addition, some researchers collected data from private SAR images or extracted data from ship detection datasets such as HRSID [115], SSDD [167], the SAR Ship dataset [99], and LS-SSDD [168].

3) Other SAR Datasets

In addition to the target images described earlier, some public and self-built SAR scene classification datasets [169] are also used as source domains for transferring SAR characteristic knowledge.

B. Simulated SAR Image Datasets

With the development of computer simulation technology, several simulation tools have been open-sourced, such as CASPatch [156], RaySAR [157], and SARSIM [65]. Utilizing the electromagnetic imaging mechanism of the radar system along with models of the target and its surrounding terrain, simulated SAR images can be generated with these tools. Therefore, simulated SAR datasets are easier to obtain than real SAR datasets. Nevertheless, there are relatively few publicly available datasets [69]; many of the datasets used in the reviewed papers are self-built [170]. SAMPLE [69] is a commonly used simulated SAR vehicle dataset containing 1345 pairs of simulated and real SAR images. The real SAR vehicle images within it originate from MSTAR. The simulated SAR vehicle images are derived from carefully crafted CAD models under configurations and sensor parameters aligned with MSTAR, and the vehicle categories represented in the simulated images are identical to those in MSTAR.

C. Optical Image Datasets

Due to their different imaging modes, optical images and SAR images are two completely heterogeneous data types. Heterogeneous TAL methods leverage natural optical image datasets [171] to extract general discriminative features and employ remote sensing optical ship datasets [109], [172] to learn specific ship characteristics. The remote sensing optical ship images in these datasets are primarily sourced from Google Earth or various optical object detection datasets, including DOTA [173], HRSC2016 [114], and NWPU VHR-10 [174]. In addition, some datasets come from competition data provided by IEEE [105] and Kaggle [91], [98].

D. AIS Datasets

AIS was originally designed for collision prevention, exchanging static, dynamic, and voyage information between ships or between a ship and a station. Given that AIS inherently includes the ship's category, downloading AIS data from public websites1 is generally more convenient than manually annotating SAR images. Consequently, the majority of the AIS data utilized in the reviewed literature originates from such websites.

Based on the datasets mentioned above, it is evident that a vast array of data can be harnessed to verify the effectiveness of TAL methods for SAR target recognition. While some methods rely on self-built datasets, standardized datasets also serve as valuable validation tools. For vehicle targets, the MSTAR and SAMPLE datasets are commonly utilized, while OpenSARShip, FUSAR, SAR Ship [103], and AIS-3 focus on ship targets. Commonly employed configurations include validating homogeneous TAL methods, such as MSTAR$(17^\circ) \rightarrow$ MSTAR$(15^\circ)$, OpenSARShip $\Leftrightarrow$ FUSAR, or SAMPLE $\rightarrow$ MSTAR, and testing heterogeneous TAL methods, for instance, FGSCR $\rightarrow$ OpenSARShip or FUSAR, and AIS-3 $\rightarrow$ OpenSARShip, FUSAR, or SAR Ship [103].

SECTION VII.

Experiments

In previous sections, we conducted a thorough analysis of diverse TAL algorithms tailored specifically for SAR target recognition tasks, along with the datasets used to validate their efficacy. In this section, we analyze the performance of these TAL algorithms and offer an intuitive experimental evaluation and comparison. Specifically, we benchmarked them against general-purpose visual TAL methods, including DAN [175], DANN [176], DeepCoral [159], and DSAN [122], which are commonly employed in the comparative experiments of the reviewed papers. The experiments were structured into two groups, covering SAR land vehicle targets and SAR ocean ship targets, currently the most common targets in SAR target recognition research.

A. Performance Evaluation Indexes

We evaluate the performance of SAR target recognition algorithms by average accuracy. Accuracy is the percentage of correctly predicted samples among all samples, while average accuracy is the mean of the per-class accuracies. It can be formulated as \begin{align*} \text{Precision}_{i} &= \frac{TP_{i}}{TP_{i} + FP_{i}} \tag{15a}\\ \text{Average Accuracy} &= \frac{1}{c}\sum_{i=1}^{c} \text{Precision}_{i} \tag{15b} \end{align*} where $\text{Precision}_{i}$ is the precision of class $i$, $c$ is the number of classes, $TP_{i}$ and $FP_{i}$ are the numbers of samples correctly and incorrectly predicted as class $i$, and $TP_{i}+FP_{i}$ is the total number of samples predicted as class $i$.
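Equation (15) can be computed directly from a confusion matrix, as in the following sketch.

```python
import numpy as np

def average_accuracy(conf):
    # conf[i, j]: number of samples of true class i predicted as class j.
    tp = np.diag(conf).astype(float)           # TP_i per class
    predicted_as = conf.sum(axis=0)            # TP_i + FP_i per class
    # Per-class precision, guarding against classes that were never predicted.
    precision = np.divide(tp, predicted_as, out=np.zeros_like(tp),
                          where=predicted_as > 0)
    return precision.mean()                    # average over the c classes
```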

B. Comparison of TAL Methods for SAR Vehicle Targets Recognition

The experimental results of TAL methods for SAR land vehicle targets are presented in Table VI. In this table, MSTAR SOC denotes the data under SOCs of the MSTAR dataset, while MSTAR represents the MSTAR data corresponding to the SAMPLE simulation dataset. Column 3 indicates the backbone of the models used by these algorithms, with "self-built" indicating a specially designed backbone. Column 5 reports the average accuracy of the algorithms.

TABLE VI Comparison of TAL Methods for SAR Vehicle Targets Recognition

From the upper part of Table VI, a comparative analysis of the experimental results under SOCs using MSTAR SOC data reveals that numerous pretraining techniques enhance model performance. This improvement is achieved either by acquiring pertinent knowledge from related domains or from augmented existing data. Notably, Huang et al. [30] enhanced the SAR target recognition models' performance by 0.75% compared with no pretraining (98.30%) by leveraging relevant knowledge derived from scene SAR images. Shang et al. [45] and Pei et al. [55] further optimized the model's performance through the utilization of augmented MSTAR data. Moreover, researchers have explored extracting relevant knowledge from general optical image datasets, which has significantly improved the model's recognition capabilities [95], [100]. It is worth noting that the transitive TAL of knowledge from general optical natural images to SAR scene images and then to SAR target images improved performance by 1.44% compared to directly transferring knowledge from general optical natural images to SAR target images (98.02%) [95]. Remarkably, Chen et al. [59] achieved near-perfect performance of 100% using methods tailored specifically for SAR vehicle targets, even under limited annotated sample conditions.

From the remaining part of Table VI, it becomes evident that numerous researchers aim to harness knowledge from simulated data to enhance their models' ability to distinguish real targets. Kim et al. [72] employed GANs to generate intermediate domain data from SAMPLE simulation data for pretraining, resulting in a remarkable 10.49% improvement over directly utilizing augmented SAMPLE simulation data [68]. Furthermore, the tailored TAL algorithms [77], [79], and [9] developed for SAR vehicle targets achieved superior performance, outperforming the general DSAN [122] by 6.99%, 10.23%, and 8.13%, respectively. Similarly, the shared-feature-space TAL algorithms [12] and [84] designed specifically for SAR targets demonstrated a significant boost in performance, surpassing the general adversarial TAL algorithm (DANN) by 14.06% and 14.71%, respectively. This underscores the effectiveness of these tailored algorithms in SAR vehicle target recognition tasks. In addition, algorithms [79] and [84], which utilized the electromagnetic scattering knowledge of SAR vehicles, achieved the best performance among the distribution alignment and shared-feature-space algorithms, respectively. This indicates that models achieve better generalization by incorporating electromagnetic scattering features alongside traditional visual features.

After conducting numerous comparative experiments on SAR vehicle target recognition tasks, it has become evident that TAL algorithms specifically designed for SAR vehicle targets outperform general TAL methods. Furthermore, the generalization capability of the model can be significantly enhanced by incorporating electromagnetic scattering features or traditional visual features. However, models trained with simulation data, such as SAMPLE, exhibit lower generalization performance than those trained with real SAR datasets, such as MSTAR and SAR scene datasets. This finding underscores the fact that the domain disparity between real SAR images is smaller than that between simulated and real SAR images. Nevertheless, real data remain far less accessible than simulated data, making both scenarios worthy of investigation.

C. Comparison of TAL Methods for SAR Ship Targets Recognition

We validated the TAL algorithm for SAR ship recognition from optical images to SAR images and from AIS to SAR images using optical ship data, AIS data, and SAR ship data. The validation results of all algorithms can be seen from Table VII.

TABLE VII Comparison of TAL Methods for SAR Ship Targets Recognition

From the upper part of Table VII, we can observe that the algorithm [112], specifically designed for SAR ship targets, outperforms the pretraining-based algorithm [101] by 1.22%. Furthermore, the algorithm [112] surpasses the recognition capabilities of the general TAL algorithm DSAN [122], which relies on distribution alignment, by a margin of 4.89%. In addition, it significantly outperforms another general TAL algorithm, DANN [176], which is based on a shared feature space, by 12.22%.

The lower part of the table reveals that certain methods designed specifically for transferring pertinent knowledge from AIS data to SAR images are highly effective for SAR ship target recognition. Notably, algorithm [119] achieved a performance of 83.18% relying solely on the ship NGFs provided by AIS. Remarkably, algorithm [102] enhanced recognition performance by a further 2.42% by combining these AIS NGFs with visual features derived from optical ship images.

After conducting comprehensive comparative experiments on SAR ship target recognition tasks, it becomes evident that the TAL algorithm tailored specifically for SAR ship targets outperforms general TAL methods. It is crucial to emphasize that the knowledge pertaining to ship targets, derived from both optical images and AIS data, can significantly boost the model's generalization capabilities. Furthermore, when these two data types are combined, they jointly provide pertinent knowledge, thereby further enhancing the model's generalization. This underscores the potential of multisource data to offer complementary information, ultimately elevating the performance of the model.

SECTION VIII.

Discussion and Recommendations

Existing TAL methods have achieved promising performance in SAR target recognition, but there is still room for improvement, given the gap between their performance and the upper bound. In addition, most of the reviewed literature assumes that the target domain contains unlabeled data (unsupervised) or partially annotated data (semisupervised). However, some real scenarios in SAR target recognition do not match these settings. In this section, we provide possible future directions for TAL in SAR target recognition.

A. Challenge for Current Methods

TAL methods, whether homogeneous or heterogeneous, can be categorized into three main methods: pretraining, distribution alignment, and shared-feature-space approaches. These methods have different applicable scenarios and characteristics. However, they remain confronted with several challenges, such as insufficient data, difficulty in distribution discrepancy metric selection, feature discrimination maintenance, radar target characteristics utilization, and modality gap.

1) How to Overcome Insufficient Data

Pretraining methods necessitate an ample supply of relevant data to enable the model to acquire the ability to extract generalized features. The volumes of optical image and AIS data are relatively adequate due to the ease of acquisition. However, there are almost no SAR image datasets sufficient for pretraining. Generally speaking, the size and diversity of most datasets suitable for TAL in SAR target recognition are relatively limited, making it challenging to train a pretrained model with good generalization. As a result, there is an urgent need for well-organized, large-scale, and diverse benchmark datasets for pretraining in SAR target recognition. In addition, generating more data with generative artificial intelligence, such as GANs [177] and diffusion models [178], or with computer simulation techniques [65], [156], [157], is another possible way to overcome the impact of insufficient data.

2) How to Select Distribution Discrepancy Metric

For distribution alignment methods, the most important issue is how to construct an appropriate metric of distribution discrepancy. First, distribution alignment excels in scenarios where the distribution differences are relatively small, such as TAL from simulated SAR to real SAR; in cross-modal TAL tasks, such as from optical to SAR images, predefined distribution alignment methods fall short in performance. Furthermore, classification problems often involve unbalanced categories, which distribution alignment tends to overlook, disproportionately focusing on the transfer of categories with larger sample sizes. Predefined metrics such as MMD, LMMD, KLD, SWD, and CORAL are commonly used in these methods; they measure the distribution differences between the source and target domains from different perspectives, but considering only the marginal distribution, the conditional distribution, or the covariance alone is not comprehensive. Selecting or designing comprehensive and suitable metrics for the diverse scenarios of TAL in SAR target recognition therefore remains a difficult and underexplored problem. Customized distribution discrepancy metrics tailored to specific scenarios, which account for category imbalance and long-tailed distributions, have the potential to overcome this challenge in the future.

3) How to Maintain Feature Discrimination

For shared-feature-space methods, most approaches map the samples of the source and target domains into a domain-indistinguishable shared feature space with adversarial learning. However, this cannot guarantee that samples are mapped to proper positions in the shared feature space [179]; that is, it may reduce or damage the classification ability of the model, causing negative transfer. Therefore, it is worth studying how to map samples into domain-indistinguishable shared feature spaces while maintaining or enhancing the discrimination of features. Integrating metric learning [180] and contrastive learning [181] to construct a shared feature space while preserving the distinctions between classes is one possible future solution to this loss of discrimination.
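As an illustration of this suggested direction, the following sketch implements a supervised contrastive term that pulls same-class features together and pushes different-class features apart; added to an adversarial objective, it could help preserve class discrimination. This is a generic formulation, not a method from the reviewed literature.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive(features, labels, tau=0.1):
    # features: (N, d) embeddings; labels: (N,) integer class labels; tau: temperature.
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / tau                                   # scaled cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # Log-softmax over all non-self pairs.
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)         # avoid -inf * 0 on the diagonal
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    # Negative mean log-probability of positive (same-class) pairs.
    return -(log_prob * pos_mask.float()).sum(dim=1).div(pos_count).mean()
```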

4) How to Utilize Radar Target Characteristics and Traditional Features

Compared to other types of data, SAR images possess rich electromagnetic scattering characteristics that encompass the inherent properties of the radar targets themselves. However, SAR target electromagnetic characteristic information is insufficiently utilized in knowledge transfer. In addition, traditional features exhibit strong stability and interpretability, making them nonnegligible. Therefore, transferring physical electromagnetic features and traditional features alongside visual features may further enhance the performance of recognition models. Alternatively, integrating this rich knowledge as prior information into TAL models may also improve model generalization in the future.

5) How to Narrow the Modal Gap

In particular, heterogeneous TAL is a cross-modality target recognition problem [37]. There is significant heterogeneity between SAR images and optical images or AIS data, which has not been considered in method design. Overcoming these modal differences is a crucial step toward effective heterogeneous TAL. How to separate modality-specific features from modality-invariant features through feature decoupling is worth considering, with the aim of resolving heterogeneity during knowledge transfer. Advanced learning paradigms, methods, and models, such as contrastive learning [182], transformers [183], and cross-modality translation [184], can be applied in the future to overcome heterogeneity in TAL for SAR target recognition.

B. Future Directions

TAL is a hot topic, with new techniques emerging in a continuous stream. Because of their relevance to realistic scenarios of SAR target recognition, several directions within the TAL community deserve attention. Furthermore, the development of TAL in SAR target recognition could be accelerated by advanced artificial intelligence technologies, particularly foundation models. Therefore, five directions are summarized in this section to help readers quickly grasp the possible futures of TAL in SAR target recognition.

1) Partial, Open-set, and Universal Domain Adaptation

The methods reviewed in this article primarily assume that the feature space of the source domain differs from that of the target domain, while the label space remains the same. This setup is usually referred to as the closed-set domain adaptation task ($\mathcal{X}_{S}\ne \mathcal{X}_{T}, \mathcal{Y}_{S}=\mathcal{Y}_{T}$). However, this assumption poses significant limitations for realistic applications [185]. These methods address the data shift between domains but overlook the label shift (the label spaces of the two domains do not completely overlap) that also exists between them. This oversight is particularly problematic because label shift is common in realistic SAR target recognition scenarios. In real-world TAL for SAR target recognition, various label space relationships can occur: the target domain label space may be a subset of the source domain label space ($\mathcal{Y}_{T}\subset \mathcal{Y}_{S}$), the source domain label space may be a subset of the target domain label space ($\mathcal{Y}_{S}\subset \mathcal{Y}_{T}$), the two label spaces may merely intersect ($\mathcal{Y}_{S}\cap \mathcal{Y}_{T}\ne \emptyset$), or the target domain label space may be completely unknown ($\mathcal{Y}_{T}=?$). All these scenarios need to be taken into account when developing effective TAL methods for SAR target recognition, and their differences can be seen in Fig. 9.

Fig. 9. Difference between closed-set, partial, open-set, and universal domain adaptation.

Partial domain adaptation learning is designed to tackle the issue of transferring overlapping categories from the source domain to the target domain while avoiding the influence of redundant categories during knowledge transfer [186]. This ensures that only relevant information is transferred. By focusing on the scenario of $\mathcal{Y}_{T}\subset \mathcal{Y}_{S}$, partial domain adaptation learning aims to enhance the performance of target domain tasks by effectively filtering out irrelevant categories from the source domain.

Open-set domain adaptation learning is designed to address knowledge transfer from the source domain while distinguishing unknown categories in the target domain [187]. Its key advantage is the ability to handle scenarios where the label spaces of the source and target domains are not fully aligned, $\mathcal{Y}_{S}\subset \mathcal{Y}_{T}$ or $\mathcal{Y}_{S}\cap \mathcal{Y}_{T}\ne \emptyset$, allowing for more generalizable models.

Universal domain adaptation learning is designed to solve the problem of correctly classifying known categories and distinguishing the unknown categories with the knowledge from the source domain [188]. Unlike open-set domain adaptation, which requires an intersection in the label space of the two domains, universal domain adaptation does not have this constraint. Consequently, the label space of the target domain remains unknown, $ \mathcal {Y}_{T}=?$.

The difficulty level of these four questions gradually rises, reflecting different scenarios encountered in SAR target recognition tasks. Here is a breakdown of each scenario:

  1. Closed domain adaptation: The target categories (target domain) to be identified are the same as the known categories (source domain).

  2. Partial domain adaptation: The known categories contain and exceed the categories of the targets to be identified.

  3. Open-set domain adaptation: The categories of the targets to be identified contain and exceed the known categories.

  4. Universal domain adaptation: The relationship between the known categories and target categories is unknown.

These scenarios represent realistic challenges encountered in SAR target recognition tasks and are therefore worthy of in-depth research toward generalizable recognition systems.

2) Domain Generalization

When the operating conditions of the radar, such as depression angle, polarization mode, or wave band, or the azimuth of the target suddenly change, or in other situations where target domain data cannot be collected, the distribution of the target domain data is unseen. How to generalize a model to an unseen target domain when only source domain data (or multiple source domains) are available is a worthwhile research question that arises in real SAR target recognition scenarios. With the advancement of TAL technology, some researchers have proposed domain generalization techniques, which aim to learn a model that can generalize to an unseen target domain given one or more related source domains [189], [190]. These techniques hold the potential to address the aforementioned challenges in SAR target recognition [191], [192], [193].

3) Source-Free Domain Adaptation

Most of the reviewed methods need to revisit the source domain data during knowledge transfer. However, due to information security or communication bandwidth limitations, the source domain data may not be accessible [194], [195]. In this scenario, only the model trained on the source domain data is available, which is more in line with the information security requirements of SAR target recognition tasks. Source-free domain adaptation was introduced to overcome this by relying only on a well-trained source model [196], [197]. This technology has the potential to be applied to SAR target recognition tasks.

4) Test-Time Adaptation

When the platform undergoes continuous adjustments or the target undergoes continuous azimuth changes, the distribution of the target domain data also changes continuously. Every time a conventional method is rebuilt, the source domain data must be revisited, which consumes considerable computational resources and slows the learning efficiency of the method. In addition, the source domain data may be inaccessible for information security reasons [194], [195]. Some research indicates that generalizing a model to an arbitrary distribution without target domain data (domain generalization) is quite difficult [198], [199], [200]. Therefore, exploring how to maintain source domain knowledge while continuously learning from the target domain, without revisiting source domain data, is a worthwhile problem [60]. This is called test-time adaptation. Different from source-free domain adaptation, test-time adaptation performs adaptation on one small data batch during testing, alternating between training and testing throughout the testing process [201]. This technology is expected to eliminate the computational cost of revisiting source domain data in TAL for SAR target recognition.
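As an illustration, the following sketch performs entropy-minimization test-time adaptation in the spirit of methods such as Tent: only the batch-norm affine parameters are updated on each incoming test batch, without any source data. The model is assumed to contain BatchNorm2d layers, and all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def tta_step(model, batch, lr=1e-4):
    # Collect only the batch-norm affine parameters (assumes BatchNorm2d layers exist).
    bn_params = [p for m in model.modules() if isinstance(m, nn.BatchNorm2d)
                 for p in (m.weight, m.bias) if p is not None]
    opt = torch.optim.SGD(bn_params, lr=lr)
    probs = torch.softmax(model(batch), dim=1)
    # Minimize the mean prediction entropy on this test batch.
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
    opt.zero_grad()
    entropy.backward()
    opt.step()
    return probs.argmax(dim=1)  # predictions after the adaptation step
```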

5) Foundation Models

In the past few years, foundation models have flourished. Foundation models typically use massive amounts of data for supervised or unsupervised learning to obtain general knowledge [41]. These networks often consist of billions or even hundreds of billions of parameters. As model parameters and data volume increase, the intelligence of foundation models exhibits an emergence phenomenon [202]. Foundation models leverage the general knowledge acquired from massive data; with techniques such as transfer learning and incremental learning (learning new knowledge while preserving learned knowledge) [203], they can be quickly adapted to various downstream tasks in domains like NLP and computer vision, establishing a novel learning paradigm [204], [205], [206], [207], [208], [209].

Inspired by the development of the language foundation models and vision foundation models, some researchers adapt the state-of-the-art foundation models to the downstream tasks for remote sensing [210], [211], [212], [213], [214]. There are also some researchers dedicated to building and utilizing specialized remote sensing foundation models [215], [216], [217], [218], [219], [220].

A milestone is the emergence of the RingMo remote sensing foundation model [218]. Remote sensing data, which cover tens of thousands of square kilometers and contain complex scene content, differ significantly from the natural images used to build general foundation models. RingMo can be easily customized for downstream tasks [219], [220], indicating that foundation models can also demonstrate extraordinary capabilities in the remote sensing field.

In the reviewed literature, the source domain is given in advance, which restricts applicability in certain SAR target recognition scenarios. In addition, the correlation between the source and target domains greatly influences the difficulty of knowledge transfer. Nevertheless, a foundation model acquires knowledge from a wide and diverse range of data and can effectively transfer general, robust target knowledge to downstream tasks under the transfer learning paradigm. This promises excellent performance in SAR target recognition.

SECTION IX.

Conclusion

This survey offers a comprehensive overview of the latest advancements in TAL for target recognition in SAR images. Its primary aim is to consolidate knowledge and provide researchers with the tools to quickly grasp the development of this field. We categorize the reviewed methods into homogeneous TAL and heterogeneous TAL, based on the relationship between the data types in the source and target domains, and further refine this categorization by the specific data types used in the source domain and the underlying technologies. The reviewed studies suggest that TAL techniques hold tremendous potential in SAR target recognition tasks. Moreover, we examined the transferable traditional features of SAR images, which are valuable for accurate target recognition based on TAL. In addition, we reviewed the datasets available for validating TAL methods and conducted comparative experiments on them to assess their performance. Finally, we proposed suggestions for enhancing current methods and discussed potential future research directions. Although TAL has demonstrated promising results in the field of SAR target recognition, notable limitations to its application remain.

  1. TAL relies on partially annotated data for pretraining or unlabeled data for distribution alignment and establishing a shared feature space. However, in SAR target recognition, the sample size of SAR data is often limited, hindering the full effectiveness of TAL. Furthermore, noise interference and incomplete target contours in SAR images can impair feature extraction, leading to suboptimal TAL performance.

  2. TAL typically employs deep models and intricate algorithms, necessitating substantial computing resources and time. The long training and inference times of TAL methods may constrain their application in real-time SAR target recognition tasks, where speed and accuracy are crucial.

  3. TAL is often built upon deep neural networks or other machine learning models whose decision-making processes are complex. In SAR target recognition, the reliability of model decisions is paramount, as incorrect identifications can have significant consequences. Therefore, ensuring the trustworthiness of TAL models requires further attention.

We anticipate that future research will address these limitations, enabling more robust and efficient SAR target recognition systems. We also believe that SAR target recognition leveraging TAL will continue to be an active and exciting area of research.
