Introduction
Marine oil spills often result in large-scale marine pollution and seriously endanger marine ecosystems and the environment, fisheries, wildlife and other social interests. Timely and effective monitoring of marine oil spills is therefore of great significance [1]–[3]. Among the available methods, Synthetic Aperture Radar (SAR) has become the main means of oil spill detection, as it is minimally affected by sunlight and clouds, and offers all-day, all-weather imaging with wide swath and high resolution [4], [5].
With the development of polarimetric SAR (PolSAR) technology, images can be acquired by transmitting and receiving electromagnetic (EM) waves in different polarizations [6], [7]. PolSAR imagery provides not only the scattering mechanism of the sea surface but also a complex coherency matrix [8], scattering matrix [8], and other polarimetric information for oil spill detection [9]. PolSAR oil spill detection has become a research hotspot in recent years, and many studies have confirmed the effectiveness of PolSAR in detecting oil spills at the sea surface [10], [11]. However, the automatic extraction and selection of object features in PolSAR target detection and classification has proven to be a long-term challenge [4], [12]. Based on the EM scattering characteristics and mechanisms of targets, different PolSAR features for oil spill detection have been proposed in the literature. Migliaccio et al. [13] introduced polarimetric entropy (H), mean scattering angle (α) and anisotropy (A) into oil spill detection based on target decomposition theory. Their results showed that, under low wind speed, the polarimetric features H and A can effectively distinguish oil spills from sea water, with the polarimetric entropy giving the better separation. Later, the validity of the co-polarized phase difference (CPD) in oil spill detection was proven [14]. Zhang et al. [15] calculated the conformity coefficient (
In general, the accuracy of classification is greatly dependent on the quality of the extracted features. Manual feature extraction is complex and time-consuming, and usually requires significant effort and deep domain knowledge [31], [32]. In addition, the extracted features may have incomplete information coverage or redundancy [23], resulting in poor classification performance. Different classifiers also perform differently for different features or feature combinations [23], [30]. Traditional feature extraction and pixel-based classification methods are often affected by severe speckle noise, which leads to a high false alarm rate [33], [34]. These problems increase the difficulty of oil spill detection at sea [35]. Another way to classify this type of data is to use region-based algorithms, which classify the images using regions, not pixels, as processing units [48], [68], [71]. These studies show that spatial information can suppress speckle noise and improve the accuracy of classification.
The application of deep learning to target recognition in optical images [36], [37] and SAR imagery [34] has achieved some degree of success. Recognition technology based on deep learning offers multi-level feature expression and non-linear data fitting [38]. It can automatically mine more discriminative and representative features, and thus enhance classification performance. The Convolutional Neural Network (CNN), one of the most widely used models in deep learning [39], can exploit the spatial correlation of data to reduce the number of trainable parameters in the network. Chen et al. [40] studied SAR target recognition using the CNN, and proposed an all-convolutional network structure (A-ConvNets) to mitigate the over-fitting that commonly arises in neural network training. Zhou et al. [34] converted the complex covariance matrix of a PolSAR image into a six-channel real matrix to match the input of the network, and then fed the six-channel data into the CNN for PolSAR classification. Gao et al. [35] proposed a novel method based on a dual-branch deep CNN (Dual-CNN) to improve the classification of PolSAR images.
Although the deep learning framework can effectively mine the rich features of PolSAR data, its research in PolSAR oil spill detection is still at its early stage, and there are issues on how to effectively integrate multi-layer deep features [41] and improve the performance of weak classifiers in CNN [42].
In view of the current research situation and the existing problems of PolSAR oil spill detection mentioned above, this paper proposes a novel oil spill detection method based on deep learning feature fusion and SVM. The main work includes the following three aspects:
A deep CNN suitable for oil spill detection using PolSAR data is constructed, which consists of three convolution layers and two pooling layers. It is used to automatically extract representative deep features from PolSAR data without manual feature extraction and selection before classification.
On the basis of the traditional CNN structure, this study further extracted two high-level features from the network. Then, by means of the principal component analysis (PCA) dimensionality reduction method, the two features were fused together in order to make full use of the depth features extracted by the network. In addition, during the final stages, the features were visualized following the completion of the dimensionality reduction process, and the representativeness of the features was found to be more intuitively displayed.
Finally, the fused features are input into the SVM classifier to improve the final classification performance. The experimental results show that the algorithm performs well in terms of classification accuracy and Kappa coefficient, and can achieve good oil spill detection results.
The remainder of this paper is organized as follows: In Section 2, we briefly introduce the RADARSAT-2 PolSAR data used in this study and the proposed approach in detail. The experimental results and comparative analysis are given in Section 3. In Section 4, the depth features are visualized and we provide justification on superiority of the proposed method over the traditional methods. Finally, the conclusion is in Section 5.
Data and Methodology
A. Remote Sensing Data
Figure 1 shows the three sets of RADARSAT-2 fine-mode full-polarization polarimetric SAR images used in this study. They are SLC (single-look complex) images each with four polarimetric channels (HH, VV, HV, VH). RADARSAT-2 has a very low noise floor of about −35 dB [15]. The first two images were acquired in the Gulf of Mexico. Figure 1(a) is the first set of data, the image coverage of which is
Marine oil spill images of the RADARSAT-2 used in this study: (a) Dataset 1; (b) Dataset 2; (c) Dataset 3.
The red box in Figure 1 marks the experimental area. The size in pixels of the subsets extracted from the images and more detailed imaging parameters are shown in Tables 1–3. As can be seen from the PolSAR oil spill images in Figure 1, the oil spill appears as dark spots on the SAR image. Other oceanic phenomena or targets are also present in the images, such as ships, ship wakes, and drilling platforms.
B. Basic Framework of the CNN
In most cases, the CNN is composed by the stacking of the input layer, convolution layer, pooling layer and fully connected layer [43]. All layers are then connected in series, and the input data of each layer are the output data of the former layer. The input layer is used to receive image data. Due to its hierarchical connection structure, the CNN can extract high-level features from low-level features [44].
The convolution layer is also called the feature extraction layer. The feature map [45] is obtained by convolving the input data. If the \begin{equation*} x_{j}^{l} =f\left ({{\sum \limits _{i\in M_{j}} {x_{i}^{(l-1)}} \ast k_{ij}^{l} +b_{j}^{l}} }\right)\tag{1}\end{equation*}
The convolution layer is typically followed by the pooling layer. The pooling layer can reduce the size of feature map and prevent over-fitting. Common pooling operations include max pooling, average pooling, and so on [47]. In this paper, we used the maximum pooling method, in which the maximum value of local windows in the feature map is selected as the output. Through the alternate combination of the convolution layer and pooling layer, more advanced features can be extracted from the original data.
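As an illustration only, the convolution in (1) followed by max pooling can be sketched in NumPy as follows. The ReLU activation, the array shapes, and the function names `conv_layer` and `max_pool` are assumptions made for this sketch; they are not taken from the network used in this paper.

```python
import numpy as np

def conv_layer(x, kernels, bias):
    """Valid convolution layer in the sense of (1).

    x: (C_in, H, W) input maps; kernels: (C_out, C_in, kh, kw); bias: (C_out,).
    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    c_out, _, kh, kw = kernels.shape
    _, h, w = x.shape
    out = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for j in range(c_out):                      # one output map per kernel set
        for r in range(out.shape[1]):
            for c in range(out.shape[2]):
                window = x[:, r:r + kh, c:c + kw]
                out[j, r, c] = np.sum(window * kernels[j]) + bias[j]
    return np.maximum(out, 0.0)                 # f is assumed to be ReLU here

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    trimmed = x[:, :h2 * size, :w2 * size]      # drop rows/cols that do not fit
    return trimmed.reshape(c, h2, size, w2, size).max(axis=(2, 4))

x = np.ones((2, 6, 6))                          # toy input with 2 channels
kernels = np.ones((3, 2, 3, 3))                 # 3 output maps, 3x3 kernels
bias = np.zeros(3)
feature_map = conv_layer(x, kernels, bias)      # shape (3, 4, 4)
pooled = max_pool(feature_map)                  # shape (3, 2, 2)
```

The explicit loops keep the correspondence with (1) visible; a practical implementation would rely on a deep learning framework instead.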
C. Proposed Oil Detection Algorithm
Figure 2 shows our PolSAR oil spill classification framework. PolSAR data are first preprocessed to match the input of the CNN. Then the CNN is constructed and training samples are extracted from the data to train it. Next, the trained model is used to extract the deep features of the PolSAR data, and PCA is introduced to reduce the dimension of the features. Finally, the different high-level features are fused, and an SVM classifier with a Radial Basis Function kernel (RBF-SVM) is used for classification. The proposed method is described in detail in the following sections.
1) Organization Form of PolSAR Data in CNN Input
Each pixel in the PolSAR data can be represented as a 2×2 scattering matrix, which contains the information for describing coherent or pure scatterers [10]. Under the reciprocity hypothesis, monostatic PolSAR data can be fully represented by a conjugate-symmetric 3×3 complex coherency matrix \begin{equation*} {\boldsymbol T}_{3} =\left [{ {{\begin{array}{ccc} {T_{11}} & {T_{12}} & {T_{13}} \\ {T_{21}} & {T_{22}} & {T_{23}} \\ {T_{31}} & {T_{32}} & {T_{33}} \\ \end{array}}} }\right]\tag{2}\end{equation*}
\begin{align*}t=&[T_{11},T_{22},T_{33},\text{Re}(T_{12}),\text{Im}(T_{12}), \\& \qquad \text{Re}(T_{13}),\text{Im}(T_{13}),\text{Re}(T_{23}),\text{Im}(T_{23})] \tag{3}\end{align*}
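The mapping from the complex coherency matrix in (2) to the nine-element real vector in (3) can be sketched as below. The function name `t3_to_real_vector` and the toy scattering vector `k` are hypothetical and serve only to illustrate the channel ordering.

```python
import numpy as np

def t3_to_real_vector(t3):
    """Map a (..., 3, 3) complex coherency matrix to the (..., 9) real vector of (3)."""
    t3 = np.asarray(t3)
    return np.stack([
        t3[..., 0, 0].real, t3[..., 1, 1].real, t3[..., 2, 2].real,  # T11, T22, T33
        t3[..., 0, 1].real, t3[..., 0, 1].imag,                      # Re/Im of T12
        t3[..., 0, 2].real, t3[..., 0, 2].imag,                      # Re/Im of T13
        t3[..., 1, 2].real, t3[..., 1, 2].imag,                      # Re/Im of T23
    ], axis=-1)

# Toy single-pixel example: T3 = k k^H for an arbitrary vector k.
k = np.array([1.0, 1.0j, 2.0])
t3 = np.outer(k, np.conj(k))
t = t3_to_real_vector(t3)
```

Because the leading axes are preserved, the same function can be applied to a whole image stored as an array of shape (rows, cols, 3, 3), which yields a nine-channel real image.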
2) Network Structure
In PolSAR image classification, the most commonly used CNN is LeNet-5 or one of its improved models [32]. Other very deep CNNs, such as AlexNet [36], VGG-Net [49], and ResNet [50], are suitable for large-scale input images. For each network, a large number of samples are required to train the parameters in the network. We divided the PolSAR images into a large number of input image patches as samples, and randomly selected training samples for CNN construction and validation. The input patch size is generally small, such as
The first two convolution layers consist of 30 and 60 convolution kernels with a size of
After the forward propagation mentioned above has been completed, the loss function is then calculated based on the predicted and true value. In this study, we used cross-entropy as the loss function. The formula is as follows:\begin{equation*} L=-\frac {1}{n}\sum \limits _{i=1}^{N} {\sum \limits _{k=1}^{n} {[y\ln a+(1-y)\ln (1-a)] }}\tag{4}\end{equation*}
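A minimal numerical sketch of the loss is given below. It reads (4) as a sum of per-class binary cross-entropy terms averaged over the samples; the normalization by the number of samples, the clipping constant `eps`, and the function name `cross_entropy` are assumptions of this sketch.

```python
import numpy as np

def cross_entropy(y_true, a_pred, eps=1e-12):
    """Cross-entropy between one-hot labels y_true and predictions a_pred.

    Both arrays have shape (num_samples, num_classes).
    """
    a = np.clip(a_pred, eps, 1.0 - eps)          # avoid log(0)
    per_sample = np.sum(y_true * np.log(a) + (1 - y_true) * np.log(1 - a), axis=1)
    return -np.mean(per_sample)                  # averaged over samples (assumed)

y = np.array([[1.0, 0.0], [0.0, 1.0]])
confident = cross_entropy(y, np.array([[0.99, 0.01], [0.02, 0.98]]))
uncertain = cross_entropy(y, np.array([[0.5, 0.5], [0.5, 0.5]]))
```

Confident, correct predictions give a loss close to zero, whereas uninformative predictions of 0.5 give a larger value.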
Next, the network parameters are updated iteratively according to the back propagation (BP) algorithm. In BP, the stochastic gradient descent (SGD) algorithm updates the network parameters along the opposite direction of the gradient of the objective function so that the objective function converges. However, SGD is prone to oscillation near a local extremum, resulting in a slow convergence rate. To improve the performance of BP, we used the Adadelta optimization algorithm [53] in the gradient descent process. This method adjusts the learning rate adaptively and speeds up convergence. The batch size is set to 64.
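For illustration, the Adadelta update can be sketched on a toy quadratic objective as follows. The decay rate `rho`, the constant `eps`, the number of steps, and the objective are assumptions of this sketch and are not the settings used to train the network.

```python
import numpy as np

def adadelta_minimize(grad_fn, x0, rho=0.95, eps=1e-6, steps=500):
    """Minimize a function with the Adadelta update rule, given its gradient."""
    x = np.asarray(x0, dtype=float).copy()
    avg_sq_grad = np.zeros_like(x)     # running average of squared gradients
    avg_sq_step = np.zeros_like(x)     # running average of squared updates
    for _ in range(steps):
        g = grad_fn(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g ** 2
        # Per-parameter step: ratio of RMS of past updates to RMS of gradients.
        step = -np.sqrt(avg_sq_step + eps) / np.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1 - rho) * step ** 2
        x += step
    return x

# Toy objective f(x) = ||x||^2 with gradient 2x.
x_start = np.array([1.0, -2.0])
x_final = adadelta_minimize(lambda x: 2 * x, x_start)
```

No global learning rate appears in the update: the step size of each parameter is derived from the running averages, which is the adaptive behavior referred to above.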
3) Extraction and Fusion of Deep Features
For the entire network, each convolution and pooling layer can be regarded as a feature extraction layer. For the new input data, the trained CNN model can automatically extract high-level features from the data. In this paper, the feature map after the third convolution layer is extracted and stretched to a 1-D vector, recorded as
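The flatten, reduce, and fuse steps outlined in the framework overview can be sketched as below. This is a sketch under stated assumptions: each deep feature is reduced separately by PCA and the leading components are concatenated, and the feature dimensions are invented for the example. The names `fea_1` and `fea_2` follow the labels used in the feature visualization; `pca_reduce` and `fuse_features` are hypothetical helpers.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project (num_samples, dim) features onto their leading principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def fuse_features(fea_1, fea_2, n_components=3):
    """Reduce each deep feature with PCA, then concatenate the results (assumed fusion)."""
    return np.hstack([pca_reduce(fea_1, n_components),
                      pca_reduce(fea_2, n_components)])

rng = np.random.default_rng(0)
fea_1 = rng.normal(size=(100, 240))   # flattened feature maps (dimension assumed)
fea_2 = rng.normal(size=(100, 60))    # second high-level feature (dimension assumed)
fused = fuse_features(fea_1, fea_2)   # shape (100, 6)
```

The fused matrix has one row per sample and can be passed directly to a classifier.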
4) Classification of Deep Features Using RBF-SVM
Based on the network constructed in this paper, the CNN is used to extract deep features from the PolSAR images. Then we use fully connected layer and softmax layer to classify the features. This method can effectively update the parameters in the training phase by combining BP. However, Softmax is not very effective in non-linear classification [55]. Therefore, we introduce RBF-SVM into the classification scheme to improve the classification performance of the top-layer classifier of the CNN network structure [56], and at the same time, avoid the problem of over-fitting to a certain extent [57].
SVM aims at minimizing structural risk. It maps sample vectors to high-dimensional or even infinite-dimensional feature space (Hilbert space) through non-linear mapping, then constructs the optimal hyperplane as the decision plane in high-dimensional feature space [24]. In SVM, the non-linear separable problem in the original sample space is transformed into a linear separable problem in the feature space. Therefore, SVM performs well in solving the non-linear classification problem.
Assuming \begin{align*}&\textrm {min}\frac {1}{2}\left \|{ w }\right \|^{2}+C\sum \limits _{i=1}^{n} {\xi _{i}} \\&s.t. \begin{cases} {y_{i} (w\cdot \varphi (x_{i})+b)\ge 1-\xi _{i}} \\ {\xi _{i} \ge 0} \\ \end{cases};\quad i=1,2,\ldots,n\tag{5}\end{align*}
\begin{align*}&\mathop {\textrm {max}}\limits _{\alpha } \sum \limits _{i=1}^{n} {\alpha _{i}} -\frac {1}{2}\sum \limits _{j=1}^{n} {\sum \limits _{i=1}^{n} {\alpha _{i} \alpha _{j} y_{i} y_{j} k(x_{i},x_{j})}} \\&s.t. \begin{cases} {\sum \limits _{i=1}^{n} {\alpha _{i} y_{i} =0}} \\ {0\le \alpha _{i} \le C} \\ \end{cases};\quad i=1,2,\ldots,n\tag{6}\end{align*}
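To make the role of the kernel in (6) concrete, the following sketch evaluates the standard RBF kernel, the dual objective, and the resulting decision function for given multipliers. The kernel width `gamma`, the toy data, and the multipliers `alpha` are assumptions chosen for illustration; no optimization is carried out here.

```python
import numpy as np

def rbf_kernel(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2) for all row pairs."""
    sq_dist = (np.sum(x ** 2, axis=1)[:, None]
               + np.sum(z ** 2, axis=1)[None, :]
               - 2 * x @ z.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def dual_objective(alpha, y, kernel_matrix):
    """Value of the dual objective in (6) for given multipliers alpha."""
    ay = alpha * y
    return alpha.sum() - 0.5 * ay @ kernel_matrix @ ay

def decision_function(alpha, y, x_train, b, x_new, gamma):
    """f(x) = sum_i alpha_i y_i k(x_i, x) + b; the sign gives the predicted class."""
    return rbf_kernel(x_new, x_train, gamma) @ (alpha * y) + b

x_train = np.array([[0.0, 0.0], [1.0, 0.0]])
y_train = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])          # satisfies sum(alpha * y) = 0
gram = rbf_kernel(x_train, x_train, gamma=0.5)
objective = dual_objective(alpha, y_train, gram)
scores = decision_function(alpha, y_train, x_train, 0.0, x_train, gamma=0.5)
```

In practice the multipliers are obtained by a quadratic programming solver, and `C` and `gamma` are typically chosen by cross-validation.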
Experimental Results and Analysis
A. Experimental Settings
In order to verify the effectiveness of the algorithm, several comparison experiments were performed. These experiments adopted three classic methods: Maximum Likelihood (ML); SVM; and Neural Networks (NN). The experiments were simultaneously performed on two types of input data. In addition, several of the most commonly used polarization features (PF) utilized in oil spill detection processes were selected as input, which included span [13], H,
In the present experimental study, in accordance with the network structure design described in Section 2.3.2, the areas surrounding the pixels were selected as the input image patches. The image patches measured
B. Evaluation Metrics
In the present study, the experimental results were assessed by three evaluation metrics to compare the performance of the different methods: the overall accuracy (OA), the Kappa coefficient, and the F1-measure. The OA is the ratio between the number of correctly classified samples and the total number of samples:\begin{equation*} OA=\frac {M}{N}\tag{7}\end{equation*}
\begin{equation*} Kappa=\frac {OA-P}{1-P},\quad P=\frac {1}{N^{2}}\sum \limits _{i=1}^{C} {m_{i} k_{i}}\tag{8}\end{equation*}
\begin{equation*} \text{F1-measure}=2\times \frac {\text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}}\tag{9}\end{equation*}
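As a worked example, the three metrics in (7)–(9) can be computed from a confusion matrix as sketched below. The sketch assumes that rows are true classes and columns are predicted classes, and that m_i and k_i in (8) are the row and column totals of class i; the confusion matrix values are invented for illustration.

```python
import numpy as np

def overall_accuracy(cm):
    """OA = M / N: correctly classified samples over all samples."""
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Kappa coefficient from (8), with P the chance agreement."""
    n = cm.sum()
    p = np.sum(cm.sum(axis=1) * cm.sum(axis=0)) / n ** 2
    return (overall_accuracy(cm) - p) / (1 - p)

def f1_measure(cm, positive):
    """F1-measure of the class with index `positive`."""
    tp = cm[positive, positive]
    precision = tp / cm[:, positive].sum()   # among samples predicted as positive
    recall = tp / cm[positive, :].sum()      # among samples that are positive
    return 2 * precision * recall / (precision + recall)

# Invented two-class example: class 0 = sea water, class 1 = oil spill.
cm = np.array([[90, 10],
               [5, 95]])
oa = overall_accuracy(cm)        # 185 / 200 = 0.925
kp = kappa(cm)                   # (0.925 - 0.5) / (1 - 0.5) = 0.85
f1 = f1_measure(cm, positive=1)  # 190 / 205
```

The gap between the OA (0.925) and the Kappa coefficient (0.85) in this example shows why both are reported: Kappa discounts the agreement expected by chance.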
C. Comparison and Analysis of Results
1) Experimental Results of Dataset 1
Fig. 6 shows the accuracy and loss curves during the training process of Dataset 1. In the figure, the horizontal axis represents the number of epochs, and the vertical axis denotes the accuracy and loss values. For Dataset 1, the accuracy increased and the loss decreased very quickly during the early stages of training, and the network reached a stable state after the first few epochs, with an accuracy of approximately 98% and a loss of approximately 0.05. Subsequently, the trained network was used to classify the entire image. The results of the proposed method, along with the other comparative experimental results, are shown in Fig. 7.
Fig. 7 shows that the proposed method produced visually cleaner results. There were fewer spots on the sea surface, and the amount of seawater misclassified as oil spill was greatly reduced compared with the other examined methods. As detailed in Table 8 (where the brackets after the OA show the variance of the classification accuracy), the quantitative results confirm that the proposed method achieved the highest values on all three indicators (OA, Kappa coefficient, and F1-measure).
When the traditional classification methods were used, it was found that regardless of whether T3 or PF was used as the input, there were generally only minimal differences observed in the obtained classification results. Furthermore, many spots were still visible on the sea surface and the overall accuracy was less than 98%. Also, the Kappa coefficients were all less than 0.9. In particular, the results of T3-SVM were observed to be the least accurate. It was found that the classification accuracy of the PFs as input in Dataset 1 was slightly lower than that when the T3 was the input. As mentioned above, traditional classification methods rely on the quality of feature extraction, which can potentially lead to different performance rates for the various feature combinations. However, when compared with directly using the original data for oil spill detection, the method using PFs as the input was based on the target electromagnetic scattering characteristics and the prior knowledge of the model. The performances of each of the classifiers tended to be stable, without many variations observed in the classification results. Nevertheless, due to the insufficient or redundant information covered by the different features or feature combinations, as well as other false identifications (such as the drilling platform in the upper right corner of the image), some objects were incorrectly identified as oil spills. In addition, the traditional methods had displayed poor edge recognition effects for the oil spills, as indicated by the red circle in Fig. 7.
Compared with the traditional methods, the CNN-based methods improved detection performance more effectively. The spatial information is well represented and extracted through the convolution and pooling operations. As a result, the T3-CNN and PF-CNN methods significantly reduced the impact of sea-surface speckle noise, which reflects the powerful feature learning ability of the CNN. More importantly, the CNN was able to extract the most vital information directly from the T3 matrix. Therefore, the results of the T3-CNN were found to be slightly better than those of the PF-CNN.
Furthermore, it was successfully confirmed that the method proposed in this study had shown the ability to accurately extract and fuse the two high-level features of the CNN. The method also adopted an SVM classifier to replace the weaker Softmax classifier in the CNN, which increased its ability to solve nonlinear problems. It was determined from the detection results that the proposed method’s performance was superior to the other examined methods, and the OA was further improved by approximately 1% when compared to the results obtained using the CNN-based method. In addition, when compared with the traditional methods, the OA, Kappa coefficients and F1-measure had been remarkably improved by the proposed method. This was particularly true for the Kappa coefficients. Also, using the results of this study’s comparison between the T3-SVM and the proposed method, the effectiveness of the feature extractions of the CNN method was further validated.
Table 9 shows the classification accuracy using different principal component combinations following the CNN feature extraction. After dimension reduction by PCA, the time used was significantly reduced. As the number of principal components increased, the classification time also increased significantly. Considering the trade-off between time cost and accuracy, the first three principal components were selected as the fused features for classification in this study.
Fig. 8 presents the different performances among the T3-CNN-SVM, T3-CNN-ML, T3-CNN-NN, and T3-CNN-KNN (K-Nearest Neighbor algorithm) in this study’s experiments. During the experimental processes, K was set as 7, and different classic classifiers were used to replace the Softmax of the CNN. The results demonstrated that the RBF-SVM based method had displayed the highest classification accuracy. Meanwhile, when compared with the traditional classification results of the T3-SVM, T3-ML, and T3-NN (Table 8), it was observed that the classification effects of the T3-CNN-SVM, T3-CNN-ML, and T3-CNN-NN had been notably improved. These results indicated that the classification effects could potentially be improved by using the features extracted from the CNN, rather than by directly using the original data or features. Moreover, the CNN-based feature mining processes could stably enhance the different classifiers in order to reach higher accuracy levels, as well as reducing the performance differences between each type of classifier.
The classification results obtained by using different classifiers after CNN feature extraction and fusion of Dataset 1: (a)T3-CNN+SVM; (b)T3-CNN+ML; (c)T3-CNN+NN; (d)T3-CNN+KNN.
2) Experimental Results of Dataset 2
Fig. 9 shows the accuracy and loss curves during the training process of Dataset 2. It can be seen in the figure that a small fluctuation had occurred in the pre-training period. However, after 20 epochs, the curves had basically converged. The classification results and detection accuracy of Dataset 2 are shown in Fig. 10 and Table 10, respectively.
Accuracy and loss curves of the Dataset 2. Note: In the figure, train_loss/acc indicates the accuracy and loss curves on the training samples; and val_loss/acc indicates the accuracy and loss curves on the validation samples.
The oil spill observed in Dataset 2 was generally presented as a slender type oil spill. As can be seen from the classification results in Fig. 10, this study’s proposed method still achieved the best results on this dataset, with OA, Kappa coefficients, and F1-measure of 98.89%, 0.8814, and 0.9423, respectively. It was also observed that the traditional methods had displayed a large number of speckles on the sea surface, especially in T3-ML. Although it was found that the T3-NN method performed well on the sea surface, it had displayed poor preserving abilities on the overall shapes of the oil spills, particularly in regard to the edges of the oil spills. It should be noted that the PFs were built on the scattering characteristics of the objects, which involved both stable and robust information leading to the stable performances of each classifier [18], [23].
In regard to the CNN-based method, it was observed that there were also some speckles on the sea surface, which may have been caused by the failure to effectively use a variety of high-level features and the insufficient performance of the top classifier. However, the edges of the oil spills remained relatively complete when compared to the results obtained using the classic method. In contrast, the method proposed in this study had effectively reduced the sea speckles. In addition, there were no false identifications as had appeared in the results of the other examined methods, which indicated that the fusion features in the CNN had displayed strong discrimination and representativeness abilities. Also, in accordance with the quantitative accuracy evaluation shown in Table 10, the accuracy of the proposed method was determined to be 98.89%, with 1.3% increment compared to that of the T3-CNN. Meanwhile, the Kappa coefficient was also significantly improved to 0.8814. Therefore, as proven by the results achieved using the method proposed in this study, its performance was found to be stable and superior to the other examined methods.
Table 11 details the classification accuracy and time costs using different principal component combinations. Trends similar to those of Dataset 1 were observed. Therefore, the first three principal components were again used as the fusion features.
Fig. 11 provides the classification results obtained using the different classifiers following the CNN feature extraction and fusion of Dataset 2. It can be seen in the figure that the method based on RBF-SVM had once again achieved the highest classification accuracy.
The classification results obtained by using different classifiers after CNN feature extraction and fusion of Dataset 2: (a)T3-CNN+SVM; (b)T3-CNN+ML; (c)T3-CNN+NN; (d)T3-CNN+KNN.
3) Experimental Results of Dataset 3
The accuracy and loss curves during the training process of Dataset 3 are shown in Fig. 12. Dataset 3 contained more types of oil film than the first two datasets, including a biogenic film. Therefore, there were certain differences between the training and validation curves. This may be related to the partitioning of the sample data, which caused the model to perform slightly better on the training set than on the validation set. However, the overall trend was consistent. As the number of epochs increased, both the accuracy and the loss curves tended to stabilize. Fig. 13 and Table 12 detail the classification results and detection accuracy of Dataset 3, respectively.
Accuracy and loss curves of the Dataset 3. Note: In the figure, train_loss/acc indicates the accuracy and loss curves on the training samples; and val_loss/acc indicates the accuracy and loss curves on the validation samples.
Dataset 3 contained three substances released by humans: crude oil, emulsified oil, and plant oil. In the current study, plant oil was used to simulate a natural biogenic slick. This experiment was mainly used to verify the ability of the proposed method to distinguish oil spills from biogenic slicks. A biogenic slick is a natural surface slick formed by organic substances secreted by marine fish and some types of algae [59]. It also appears as dark spots on SAR images, which makes it difficult to distinguish an oil spill from a biogenic slick using traditional SAR image classification methods. Fig. 13 and Table 12 show the results and accuracy of the different methods, respectively. Once again, the proposed method achieved the best performance.
For the three traditional methods, when the T3 matrix was directly used as the input, the oil spills could not be distinguished from the biogenic slicks. Almost all the oil film in the SAR images was classified as oil spill. Meanwhile, a large number of misclassifications appeared on the sea surface, and the detection results were generally very poor. When the PF was used as the input, the three methods (PF-ML, PF-SVM, and PF-NN) were found to effectively distinguish the oil spills from the biogenic slicks, and the results were noticeably improved. However, these PolSAR features possessed poor noise resistance, which still induced many misclassifications on the sea surface. The T3-CNN and PF-CNN extracted the spatial information of the PolSAR images and reduced the false identification rates on the sea surface. However, the proposed method had the best detection results, with the highest OA, Kappa coefficient, and F1-measure of 97.56%, 0.7795, and 0.8005, respectively. Compared with the other methods examined in this study, the proposed method not only significantly reduced the speckles on the sea surface, but also distinguished the biogenic slicks from the oil spills more accurately. In addition, the proposed method reduced the misclassifications associated with the biogenic slicks. The detailed results are shown in Fig. 15.
The classification results obtained by using different classifiers after CNN feature extraction and fusion of Dataset 3: (a)T3-CNN+SVM; (b)T3-CNN+ML; (c)T3-CNN+NN; (d)T3-CNN+KNN.
Details of marine oil spill detection results of Dataset 3: (a) PF-SVM; (b) T3-CNN; (c) Proposed method.
In addition, it should be noted that the edge areas of the two oil films located in the middle and bottom right of the figure were wrongly classified as biogenic film. As mentioned in the previous related literature [17], the outermost parts of mineral oil spills may form thinner slicks over time. Therefore, the red zone around the edges of the emulsion and crude oil may be a thinner film with properties similar to those of plant oil slicks. This could potentially explain the “misclassification” observed in Fig. 13. In future research, the addition of a super-pixel segmentation method, the region-based CNN [65] or conditional random fields (CRF) [66] should be considered. Also, the multi-scale spatial information of the oil spill edge regions should be thoroughly investigated in order to optimize the classification results [71].
The results of using different combinations of principal components for Dataset 3, along with the classification results obtained using the various classifiers following CNN feature extraction and fusion, are shown in Table 13 and Fig. 14, respectively. Similar to Datasets 1 and 2, the combination of the first three principal components and the SVM classifier achieved the best results.
Discussion
A. The Visualization of the Deep Features
According to the aforementioned experimental results, the proposed approach shows great advantages in PolSAR oil spill classification. In this section, we visualize the deep features and discuss their superiority. After training the model, the feature centered at the pixel
First principal component graph of deep features after PCA dimensionality reduction. (a) First principal component of fea_1; (b) First principal component of fea_2.
As can be seen from Figure 10, the deep features discriminate strongly between the oil spill and sea water, with no other false alarms. Compared with the polarimetric features extracted in Figure 5, the deep features exhibit a superior visual effect, because they take the neighborhood information into account and reduce the speckle noise. In addition, most of the polarimetric features respond similarly to the oil spill and to the drilling platform at the top right of Figure 5, which may be the cause of the false alarms produced by the PF-SVM method. No such phenomenon was found in the deep features, which indicates that the two deep features extracted by the CNN have good discrimination and robustness, and supports the effectiveness of this method.
B. Discussion on Detection Rate and False Alarm Rate
In oil spill detection, the oil film is often confused with look-alikes and is affected by the speckle noise of polarimetric SAR imaging, which often leads to more false alarms on the sea surface. Therefore, it is necessary to consider both the accuracy of the detection results and the false alarm rate (FAR). The FAR reflects the proportion of negative examples among the samples classified by the classifier as positive examples. The results of accuracy and FAR for each category of Datasets 1 to 3 are shown in Tables 14 to 16 and are discussed in further detail here.
There are only two classes in Datasets 1 and 2, namely seawater and oil spills. For Dataset 1, the most accurate detection result for oil spill is obtained by T3-SVM, but it introduces many false alarms, and the FAR is as high as 70.65%. The extreme case of this phenomenon is equivalent to classifying an entire image as oil spill. In that case, the detection rate is 100% for oil spills, but the FAR also increases significantly, so the result cannot be used in actual work. The method proposed in this paper detects the oil spill well and greatly reduces the false alarms on the sea surface. It achieves a very good balance between the detection rate of oil spill and the FAR.
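The per-class detection rate and the FAR as defined above can be sketched as follows. The confusion matrices are invented for illustration, with rows as true classes and columns as predicted classes (an assumed convention), and `detection_rate_and_far` is a hypothetical helper.

```python
import numpy as np

def detection_rate_and_far(cm, cls):
    """Detection rate and false alarm rate of class `cls`.

    cm[i, j] counts samples of true class i predicted as class j.
    FAR is the share of negatives among the samples predicted as `cls`.
    """
    tp = cm[cls, cls]
    actual = cm[cls, :].sum()        # samples that truly belong to cls
    predicted = cm[:, cls].sum()     # samples classified as cls
    return tp / actual, (predicted - tp) / predicted

# Class 0 = sea water, class 1 = oil spill.
balanced = np.array([[90, 10],
                     [5, 95]])
everything_oil = np.array([[0, 100],
                           [0, 100]])   # every pixel labeled as oil spill

dr_balanced, far_balanced = detection_rate_and_far(balanced, 1)
dr_extreme, far_extreme = detection_rate_and_far(everything_oil, 1)
```

Labeling every pixel as oil spill drives the detection rate to 100% while half of the detections are false alarms in this toy case, which is why the two quantities have to be read together.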
As can be seen from the results for Dataset 2, the CNN-based method greatly improves the accuracy of oil spill detection compared with the classical methods and at the same time reduces false alarms on the sea surface. The method in this paper further reduces the false alarms on the sea surface while maintaining the detection rate, and achieves the lowest FAR. It also achieves the best accuracy on the other indicators.
Dataset 3 contains three classes; therefore, the false alarms of biological oil slicks are also considered. The results are shown in Table 16. The classical methods based on the T3 matrix cannot distinguish between biological oil slicks and oil spills, and they introduce a large number of false alarms on the sea surface; the FAR of both oil spills and biological oil slicks is higher than 80%. PF-ML, PF-SVM, and PF-NN can distinguish the three classes, but they produce more false alarms of biological oil slicks on the sea surface. Compared with the classical methods, the proposed method greatly reduces the false alarms on the sea surface. Compared with the CNN method, the proposed method achieves the best accuracy on the other indicators while maintaining the detection rate, and also reaches the lowest FAR.
In general, the above analysis shows that the proposed method achieves a good compromise between detection rate and FAR. At the same time, its OA, kappa, F1-measure and other indicators are better than those of the other methods.
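For reference, the summary indicators named above can all be derived from a single confusion matrix. The sketch below uses the standard textbook definitions of overall accuracy (OA), Cohen's kappa, and the per-class F1-measure, which are assumed to match the ones used in the experiments; the function name and the example matrix are hypothetical.

```python
def summary_metrics(cm):
    """Compute OA, Cohen's kappa, and per-class F1 from a confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(cm[i]) for i in range(n)]                        # true counts
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predicted counts

    # Overall accuracy: fraction of samples on the diagonal.
    oa = sum(cm[i][i] for i in range(n)) / total

    # Kappa: agreement corrected for the agreement expected by chance.
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (oa - pe) / (1 - pe) if pe != 1 else 0.0

    # Per-class F1 = 2TP / (2TP + FP + FN).
    f1 = []
    for k in range(n):
        tp = cm[k][k]
        fp = col_sums[k] - tp
        fn = row_sums[k] - tp
        denom = 2 * tp + fp + fn
        f1.append(2 * tp / denom if denom else 0.0)
    return oa, kappa, f1


# Hypothetical two-class matrix (rows: true sea, true oil).
cm = [[40, 10],
      [5, 45]]
oa, kappa, f1 = summary_metrics(cm)
```

Kappa is reported alongside OA because OA alone can look high on scenes dominated by sea water, whereas kappa discounts the agreement expected by chance.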
Conclusion
In view of the drawbacks of traditional offshore oil spill detection methods, which rely on manually designed features that may be insufficient or redundant, we proposed an oil spill detection method based on the fusion of deep learning features. The main contributions of this paper are summarized as follows:
A deep CNN suitable for oil spill detection is constructed to automatically extract discriminative deep features from the PolSAR data. To address the problem that general deep learning methods cannot fuse multi-layer deep features, we extract two high-level features from the CNN, reduce their dimensions by means of PCA, and finally fuse them so as to include abundant target feature information. By comparing the visualization of the high-level features with the traditional polarimetric features, we show that the features extracted automatically by this method exhibit superior representativeness and robustness.
The original Softmax classifier of the CNN is replaced by a robust SVM classifier with an RBF kernel, which enhances the robustness of the classifier and its ability to solve non-linear problems. This further improves the classification performance and enhances the ability to identify oil spills.
The oil spill and biogenic slick detection experiments carried out in this paper show that, compared with other methods, this method effectively reduces the false alarm rate caused by sea clutter or biogenic slicks and significantly improves the accuracy of oil spill detection.
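The feature reduction and fusion step summarized in the contributions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: fusion is taken to be per-sample concatenation of the PCA-reduced features, PCA is computed with a plain SVD, and the feature dimensions (64 and 128), the number of retained components (8), and the random stand-in data are all hypothetical.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def fuse_features(feat_a, feat_b, n_components):
    """Reduce each feature set with PCA, then concatenate them per sample."""
    return np.hstack([pca_reduce(feat_a, n_components),
                      pca_reduce(feat_b, n_components)])


# Random stand-ins for two high-level CNN feature layers (one row per pixel).
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(200, 64))
feat_b = rng.normal(size=(200, 128))
fused = fuse_features(feat_a, feat_b, n_components=8)  # shape (200, 16)
```

The fused matrix would then be passed to an SVM with an RBF kernel in place of the Softmax layer (for example, scikit-learn's `SVC(kernel="rbf")`); that step is omitted here to keep the sketch dependent on NumPy only.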
However, in practical applications, due to the complexity of the sea surface environment, there are still unknowns in using deep learning for oil spill detection under various sea conditions. The proposed method is pixel-based, and its computational efficiency is not high. In future work, other deep learning frameworks should also be considered, such as FCN [60], SegNet [61], and ResNet [50]. In addition, research on feature-level fusion and decision-level fusion among different models can be carried out to further improve the accuracy and generalization performance of the models. In conclusion, the proposed method and experimental results confirm that deep learning networks have strong application potential in PolSAR oil spill detection. This study provides a reference for using deep learning for PolSAR oil spill detection in the future.