Introduction
Approximately three quarters of the earth’s surface is covered by sea, which plays an important role in shaping global climate. Specifically, the sea surface temperature (SST) has great effects on many environmental issues (e.g., climate change, marine disasters, and ocean acidification) and environmental phenomena (e.g., ocean currents and El Niño-Southern Oscillation). Therefore, accurately predicting SST is of practical significance and could benefit many environment-related research activities and applications.
However, SST is difficult to predict because it changes dynamically over time and space. On the one hand, during the daytime, SST changes with the position of the sun, which is a major factor affecting SST by heating the water. On the other hand, different sea areas usually have different SST records due to differences in radiation and evaporation efficiency.
Up to now, various approaches have been proposed for SST prediction. Existing SST prediction approaches can be roughly divided into three categories, i.e., physical models, traditional machine learning models, and deep learning models.
Physical models: Typical physical models, e.g., the general circulation model (GCM) [1], the CMIP5 model [2], and the MyOcean Project model [3], rely on physical theories to model the spatial and temporal dependency in SST sequences and require domain knowledge to design.
Traditional machine learning-based models: Machine learning models, including the Markov model [4]–[6], regression model [7], [8], and support vector machine (SVM) [5], [9]–[11], predict SST by training on both spatial and temporal features extracted from the historical SST data.
Deep learning-based models: Deep learning methods have also been applied to SST prediction due to their strong predictive power in handling temporal and spatial data [12], [13]. For example, the long short-term memory (LSTM) model predicts SST by taking advantage of the time dependency in the SST sequence, and the convolutional neural network (CNN) model treats SST data as a 2-D map and predicts SST by learning the spatial dependency between different regions.
Existing approaches for SST prediction have already achieved good prediction performance with a small deviation error. For example, the mean absolute error (MAE) of MyOcean is only about 0.5. However, some critical application scenarios, such as coral bleaching and weather prediction, still require more accurate SST prediction. For example, corals are highly sensitive to temperature, and a small change in SST could result in coral bleaching. Therefore, this work aims to exploit the correlations between the spatial dependency and the temporal dependency in SST, which have not yet been considered by the existing approaches, to further improve the accuracy of SST prediction.
To address the abovementioned issue, we propose a dense dilated convolutional LSTM (D2CL) model that can effectively learn both the spatial and temporal features of SST for more accurate SST prediction by combining the dilated convolutional network and LSTM. The dilated convolution can effectively learn spatial and temporal correlations in SST and learn features of multiple granularities. Meanwhile, the dense structure could help to minimize the information loss when training the prediction model.
The main contributions of this article are threefold as follows.
We proposed the D2CL model that can learn the spatial and temporal features from historical SST data simultaneously by integrating dilated convolutional network and LSTM.
We developed multiple feature extractors of different dilated kernel sizes to learn the features of multiple scales and introduced dense connection to maximize information transfer between model layers.
We conducted extensive experiments over two real SST datasets to demonstrate the effectiveness of the proposed model and the enhancing techniques.
The rest of this article is organized as follows. We review the related work in Section II and Section III gives the problem definition of SST prediction. The details of the D2CL model are presented in Section IV. Section V evaluates the proposed model via experiments, followed by a conclusion of this work in Section VI.
Related Work
This section presents a detailed review of the existing SST prediction methods, which can be roughly grouped into three categories, i.e., physical models, traditional machine learning-based models, and deep learning-based models.
A. Physical Models
Physical models use Newton’s laws of motion, law of conservation of energy, and seawater equation of state, etc., to predict SST. Typical physical models include GCM [1], Coupled Model Intercomparison Project (CMIP) model [2], and MyOcean Project model [3]. The GCM model is based on the Navier–Stokes equation, combined with the solar radiation, ocean latent heat, and other ocean dynamics parameters, to simulate the changes of SST. The CMIP model is an ensemble model that integrates multiple physical models to achieve more accurate prediction for SST. MyOcean integrates different data sources to produce an analyzed dataset based on physical models.
Physical models are usually of high complexity and interpretability, but require a good understanding of the dynamic mechanism of SST. However, SST is determined by many factors, which makes it difficult to learn and monitor such dynamics.
B. Traditional Machine Learning-Based Models
Different from the physical models, machine learning models are based on probability theory and predict SST by learning the underlying patterns from the historical SST data. Major machine learning models for SST prediction include the Markov model [4], pattern searching model [6], regression model [7], [8], and SVM model [5], [9]–[11].
Markov model: Xue et al. [4] proposed a seasonally varying Markov model, constructed in a multivariate empirical orthogonal function (EOF) space of the observed SST and sea level, for predicting the SST in the tropical Pacific.
Pattern searching model: Agarwal et al. [6] developed a pattern searching model to find similar temporal cycles and select the best match based on the interseasonal changes to predict SST.
Regression model: Laepple et al. [7] introduced a regression model for SST prediction and found that there exist correlations between SST and hurricanes. Kug et al. [8] also developed a regression-based model to predict SST dynamically.
SVM model: Lins et al. [9], [10] combined SVM and particle swarm optimization to predict the SST along the northeastern Brazilian coast. Aguilar et al. [5], [11] introduced warm water volume as an additional feature to enhance SST prediction with an SVM model.
Traditional machine learning models mainly focus on extracting the temporal features of SST and cannot capture the spatial features well.
C. Deep Learning-Based Models
Classic neural networks, e.g., the feed-forward neural network (FNN) and artificial neural network, were already being applied to SST prediction ten years ago. For example, Tripathi et al. [14] predicted the SST in the Indian Ocean with the FNN model. Tangang et al. [15] introduced wind-stress and SST anomalies for seasonal SST prediction with the FNN model. Garcia-Gorriz et al. [16] used meteorological variables (e.g., sea level pressure, air temperature, and wind) as inputs and conducted SST prediction in the western Mediterranean Sea with the FNN model. Wu et al. [11] proposed a nonlinear model that uses a multilayer perceptron neural network to predict the SST in the tropical Pacific.
Recently, deep learning methods have also been applied for SST prediction. Some studies aim to capture the temporal features in SST sequence. For example, Zhang et al. [12] used the LSTM model to predict the SST in Bohai; Patil et al. [17] proposed the wavelet neural network for the prediction of daily SST; Yang et al. [18] combined the Markov random field with LSTM to predict the SST in Bohai. Some other studies aim to capture the spatial features in the SST records of different regions. For example, Zheng et al. [13] proposed to use the CNN model for SST prediction, and Xiao et al. [19] proposed the convolutional LSTM model for SST prediction.
Most existing deep learning methods for SST prediction learn either the temporal features or the spatial features in the SST data. However, they fail to consider the correlation between the two types of features, which inevitably reduces the prediction accuracy. In contrast, our D2CL model can learn both the spatial and temporal features of SST, thus achieving higher prediction accuracy than the existing methods.
Problem Definition and Notations
For SST prediction, the sea surface is usually divided into grid regions of the same size according to longitude and latitude. The SST records of each grid region are obtained by the observation equipment in the grid. All the grid regions form an
Example 1:
illustrates the East China Sea (ECS) located within
In practice, we usually learn from a period of historical SST records and use the learned knowledge to predict the SST in the future. In this work, we also follow this principle and formally define the problem of SST prediction as follows.
Definition 1 (SST Prediction):
Given
\begin{equation*}
\begin{aligned} X_{t+1},\ldots,X_{t+v} = \mathop {\arg \max }_{X_{t+1},X_{t+2},\ldots,X_{t+v}} \ \ \\
p(X_{t+1},\ldots,X_{t+v} \mid X_{t-u+1},X_{t-u+2},\ldots,X_{t}) \end{aligned} \tag{1}
\end{equation*}
For example, given the SST records in the past 30 days, we could predict the SST records in next seven days, where
The Nomenclature summarizes the notations used in this article.
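To make the input and output of Definition 1 concrete, the following minimal sketch pairs every run of u historical frames with the v frames that follow it. The helper name `make_windows`, the stride of one day, and the integer stand-ins for daily SST maps are illustrative assumptions; the source does not state how training samples are constructed.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")  # one daily SST map, in any representation


def make_windows(
    frames: Sequence[Frame], u: int, v: int
) -> List[Tuple[List[Frame], List[Frame]]]:
    """Pair each run of u past frames with the v frames that follow it.

    Hypothetical helper: a sliding window with a stride of one day is assumed.
    """
    if u <= 0 or v <= 0:
        raise ValueError("u and v must be positive")
    samples = []
    for start in range(len(frames) - u - v + 1):
        history = list(frames[start:start + u])          # X_{t-u+1}, ..., X_t
        future = list(frames[start + u:start + u + v])   # X_{t+1}, ..., X_{t+v}
        samples.append((history, future))
    return samples


# Integers stand in for 40 consecutive daily SST maps.
days = list(range(40))
pairs = make_windows(days, u=30, v=7)
```

With 40 days of data, u = 30, and v = 7, this yields four (history, future) pairs; the first pairs days 0 to 29 with days 30 to 36.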
Methodology
A. Model Architecture
Fig. 3 illustrates the architecture of D2CL, which is, from a global view, an encoder–decoder model. D2CL receives the historical SST records as input and predicts the future SST records. D2CL consists of dilated ConvLSTM layers, each of which contains stacked dilated convolutional blocks (DC blocks) of different dilation rates. In addition, the layers of D2CL are densely connected to maximize information transfer between layers.
Fig. 3. Architecture of the D2CL model, where the color indicates the difference in dilation rate.
B. Encoder and Decoder
In this work, we developed an encoder–decoder model to compress the historical SST sequence
As illustrated in Fig. 3, the encoder consists of
C. Dilated ConvLSTM Layer
Existing SST prediction models usually use single-scale feature extraction and cannot learn the hidden features well, which leads to low prediction accuracy. To address this limitation, each dilated ConvLSTM layer in D2CL uses three feature extractors, i.e., dilated ConvLSTM (DC) blocks, of different dilation rates to extract multiscale hidden features.
1) Dilated ConvLSTM Block
Each DC block consists of batch normalization (BN), dilated ConvLSTM, and rectified linear unit (ReLU). The BN operation normalizes the data to reduce the occurrence of overfitting. The dilated ConvLSTM operation learns features from the data with a large receptive field. The ReLU operation filters out the negative values of the data to improve model efficiency.
Fig. 4 illustrates a 2-D dilated convolutional operation constructed by inserting “holes” (zeros) between pixels, corresponding to the grid regions, in the convolutional kernel. Fig. 4(a), (b), and (c) show the normal convolutional kernel, the convolutional kernel with one hole, and the convolutional kernel with two holes, respectively. Generally, for a convolutional kernel of size
\begin{equation*}
k_d = k + (k-1)(r-1) \tag{2}
\end{equation*}
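As a quick numerical check of (2), the sketch below evaluates the effective kernel size for a 3 × 3 kernel. It assumes that k denotes the original kernel size and r the dilation rate, so that r − 1 holes are inserted between adjacent kernel elements; the function name is illustrative.

```python
def effective_kernel_size(k: int, r: int) -> int:
    """Effective size k_d of a size-k kernel with dilation rate r, following Eq. (2)."""
    if k < 1 or r < 1:
        raise ValueError("k and r must be positive integers")
    return k + (k - 1) * (r - 1)


# No hole, one hole, and two holes between adjacent kernel elements.
sizes = [effective_kernel_size(3, r) for r in (1, 2, 3)]
print(sizes)  # [3, 5, 7]
```

Under this reading, a 3 × 3 kernel with one hole spans a 5 × 5 window and with two holes a 7 × 7 window, while the number of weights stays at nine.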
The dilated convolutional operation is denoted as
\begin{align*}
Y= X \ast _l W_{k_d} \tag{3}
\end{align*}
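The operation in (3) can be sketched in plain Python for a single-channel grid. This is only an illustration under several assumptions: the subscript on the operator is taken to be the dilation rate, no padding is applied, the stride is one, and, as is common in deep learning libraries, the kernel is not flipped (cross-correlation). The function name is hypothetical and is not the authors' implementation.

```python
from typing import List

Grid = List[List[float]]


def dilated_conv2d(x: Grid, w: Grid, rate: int) -> Grid:
    """Valid (unpadded) 2-D dilated cross-correlation of grid x with a square kernel w."""
    k = len(w)
    k_d = k + (k - 1) * (rate - 1)  # effective kernel size, Eq. (2)
    rows, cols = len(x), len(x[0])
    out: Grid = []
    for i in range(rows - k_d + 1):
        out_row = []
        for j in range(cols - k_d + 1):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `rate` grid cells apart.
                    acc += w[a][b] * x[i + a * rate][j + b * rate]
            out_row.append(acc)
        out.append(out_row)
    return out


# A 5 x 5 grid with values 0..24 and an all-ones 3 x 3 kernel.
x = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
w = [[1.0] * 3 for _ in range(3)]
y_dense = dilated_conv2d(x, w, rate=1)    # 3 x 3 output
y_dilated = dilated_conv2d(x, w, rate=2)  # 1 x 1 output
```

With rate 2, the nine kernel weights cover the whole 5 × 5 grid, which is the enlarged receptive field that the text attributes to dilation.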
The structure of the dilated ConvLSTM operation is illustrated in the DC block of Fig. 3. It consists of multiple sequential dilated ConvLSTM cells. The output of the encoder is the output
Example 2:
We use the historical SST data of 30 days to predict the SST records within next seven days, i.e.,
The structure of the dilated ConvLSTM cell is illustrated in Fig. 5, where forget gate
The gates, i.e., input gate, forget gate, and output gate, and internal state in the dilated ConvLSTM are formulated by the following equations:
\begin{eqnarray*}
i_t &=&\sigma \left(W_{xi}\ast _l X_t+W_{hi}\ast _l h_{t-1}+W_{ci}\circ C_{t-1}+b_i\right) \\
f_t &=&\sigma \left(W_{xf}\ast _l X_t+W_{hf}\ast _l h_{t-1}+W_{cf}\circ C_{t-1}+b_f\right) \\
\tilde{C}_t &=& \text{{tanh}}{\left(W_{xc}\ast _l X_t+W_{hc}\ast _l h_{t-1}+b_c\right)} \\
C_t &=&f_t\circ C_{t-1}+i_t\circ \tilde{C}_t \\
o_t &=&\sigma \left(W_{xo}\ast _l X_t + W_{ho} \ast _l h_{t-1} + W_{co} \circ C_t + b_o \right) \\
h_t &=&o_t\circ \text{{tanh}}\left(C_t\right)
\end{eqnarray*}
2) Multiscale Feature Extraction
Fig. 6(a) presents the feature extraction ranges of three DC blocks
Feature extraction ranges of
To solve this issue, DC blocks should have different dilation rates to make the receptive fields of a series of dilated convolutions fully cover all the grids in the last block
D. Dense Connection
A drawback of the existing deep learning models for SST prediction is that the gradient cannot be transmitted directly from the later layers to the earlier layers. This may hinder the transmission of information in the model and further affect the prediction performance.
To address this issue, we build direct connections from each layer to all the subsequent layers. By doing this, the dilated ConvLSTM layer
\begin{equation*}
I_{L_m} = O_{L_0} \oplus O_{L_1} \oplus \ldots \oplus O_{L_{m-1}} \tag{4}
\end{equation*}
According to Fig. 3, each dilated ConvLSTM layer contains three DC blocks. Therefore, the output of layer
\begin{equation*}
O_{L_m} = O_{B_{m,1}} \oplus O_{B_{m,2}} \oplus O_{B_{m,3}} \tag{5}
\end{equation*}
Example 3:
Assume that we use the historical SST data of 30 days to predict the SST records in next seven days, and the raw historical SST data are
With the dense connection mechanism in D2CL, we can maximize the information transfer between the model layers and further improve the prediction performance.
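The bookkeeping implied by (4) and (5) can be sketched as follows. The sketch assumes that ⊕ denotes concatenation along the channel axis, that the output of layer 0 is the raw input, and that every DC block emits the same number of channels; the function name and the channel counts are hypothetical and are not taken from the article.

```python
from typing import Dict, List


def dense_channel_plan(
    input_channels: int,
    block_channels: int,
    num_layers: int,
    blocks_per_layer: int = 3,
) -> List[Dict[str, int]]:
    """Input and output channel counts of each densely connected layer."""
    outputs = [input_channels]  # O_{L_0}: assumed to be the raw input
    plan = []
    for m in range(1, num_layers + 1):
        in_channels = sum(outputs)                       # Eq. (4): concatenate all earlier outputs
        out_channels = blocks_per_layer * block_channels  # Eq. (5): concatenate the DC blocks
        plan.append({"layer": m, "in": in_channels, "out": out_channels})
        outputs.append(out_channels)
    return plan


# Hypothetical sizes: 1 input channel, 8 channels per DC block, 3 layers.
plan = dense_channel_plan(input_channels=1, block_channels=8, num_layers=3)
for entry in plan:
    print(entry)
```

Under these assumptions, the input width grows by one layer output at each step (1, 25, and 49 channels here), which illustrates why dense connections increase the computation cost as more layers are stacked.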
Experiments
Experiments over two real datasets have been conducted to evaluate the effectiveness of the proposed model. We first analyze the impact of major parameters on the performance of D2CL. Then, we compare D2CL with three baseline approaches, i.e., LSTM, CNN, and ConvLSTM, and one physical approach, i.e., MyOcean, to demonstrate its superiority.
A. Datasets
In our experiments, we use the Optimum Interpolation SST (OISST) dataset from the National Oceanic and Atmospheric Administration (NOAA). The OISST data contain the daily SST records from 1981 to 2015. Specifically, we selected the SST records in the ECS and the South China Sea (SCS) to evaluate the D2CL model.
ECS data: The ECS covers the area of $[23^\circ 00^{\prime } N\text{--}33^\circ 10^{\prime } N, 122^\circ 50^{\prime } E\text{--}129^\circ 50^{\prime } E]$, and is divided into $40 \times 30$ grids of size $0.25^\circ \times 0.25^\circ$.
SCS data: The SCS covers the area of $[11^\circ 55^{\prime } N\text{--}21^\circ 50^{\prime } N, 111^\circ 00^{\prime } E\text{--}117^\circ 50^{\prime } E]$, and is divided into $40 \times 26$ grids of size $0.25^\circ \times 0.25^\circ$.
B. Training and Experimental Settings
According to the settings in the existing approaches [12], we also set
The D2CL model is trained to predict
\begin{equation*}
\text{Loss} =||X_v - D(E(X_u))||^2 \tag{6}
\end{equation*}
The SST records in both the ECS and SCS data are split into three subsets: 60% for training, 30% for validation, and 10% for testing. We use the mean squared error (MSE) function as the loss function and set the number of training iterations to epoch
All the experiments are conducted on a 64-core Intel Xeon processor with 512 GB RAM and two NVIDIA RTX 2080 Ti GPUs. The D2CL model is implemented based on TensorFlow 1.13.0. Although it takes hours to train the model, D2CL can run in real time to produce the predicted results once trained.
C. Evaluation Metrics
We use four evaluation metrics, i.e., MSE, root-mean-squared error (RMSE), MAE, and mean-absolute-percentage error (MAPE), to measure the performance of the SST prediction models. Let
\begin{align*}
\text{MSE} =& \frac{1}{n} \sum _{i}(x_i - \hat{x}_i)^2 \tag{7}
\\
\text{RMSE} =& \sqrt{\frac{1}{n} \sum _{i}(x_i - \hat{x}_i)^2} \tag{8}
\\
\text{MAE} =& \frac{1}{n} \sum _i |x_i - \hat{x}_i| \tag{9}
\\
\text{MAPE} =& \frac{100\%}{n}\sum _{i}\left|\frac{x_i-\hat{x}_i}{x_i}\right| \tag{10}
\end{align*}
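The four metrics in (7)–(10) can be computed as in the sketch below. It assumes that x_i is the observed SST, x̂_i is the predicted SST, and n is the number of compared values; the function name and the sample values are illustrative only.

```python
import math
from typing import Dict, Sequence


def sst_metrics(truth: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """MSE, RMSE, MAE, and MAPE (in percent) following Eqs. (7)-(10)."""
    if len(truth) != len(pred) or len(truth) == 0:
        raise ValueError("truth and pred must be non-empty and of equal length")
    n = len(truth)
    errors = [x - x_hat for x, x_hat in zip(truth, pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed value, so observations must be nonzero.
    mape = 100.0 / n * sum(abs(e / x) for e, x in zip(errors, truth))
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Made-up values, not results from the article.
metrics = sst_metrics([20.0, 25.0, 16.0], [21.0, 24.0, 18.0])
```

Note that MAPE depends on the magnitude of the observed values, so the same absolute error gives a smaller MAPE in warmer waters.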
D. Varying the Kernel Size
The kernel size has a great influence on the receptive field. The larger the receptive field is, the more features can be extracted. However, to capture finer features, we need to narrow the receptive field, i.e., reduce the kernel size. Table II gives the results of D2CL for different kernel sizes, where the best results are highlighted in boldface. The D2CL model with kernel size
E. Varying the Number of Layers in Encoder
Table III presents the results of D2CL while varying the number of layers in the encoder, where the best results are highlighted in bold. Compared with the D2CL of one layer, the D2CL of two layers reduces MSE, RMSE, MAE, and MAPE by 0.02, 0.01, 0.01, and 0.11, respectively. However, the D2CL of three layers achieves worse performance than the D2CL of two layers.
In general, increasing the depth of the model helps to improve the prediction accuracy. However, as the depth increases, the vanishing gradient problem emerges and affects the updates of model parameters. In addition, when more layers are stacked, D2CL costs more computation resources. Therefore, in the following experiments, we choose the D2CL model with two layers by default.
F. Effectiveness of Enhancing Techniques
To validate the effectiveness of the proposed enhancing techniques, i.e., dense connection (D), triple extractors (T), and dilated convolution (C), we evaluate the performance of D2CL without these techniques. According to Table IV, after removing
G. Model Comparison
Table V tabulates the results of LSTM [12], CNN [13], ConvLSTM [19], and D2CL on the ECS data. D2CL outperforms the other three models in terms of all four evaluation metrics. Specifically, the MSE of D2CL is only 0.39, while those of LSTM, CNN, and ConvLSTM are 0.48, 0.62, and 0.63, respectively.
Similarly, Table VI tabulates the results of LSTM, CNN, ConvLSTM, and D2CL on the SCS data, among which D2CL achieves the best performance.
Furthermore, we compare the performance of CNN, ConvLSTM, and D2CL while varying the number of days to be predicted. Figs. 7 and 8 present the results of the three models on the ECS data and SCS data, respectively. According to both figures, the performance of all three models degrades as the number of predicted days increases. D2CL achieves the best performance when predicting SST for the next two or more days. For one-day prediction, however, ConvLSTM is better than D2CL. This is because D2CL exploits the spatio-temporal relationships among the days to be predicted to achieve better prediction; for one-day prediction, no such information is available, which makes D2CL perform worse than ConvLSTM.
H. Comparison With the Physical Model
We conducted more experiments to compare the performance of our model with the physical approach MyOcean using OISST as the ground truth. The results of MyOcean are downloaded directly from the Copernicus Marine Service and cover the period from January 2015 to December 2019. We also consider ECS and use the spatial resolution of
According to Table VII, MyOcean achieves quite good performance in terms of MAE and works better than CNN and ConvLSTM. Our model D2CL, however, reaches a higher prediction accuracy than the other three methods in terms of all four evaluation metrics.
I. Correlation Analysis
Here, we apply the historical SST data of 30 days
\begin{align*}
\text{pccs}_{i,j} = \frac{\text{cov}(X_i,Y_j)}{\rho (X_i)\rho (Y_j)} \tag{11}
\end{align*}
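Equation (11) is the Pearson correlation coefficient, i.e., the covariance of two variables divided by the product of their standard deviations. The sketch below computes it for two equally long sequences, assuming that each day's SST map has been flattened into a list of grid values; this flattening and the function name are assumptions, since the source text does not spell out how the maps are compared.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and standard deviations cancel out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0.0 or dev_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Made-up flattened maps: a historical day and a future day.
historical_day = [20.1, 21.4, 22.0, 23.3]
future_day = [20.5, 21.9, 22.2, 23.9]
r = pearson(historical_day, future_day)
```

The coefficient lies in [−1, 1]; values close to 1 indicate that the spatial pattern of the future day closely follows that of the historical day.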
Figs. 9 and 10 show the heatmaps of the average
Fig. 9. Correlation of the historical SST records with the predicted future SST records on the ECS data.
Fig. 10. Correlation of the historical SST records with the predicted future SST records on the SCS data.
From Fig. 9, we can see that the odd future days, i.e., the 1st, 3rd, 5th, and 7th future days, share a similar pattern of correlation with the historical days, and the even future days, i.e., the 2nd, 4th, and 6th future days, also share a similar pattern of correlation with the historical days. Specifically, the odd future days have strong correlation with the 2nd to 5th, 11th to 13th, 20th to 23rd, and 28th to 30th historical days. The even future days have strong correlation with the 5th to 9th, 14th to 19th, and 23rd to 27th historical days. The reason why the SST changes are periodic on the ECS data is not clear, because the dynamics of SST are very complicated. To our knowledge, such periodicity may be due to the regular sea surface wind or the ocean current.
However, the periodic correlation patterns on the SCS data, as shown in Fig. 10, are not as obvious as those on the ECS data. That is, by comparing Figs. 9 and 10, it is clear that the correlation between the future days and historical days on the ECS data is stronger than that on the SCS data. On the one hand, we speculate that the SST in SCS is influenced by both the Indian Ocean Warm Pool and the Pacific Ocean, which leads to weak periodic patterns. On the other hand, the SST in SCS is less dynamic than that in ECS, and the predictions on the SCS data are thus more accurate than those on the ECS data.
J. Visualization Analysis
To provide a clear view of the advantage of D2CL, we visualize the absolute errors, i.e., MAE, of CNN, ConvLSTM, and D2CL over the seven predicted days on the ECS data [cf. Fig. 11(a)]. For each subplot, the x-axis and y-axis correspond to the longitude and latitude, respectively. Each subplot has
Fig. 11. Absolute errors of CNN, ConvLSTM, and D2CL in seven predicted days. (a) ECS. (b) SCS.
We also visualize the prediction errors of CNN, ConvLSTM, and D2CL when predicting SST for the next seven days on the SCS data [cf. Fig. 11(b)]. Both the ConvLSTM model and the D2CL model achieve better prediction performance than CNN.
Conclusion
In this work, we proposed a new SST prediction model, D2CL, which can learn spatial and temporal features simultaneously via the dilated ConvLSTM operation. The D2CL model uses multiple feature extractors of different dilation rates to learn features of multiple scales and introduces dense connection to maximize the information transfer between model layers. According to the experiments on real datasets, our model outperforms the existing methods. In the future, we will try to exploit more external features, e.g., the wind speed and the shortwave radiation, to further improve the prediction accuracy of the model.