Accurate Prediction Scheme of Water Quality in Smart Mariculture With Deep Bi-S-SRU Learning Network


Abstract:

In smart mariculture, timely and accurate predictions of water quality can help farmers take countermeasures before the ecological environment deteriorates seriously. However, the openness of the mariculture environment makes the variation of water quality nonlinear, dynamic and complex, and traditional methods face challenges in prediction accuracy and generalization performance. To address these problems, an accurate water quality prediction scheme is proposed for pH, water temperature and dissolved oxygen. First, we construct a new large raw data set collected in time series, consisting of 23,204 groups of data. Then, the water quality parameters are cleaned successively through threshold processing, the mean proximity method, a wavelet filter, and an improved smoothing method. Next, the correlation between the water quality parameter to be predicted and the other dynamic parameters is revealed by the Pearson correlation coefficient method, and the training data are weighted by the discovered correlation coefficients. Finally, the deep Bi-S-SRU (Bi-directional Stacked Simple Recurrent Unit) learning network is proposed, which adds a backward SRU node to the training sequence so that future context information can be integrated. After training, the prediction model is obtained. The experimental results demonstrate that the proposed prediction method achieves higher prediction accuracy than methods based on RNN (Recurrent Neural Network) or LSTM (Long Short-Term Memory) with similar or lower computational time complexity. In our experiments, the proposed method takes 12.5 ms on average to predict data, and the prediction accuracy can reach 94.42% for the next 3~8 days.
Published in: IEEE Access ( Volume: 8)
Page(s): 24784 - 24798
Date of Publication: 03 February 2020
Electronic ISSN: 2169-3536

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

In mariculture, water quality is one of the important factors that affect fish production. However, water quality is subject to change because it is affected by many factors, such as fish density, feed, and climate. A drastic change of water quality can disrupt the balance of the algae and bacteria phases. This imbalance of the ecological environment can lead to serious consequences, such as physiological stress, disease, and even the massive death of fish. An accurate and real-time prediction of water quality parameters can help farmers adjust water quality in advance when necessary to ensure a suitable breeding environment. These measures can improve the efficiency of fish production. Furthermore, through accurate prediction and timely adjustment of water quality, the use of drugs can be reduced, which is of great significance for green and precision agriculture.

A. Related Work and Motivation

The collected water quality data usually needs to be preprocessed for data cleaning. Gao et al. proposed to repair the data by using linear interpolation and mean value smoothing [1]. The system clustering method and principal component analysis method have been used for feature selection to achieve dimensionality reduction of the input data in the prediction model. Finally, they employed wavelet denoising technology to deal with the key influencing factors. Zhang et al. proposed a missing data filling method based on convolutional neural network, which has been used to fill data with temporal correlation between time-series and spatiotemporal correlation between sensor nodes [2]. Yang et al. proposed a data preprocessing method based on feature extraction and clustering. The Lasso algorithm and K-Means algorithm were used to extract and cluster the temperature data respectively, which have greatly improved the prediction accuracy of temperature [3]. Xia et al. proposed the optimal mixed imputation (OMI) algorithm for missing data filling [4]. Maria et al. proposed a preprocessing method for decomposing meteorological data using wavelet decomposition and principal component analysis [5]. Though the aforementioned methods can improve the precision of data preprocessing, their structures are complex and difficult to be implemented. Meanwhile, traditional linear interpolation methods have breakpoint phenomena in actual interpolation, while the mean smoothing method can only be used for the dataset with less deviation.

Next, the Pearson correlation coefficient has been applied to analyze the correlation between the predicted water quality parameters and other water quality parameters. Advanced integration methods such as spatial cross-correlations [6] can remedy some shortcomings of the Pearson correlation coefficient method, such as inaccurate and fluctuating correlation results when the objects to be processed are insufficient. Considering the abundant experimental objects in our paper, the relatively simple Pearson correlation coefficient method is adopted in our experiments.

For water quality prediction, the major approaches include the time series method [7]–[9], the Markov method [10], the grey system theory method [11] and the support vector regression method [12], [13]. However, these methods have some drawbacks, such as weak generalization ability, low computational efficiency and unstable prediction accuracy. Hence, they cannot meet the ever-increasing requirements of precision agriculture. In recent years, prediction methods based on ANN (Artificial Neural Network) and deep learning have been proposed [14], [15]. They have the advantages of good robustness, high fault tolerance and sufficient fitting of complex nonlinear relations. Liu et al. used a BP neural network to predict multi-scale water temperature based on empirical mode [16]. Han et al. established a water quality prediction model for wastewater treatment based on an improved radial basis function neural network with flexible structure [17]. Miao et al. used a Levenberg-Marquardt (LM) neural network and a genetic algorithm to build a dissolved oxygen prediction model [18]. Moreover, prominent water quality prediction models based on LSTM have also been constructed [11], [19], [20].

B. Main Contributions of the Paper

In this paper, we design a procedure to fulfill the prediction of the key water quality parameters. To improve the data cleaning in the preprocessing stage, the fixed threshold method is used to discard abnormal individual data, and the mean proximity method is used to complete the collected data. Then, wavelet analysis and the improved smoothing method are used for noise reduction and error correction, respectively. Next, the Pearson correlation coefficient method is employed to discover the correlation between the key water quality parameters. In the prediction phase, the prediction model based on our proposed Bi-S-SRU deep learning network combines the preprocessed results with the obtained correlation prior to predict the key water quality parameters.

The Bi-S-SRU model is proposed to improve the RNN [21], LSTM [22] and SRU [23], [24] network structures. It has the advantages of simple structure, fast convergence, and good stability. Our proposed Bi-S-SRU model is mainly composed of two stages. The first stage is the preprocessing of the collected water quality data. The second stage is the construction of the Bi-S-SRU-based water quality prediction model. In addition, we discuss the prediction results of different water quality parameters in the same environment setting, and compare the Bi-S-SRU-based method with three other aforementioned methods.

Our main contributions can be summarized as follows:

  • In the data preprocessing, the proposed mean proximity method and improved smoothing method can accurately complete and correct the water quality data to be repaired, which solves the breakpoint phenomenon and increases the accuracy of data cleaning.

  • The Bi-S-SRU deep learning network is proposed, which can integrate the future context information into the prediction of the current time point data. Meanwhile, according to the existing dynamic model, the degree of correlation between important parameters is analyzed. According to the results of correlation analysis, the training data of the learning model are multiplied by the corresponding weight coefficient.

  • An overall scheme for accurately predicting water quality parameters is proposed. This scheme uses the pre-processed data and correlation priors to train a Bi-S-SRU model to obtain a prediction model. The prediction model is then used to predict key water quality parameters in aquaculture.

  • We build and release a large raw data set collected in time series, which contains water quality and climate environment data at 23,204 time nodes.

C. Paper Organization

The rest of this paper is arranged as follows. Section II gives the acquisition method of data and the outline of the proposed scheme. Section III introduces the preprocessing process of water quality. Section IV presents the network structure of Bi-S-SRU and the construction of prediction model. In Section V, we analyze and discuss the experimental results. Section VI summarizes our work and illustrates future works.

SECTION II.

Materials and Overview of Methodology

A. Acquisition of Data

In our investigation, we conduct our study by using real data collected in the marine aquaculture base in Xincun Town, LingShui County, Hainan Province, China. Fig. 1 illustrates the water quality data acquisition module, transmission module, cloud server module, and terminal display module in the IoT system. The IoT hardware system mainly includes one multi-sensor node, one wind power generation device, one set of solar power panels, one 4G industrial routing module, one wind-solar complementary controller, one local storage module and one wireless transmission module. From Fig. 1, the IoT system realizes the data acquisition, transmission, cloud storage of the data, business logic development, intelligent prediction analysis and calculation, and related application services.

FIGURE 1. The topology structure diagram of the smart mariculture IoT system.

Due to possible power failures, sensor device aging, artificial oxygenation, bait feeding, fish net switching and voltage instability, the received data may be lost, defective and erroneous. Meanwhile, the water quality data monitored by the sensors needs to be transmitted over a long distance via a 4G wireless module. During this process, the transmitted data is susceptible to interference from near-ground noise, transceiver noise, etc. Therefore, before the construction of deep learning network, the collected water quality parameters have to be filled, recovered and filtered [25].

B. Overview of Scheme

The overall prediction scheme is shown in Fig. 2, where our main innovations are marked with shaded colors. The specific steps of the proposed scheme are as follows: (1) After receiving water quality data from the wireless transmission network, a series of improved interpolation, smoothing and wavelet transform filtering techniques are used to repair, correct and denoise the water quality data respectively; (2) Pearson’s correlation coefficient is used to obtain correlation priors of water quality parameters and climate parameters; (3) The water quality prediction model based on Bi-S-SRU is constructed using the preprocessed data and its correlation information. When the prediction accuracy of the model reaches the expected requirements, the overall prediction model is considered to be established successfully. Otherwise, it will be retrained to obtain better results.

FIGURE 2. The prediction scheme of the water quality parameters.

SECTION III.

Preprocessing of Water Quality Data

In this section, we take the preprocessing of water temperature data as an example.

A. Threshold Processing and Data Completion Method Selection

The data stored at the collection site and the data obtained after transmission are termed backup data and received data, respectively. First, abnormal individual data are removed by setting upper and lower thresholds. This preliminarily improves data convergence and also enhances the completion accuracy of the mean proximity method. Setting a threshold can recover data to some extent, but it may lead to over-aggregation of the data. In practice, the threshold range should therefore be enlarged appropriately according to the actual situation. The increased deviation produced by this enlargement can be compensated by the subsequent improved smoothing method to achieve higher recovery accuracy.

Next, we examined the performance of several existing data interpolation methods to obtain better data completion accuracy. Fig. 3 shows the data restoration effects of Linear, Spline, and Cubic interpolation for the received data. In our experiment, the missing data is supplemented with the value 0 to avoid breakpoints.
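The threshold step can be illustrated with a minimal sketch. The function name `apply_threshold`, the example bounds, and the choice to mark rejected readings as `None` (so that a later completion step can fill them) are assumptions made for illustration; the paper does not state the actual threshold values.

```python
from typing import List, Optional


def apply_threshold(values: List[Optional[float]],
                    lower: float,
                    upper: float) -> List[Optional[float]]:
    """Mark readings outside [lower, upper] as missing (None)."""
    return [v if v is not None and lower <= v <= upper else None
            for v in values]


# Hypothetical water temperature readings (degrees Celsius);
# 85.0 and -3.0 stand in for abnormal individual data.
readings = [27.1, 27.3, 85.0, 27.2, None, -3.0, 27.4]
cleaned = apply_threshold(readings, lower=10.0, upper=40.0)
print(cleaned)  # [27.1, 27.3, None, 27.2, None, None, 27.4]
```

Widening `lower` and `upper` keeps more borderline readings, which matches the recommendation to enlarge the range and leave residual deviations to the later smoothing step.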

FIGURE 3. Experimental effects of Linear, Spline and Cubic interpolation.

It can be seen from Fig. 3 that all three interpolation methods have acceptable completion effects, although the data processed using linear interpolation is slightly closer to the backup data. We accumulate the absolute values of the differences between the recovered data and the backup data, and use the accumulated result as an evaluation index. As shown in Table 1, Linear interpolation performs better than Spline interpolation and Cubic interpolation. The main reason is that the time interval of data acquisition is short, and the water quality parameters change slowly and stably in time series.

TABLE 1 Performance Comparison Between Three Interpolation Methods

However, the traditional linear interpolation methods still have the phenomenon of data discontinuity. Fig. 4 shows a detail view of some of the data breakpoints. When training the prediction model, the data set with breakpoints in the time series may engender inaccurate input training samples.

FIGURE 4. Filling effect of data with linear interpolation.

To avoid breakpoints, we propose a mean proximity method. To maintain stable and continuous variation of water quality parameters in time series, each missing datum is replaced with the average of the nearest effective data, as given in (1).
\begin{equation*} a_{i}=\frac {a_{i-n}+a_{i+m}}{2}\tag{1}\end{equation*}

where a_{i}, a_{i-n}, and a_{i+m} denote the i\mathrm{th} data in the data set, the n\mathrm{th} non-empty data before a_{i}, and the m\mathrm{th} non-empty data after a_{i}, respectively.
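A minimal sketch of (1) follows. It assumes that missing readings are represented as `None`, that a_{i-n} and a_{i+m} are the nearest non-empty original readings on each side, and that a gap at either end of the series falls back to the single available neighbor; the edge handling is not specified in the text and is an assumption here.

```python
from typing import List, Optional


def mean_proximity_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing entry with the mean of its nearest valid neighbors."""
    filled = list(values)
    n = len(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest non-empty original reading before and after position i.
        before = next((values[j] for j in range(i - 1, -1, -1)
                       if values[j] is not None), None)
        after = next((values[j] for j in range(i + 1, n)
                      if values[j] is not None), None)
        if before is not None and after is not None:
            filled[i] = (before + after) / 2  # equation (1)
        elif before is not None:
            filled[i] = before                # assumed edge fallback
        elif after is not None:
            filled[i] = after                 # assumed edge fallback
    return filled


series = [27.0, None, None, 28.0, None, 29.0]
print(mean_proximity_fill(series))  # [27.0, 27.5, 27.5, 28.0, 28.5, 29.0]
```

Because every gap is filled from original readings only, consecutive missing points receive the same value, which keeps the series continuous without introducing a slope that the data do not support.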

From Fig. 5, it is obvious that all breakpoint locations are successfully filled with meaningful values. Fig. 6 shows that the number of error points where the red point deviates from the backup data is approximately equal to or even slightly less than the number of points of cyan. For error points where the two do not coincide, the number of cyan error points is slightly more than that of red. Therefore, the mean proximity method performs better than Linear interpolation in data completion.

FIGURE 5. The completion effect using the improved method.

FIGURE 6. Comparison of the completion effect represented by scatter plot.

B. Data Filtering Method Selection

Noise interferes with the collected dataset during long-distance wireless transmission. Filtering removes specific interference frequency bands from the signal. Three filtering methods, namely the moving average filter, the median filter and the wavelet transform, were tested and compared in our study. The wavelet transform method is adopted after our investigation because it has the characteristics of low entropy, multi-resolution and de-correlation.

1) Moving Average Filter Method

Based on statistical laws, the moving average filter treats the continuously sampled data as a queue with a fixed length N. Whenever a new datum enters the queue, the oldest datum exits. The queue is then arithmetically averaged to yield the result of this measurement.
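The queue described above maps directly onto a bounded double-ended queue. The sketch below is illustrative; averaging over the partially filled queue for the first N-1 samples is an assumption, since the start-up behavior is not described in the text.

```python
from collections import deque
from typing import List


def moving_average(samples: List[float], window: int) -> List[float]:
    """Fixed-length queue filter: append the new sample, drop the oldest, average."""
    queue = deque(maxlen=window)  # the oldest item is evicted automatically
    out = []
    for x in samples:
        queue.append(x)
        out.append(sum(queue) / len(queue))
    return out


print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], window=4))
# [1.0, 1.5, 2.0, 2.5, 3.5, 4.5]
```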

2) Median Filter Method

Median filtering is a nonlinear signal processing technique, based on order statistics theory, that can effectively suppress noise. Its basic principle is to replace the value of a point in a digital image or digital sequence with the median of the values in the neighborhood of that point, thereby eliminating isolated noise points.
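For a one-dimensional sequence, the principle can be sketched as follows. The symmetric neighborhood of `half_width` points on each side and the truncation of the window at the sequence boundaries are assumptions made for this illustration.

```python
from statistics import median
from typing import List


def median_filter(samples: List[float], half_width: int) -> List[float]:
    """Replace each point with the median of its neighborhood."""
    n = len(samples)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)       # window is truncated at the edges
        hi = min(n, i + half_width + 1)
        out.append(median(samples[lo:hi]))
    return out


# The isolated spike 9.0 is removed, while the trend is preserved.
print(median_filter([1.0, 2.0, 9.0, 3.0, 4.0], half_width=1))
# [1.5, 2.0, 3.0, 4.0, 3.5]
```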

3) Wavelet Transform Method

Wavelet transform is a transform analysis method that can localize a signal in both time (space) and frequency [26]. Through dilation and translation operations, the signal (function) is gradually refined at multiple scales to achieve fine time resolution at high frequencies and fine frequency resolution at low frequencies, which automatically adapts to the requirements of time-frequency signal analysis.

Filtering with the wavelet transform method is divided into three steps. In the first step, the noisy signal is wavelet transformed. In the second step, the wavelet coefficients obtained by the transformation are thresholded to remove the noise contained in the signal. In the final step, the inverse wavelet transform is applied to the processed wavelet coefficients to reconstruct the filtered signal.
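The three steps can be sketched with a single-level Haar transform written in plain Python. The choice of the Haar basis, the single decomposition level, soft thresholding, and the `noise_scale` multiplier are all assumptions for illustration; the paper does not state which wavelet basis or thresholding rule was used. The threshold follows the \sqrt{2\log(length(X))} form, with the natural logarithm assumed.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(signal: List[float]) -> Tuple[List[float], List[float]]:
    """Step 1: one-level orthonormal Haar transform (even-length input)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def soft_threshold(coeffs: List[float], thr: float) -> List[float]:
    """Step 2: shrink coefficients toward zero; small ones become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Step 3: inverse transform that rebuilds the signal."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def wavelet_denoise(signal: List[float], noise_scale: float = 1.0) -> List[float]:
    thr = noise_scale * math.sqrt(2.0 * math.log(len(signal)))
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, thr))


noisy = [10.0, 10.2] * 4          # small alternating jitter around 10.1
smooth = wavelet_denoise(noisy, noise_scale=0.1)
```

In this toy input the detail coefficients fall below the threshold and are zeroed, so every output sample is close to 10.1. A production pipeline would more likely use a dedicated wavelet library with several decomposition levels.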

4) Comparison of Data Filtering Methods

The noise reduction effects of the three filtering methods are compared using three metrics, SNR, BIAS and RMS [27]. In our experiment, the fixed window sizes of the moving average method and the median filtering method are set to 4 and 10, respectively. The threshold of the wavelet analysis method is set to \sqrt{2\log\left(length\left(X\right)\right)}, where X denotes the input data set.

As shown in Table 2, the signal-to-noise ratio achieved by the wavelet transform method is 1.5 and 1.2 times that achieved by the moving average method and the median filtering method, respectively. In addition, the RMS of the wavelet transform method is smaller than that of the moving average method and the median filtering method. Although the wavelet transform method is slightly worse than the median filter method in terms of BIAS, the idea of the median is introduced later in data error correction, which can improve the reliability of the data. Thus, the wavelet transform method is the most suitable of the three methods for filtering noise.

TABLE 2 Performance of Different Norms for Three Filtering Methods

C. Error Correction Method Selection

Since some deviations remain in the repaired data after applying the wavelet transform method, further error correction is needed. Error correction is divided into two stages: error detection and error correction. In error detection, the average relative deviation among the data in the backup data is calculated as the comparison index, which is used to test the received data. When the relative deviation of a datum exceeds m times the corresponding average relative deviation in the backup data, the datum is identified as erroneous and needs to be corrected. The median of the k data before and after the erroneous datum is used as the correction value. In (2), median\left(\left[a_{i-k}:a_{i+k}\right]\right) is the median of the k data before and after a_{i}, \overline{\alpha} is the average of the relative deviation of two adjacent data in the backup data relative to the latter datum, \overline{\beta} is the average of the relative deviation of two adjacent data in the backup data relative to the former datum, and m is used to adjust the degree of error correction. The values of \overline{\alpha}, \overline{\beta}, and m can be preset based on historical data.

Our data are acquired every 5 minutes. In order to better reflect the actual situation with guaranteed accuracy, we set k to 10. For the selection of m, we use two evaluation metrics, BIAS and RMS. The error correction experiments are performed on the data filtered in Section III.B, and the deviations between the backup data and the data before and after error correction are evaluated.

It can be seen from Fig. 7 that the corrected data is much closer to the backup data when evaluated by BIAS and RMS. When m is between 0 and 0.8, the BIAS and RMS are almost unchanged. To avoid processing normal data, m is set to 0.8. With m = 0.8, the BIAS after error correction is 0.0268, compared with 0.0347 without error correction, and the RMS after error correction is 0.038, compared with 0.0545 without error correction. Therefore, the improved smoothing method performs well on the error correction of the filtered data.

FIGURE 7. Comparison of two evaluation standards when m differs between the data before or after error correction and the received data.

To verify the effectiveness of the improved smoothing method, we compared it with the traditional mean smoothing method on the filtered data.

With the mean smoothing method, the BIAS is 0.034 and the RMS is 0.053. With the improved smoothing method, the BIAS is 0.0268 and the RMS is 0.038, which means that the precision of the improved smoothing method is higher than that of the traditional smoothing method. From Fig. 8, it is obvious that the data processed by the improved smoothing method is closer to the backup data.

FIGURE 8. Comparison of data before and after correction.

D. Correlation Analysis

Before establishing the model for key water quality parameters, correlation analysis is necessary for determining the correlation degree among water quality parameters. In this case, Pearson correlation coefficient method [28] is introduced to measure the correlation degree between the two variables. According to Pearson method, the correlation coefficient among these parameters and other water quality parameters can be obtained, as shown in Table 3. The closer the value is to 1 or −1, the higher the degree of correlation is.
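For reference, the Pearson coefficient of two equally long series is the covariance divided by the product of the standard deviations. The sketch below uses hypothetical sample values and is not tied to the values of Table 3.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical readings: a perfectly linear relation gives r = 1 or r = -1.
r_pos = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_neg = pearson([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

The function raises `ZeroDivisionError` when either series is constant, because the coefficient is undefined in that case.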

TABLE 3 Correlation Matrix

In Table 3, Temp, DO, Precipitation, and Air Temp stand for water temperature, dissolved oxygen, precipitation of rainfall, and air temperature, respectively. It can be concluded from Table 3 that the water temperature has a moderate negative correlation with the salinity and the dissolved oxygen, a moderate positive correlation with the air temperature, and a moderate negative correlation with the pH and precipitation. On the other hand, pH has a strong positive correlation with dissolved oxygen, a moderate positive correlation with water temperature and air temperature, a moderate negative correlation with salinity, and almost no correlation with precipitation. In addition, dissolved oxygen has a strong positive correlation with pH, a moderate negative correlation with water temperature, and almost no correlation with salinity and air temperature.

When a parameter is being predicted, feeding all other relevant parameters directly into the model is likely to distort the overall balance of the inputs. Therefore, when the relevant parameters are input, the model weights them with reference to their correlation coefficients.
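One simple reading of this weighting is to scale each auxiliary input series by its correlation coefficient with the target parameter before training. The sketch below follows that reading; the function name, the dictionary layout, the coefficient values, and the use of the signed coefficient (rather than its absolute value) are all assumptions, since the exact weighting rule is not spelled out in the text.

```python
from typing import Dict, List


def weight_inputs(features: Dict[str, List[float]],
                  correlations: Dict[str, float]) -> Dict[str, List[float]]:
    """Scale every auxiliary series by its correlation with the target."""
    return {name: [correlations[name] * v for v in series]
            for name, series in features.items()}


# Hypothetical auxiliary inputs and coefficients (not taken from Table 3).
features = {"air_temp": [30.0, 31.0], "salinity": [33.0, 34.0]}
correlations = {"air_temp": 0.5, "salinity": -0.5}
weighted = weight_inputs(features, correlations)
print(weighted)
# {'air_temp': [15.0, 15.5], 'salinity': [-16.5, -17.0]}
```

Weakly correlated inputs are shrunk toward zero, so they contribute less to the hidden state than strongly correlated ones.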

SECTION IV.

Proposed Bi-S-SRU Based Prediction Model

A. Principle of SRU Deep Learning Model

In recent years, deep learning methods [29]–[31] derived from neural networks have been applied in many fields. The RNN is a kind of ANN that evolved from the Hopfield network [21] for modeling sequential data.

In the forward propagation of an RNN, the data is transmitted along the time series from the input layer to the hidden layer and then to the output layer. In Fig. 9, t, x, s, and y denote the time step, the input set, the hidden unit set, and the output set of the RNN, respectively. U is the weight matrix of the input set, W is the weight matrix applied to the hidden layer state from the previous moment, and V is the weight matrix between the hidden layer output and the output layer.
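The roles of U, W and V can be made concrete with a small forward pass in plain Python. The tanh hidden activation, the linear output layer, and the omission of bias terms are assumptions of this sketch; the text does not specify them.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_forward(xs: List[Vector], U: Matrix, W: Matrix, V: Matrix,
                s0: Vector) -> Tuple[List[Vector], Vector]:
    """s_t = tanh(U x_t + W s_{t-1}),  y_t = V s_t."""
    s = s0
    ys = []
    for x in xs:
        pre = [u + w for u, w in zip(matvec(U, x), matvec(W, s))]
        s = [math.tanh(p) for p in pre]   # hidden state carried to the next step
        ys.append(matvec(V, s))
    return ys, s


# One-dimensional toy example with hypothetical weights.
ys, last_state = rnn_forward(xs=[[0.0], [1.0]],
                             U=[[1.0]], W=[[0.5]], V=[[2.0]], s0=[0.0])
```

Each hidden state depends on the previous one through W, which is why the time steps of a plain RNN cannot be computed in parallel.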

FIGURE 9. RNN model.

When the forward propagation completes, the RNN performs back-propagation through time (BPTT), which propagates the deviation between the predicted value and the true value in each round of forward propagation backward to adjust and update the parameters (such as weights and offsets) involved in forward propagation. Neural networks require constant adaptive training to adjust the overall parameters of the network so that the predicted values stay close to the true values. Back propagation is a necessary process for training high-quality neural network models.

However, due to the limitations of the RNN structure and algorithm, its capacity is restricted. In the absence of an effective information screening mechanism, important information from the early stage may be discarded later in training, which creates a bottleneck in the improvement of training accuracy. The SRU is an improved neural network based on the RNN [32]. The main difference between the SRU and the RNN is the “cell state” added in the hidden layer, which judges and filters the effective information in the training process. The structure of the SRU is shown in Fig. 10.

FIGURE 10. The structure of SRU.

Similar to other improved neural networks based on the RNN (such as LSTM and GRU), the SRU determines the throughput of the cell state at different times by adjusting the “gate” structure. The difference is that the SRU avoids making the current time step depend on the output of the previous time step {s_{t - 1}}. Hence, parallel calculation can be performed, which maintains control of the information and improves the overall speed with little loss of precision. In addition, through the design of the gates, the gradient vanishing and gradient explosion problems of the RNN can be greatly alleviated [24], and the deviation between the predicted value and the real value during training is reduced. Fig. 11 shows the gate control structure between the states of the SRU unit. The current input x_{t} enters the forget gate, input gate, and reset gate via three paths.

FIGURE 11. The detail construction between SRU cells.

The tools used to select the training information in the gate control structure are the activation functions sigmoid, with range [0, 1], and tanh, with range [−1, 1]. The regulation is accomplished by element-wise multiplication of the gate output vector with the data vector to be controlled. In the extreme cases, all information is retained when the gate output is 1, and all information is discarded when the gate output is 0.
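The gating can be sketched with a scalar SRU cell. The equations below follow the commonly published SRU formulation, in which the input gate is tied to the forget gate as (1 - f_t); whether the network in this paper uses exactly this variant is an assumption, and all parameter names are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sru_forward(xs: List[float], w: float, w_f: float, b_f: float,
                w_r: float, b_r: float, c0: float = 0.0) -> Tuple[List[float], float]:
    """Scalar SRU: gates depend only on x_t, never on the previous output."""
    c = c0
    hs = []
    for x in xs:
        f = sigmoid(w_f * x + b_f)              # forget gate
        r = sigmoid(w_r * x + b_r)              # reset gate
        c = f * c + (1.0 - f) * (w * x)         # cell state update
        h = r * math.tanh(c) + (1.0 - r) * x    # output with highway connection
        hs.append(h)
    return hs, c


# With zero gate weights and biases both gates equal 0.5.
hs, c = sru_forward([2.0], w=1.0, w_f=0.0, b_f=0.0, w_r=0.0, b_r=0.0)

# A forget gate saturated at 1 retains the old cell state entirely.
_, kept = sru_forward([5.0], w=1.0, w_f=0.0, b_f=50.0, w_r=0.0, b_r=0.0, c0=3.0)
```

Because `f`, `r` and `w * x` use only the current input, they can be computed for all time steps at once; only the cheap element-wise update of `c` remains sequential.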

B. Principle of Bi-S-SRU Deep Learning Model

Typical RNNs and their derivative networks, such as LSTM and SRU, tend to ignore future information when processing sequences in time series. One obvious solution is to add a delay between input and output, which leaves the network some time to add future context information. In other words, the information of M future frames is added to predict the output. In theory, M can be very large to capture all available future information, but in practice the prediction results become worse if M is too large. This is because the network concentrates on memorizing a large amount of input information, which reduces its ability to combine predictive knowledge from different input vectors. Therefore, the size of M needs to be adjusted manually. To address the above problems, we propose an improved model called Bi-S-SRU (Bi-directional stacked simple recurrent unit).

Other bi-directional stacked neural networks [33], [34] have shown excellent experimental results. The basic idea of Bi-S-SRU is to apply a forward and a backward SRU to each training sequence, with both SRUs connected to the same output layer. This structure provides the output layer with both past and future context information for each point in the input sequence. Fig. 12 shows a Bi-S-SRU network unfolded along time. The six weight sets in the figure represent, respectively, the weights from the input layer to the forward and backward hidden layers (W1, W3), the weights transferred within the hidden layers (W2, W5), and the weights from the forward and backward hidden layers to the output layer (W4, W6). In addition, there is no information flow between the forward and backward hidden layers, which ensures that the unfolded graph is acyclic.
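The bidirectional wiring can be sketched independently of the cell type. In the sketch below, `step_fwd` and `step_bwd` are placeholders for recurrent cells, and a running sum stands in for an SRU so that the result is easy to check by hand; the helper names and the single-layer (non-stacked) setup are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def run_direction(xs: Sequence[float],
                  step: Callable[[float, float], float],
                  state: float) -> List[float]:
    """Apply a recurrent step along xs and keep the state at every time point."""
    outs = []
    for x in xs:
        state = step(x, state)
        outs.append(state)
    return outs


def bidirectional(xs: Sequence[float],
                  step_fwd: Callable[[float, float], float],
                  step_bwd: Callable[[float, float], float],
                  init_fwd: float, init_bwd: float) -> List[Tuple[float, float]]:
    """Pair the past-context state with the future-context state at each point."""
    fwd = run_direction(xs, step_fwd, init_fwd)
    # The backward layer reads the sequence reversed; its outputs are
    # flipped back so that index t lines up with the forward layer.
    bwd = run_direction(xs[::-1], step_bwd, init_bwd)[::-1]
    return list(zip(fwd, bwd))


def running_sum(x: float, state: float) -> float:
    return state + x


out = bidirectional([1, 2, 3, 4], running_sum, running_sum, 0, 0)
print(out)  # [(1, 10), (3, 9), (6, 7), (10, 4)]
```

The two directions never exchange state, which mirrors the statement that there is no information flow between the forward and backward hidden layers. An output layer would then combine each pair, and stacking repeats the same pattern on top of these outputs.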

FIGURE 12. The structure of Bi-S-SRU.

The specific algorithm flow of the Bi-S-SRU forward propagation process in both directions is shown in Table 4. The forward process in two directions in the model shares the same batch of data, which is summarized in the output layer after passing through the same network structure in different directions.

TABLE 4 Description of Bi-S-SRU Forward Propagation Process in Both Directions

The calculation process of Bi-S-SRU is divided into two steps: forward pass and backward pass. In the forward pass, the forward calculation process of the hidden layer of Bi-S-SRU is the same as that of one-way SRU, except that the input sequence is in the opposite direction for the two hidden layers. The output layer is not updated until the two hidden layers have processed all the input sequences. The description of the forward pass is shown in Table 5.

TABLE 5 Description of Forward Pass in Bi-S-SRU

In the second step, the backward pass of Bi-S-SRU is similar to the backpropagation process of an RNN. First, several terms are computed and stored in the output layer for each time step; the stored terms are then propagated back to the two hidden layers in opposite directions. The complete backward pass is shown in Table 6.

TABLE 6 Description of Backward Pass in Bi-S-SRU

In the above training model, we introduce three evaluation metrics [35] to evaluate the prediction effect, which are defined as follows:

Definition 1 [MAE (Mean Absolute Error)]:

MAE is the basic evaluation metric; the metrics that follow are generally compared against it to weigh their advantages and disadvantages.
\begin{equation*} MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_{i} - \bar{y}_{i} \right| \tag{3}\end{equation*}


Definition 2 [RMSE (Root Mean Squared Error)]:

RMSE is the square root of the mean squared error, which makes it more sensitive to extreme values. If extreme values occur at some time points during training, the RMSE increases sharply. The change of this metric can therefore be used as a benchmark for testing the robustness of the model.
\begin{equation*} RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \left| y_{i} - \bar{y}_{i} \right| \right)^{2}} \tag{4}\end{equation*}


Definition 3 [MAPE (Mean Absolute Percent Error)]:

MAPE considers not only the deviation between the predicted data and the real data, but also the ratio between the deviation and the real data.
\begin{equation*} MAPE = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| y_{i} - \bar{y}_{i} \right|}{y_{i}} \tag{5}\end{equation*}


In (3), (4) and (5), y_{i} denotes the real value, {\bar y}_{i} denotes the value predicted by the model, i.e., the output value of the deep learning model; N is the number of samples in the data set. The closer the above three evaluation metrics are to 0, the better the prediction and fitting effect of the model will be.
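As a small illustration of (3), (4) and (5), the three metrics can be computed with NumPy as sketched below. The function names are illustrative only, and the sample values are made up for the example; they are not taken from the data set. MAPE is returned as a fraction, as in (5), and assumes that the real values are nonzero.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (3)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (4)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


def mape(y_true, y_pred):
    """Mean absolute percent error as a fraction, Eq. (5); y_true must be nonzero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) / y_true)


# Made-up example values (not measurements from the paper).
real = [8.0, 8.0, 10.0, 10.0]
pred = [8.2, 7.8, 10.5, 9.5]
print(mae(real, pred), rmse(real, pred), mape(real, pred))
```

In this example the two larger errors raise RMSE above MAE, which reflects the greater sensitivity of RMSE to extreme values noted in Definition 2.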

C. Construction of Bi-S-SRU Prediction Model and Metrics Analysis

As described in Sections III.A and III.B, we establish the Bi-S-SRU water quality prediction model with the preprocessed data of pH, water temperature and dissolved oxygen from Section II. The raw data contain water temperature, air temperature, salinity, precipitation, pH, and dissolved oxygen, and the data collection interval is 5 minutes. A total of 20,000 groups are used for model training, and another 3,000 groups are used for comparison of prediction results. The experimental environment is: Intel(R) Core(TM) i7-7800X processor, NVIDIA TITAN RTX GPU, 32 GB RAM, Windows 10 (64-bit) operating system, and Anaconda3 IDE. The neural network models are built with Python 3.6 and the TensorFlow 1.6.0 package.

According to the correlation coefficient analysis in Section II.D, pH has a strong or moderate correlation with dissolved oxygen, air temperature, salinity and water temperature; water temperature has a strong or moderate correlation with salinity, air temperature, pH, precipitation and dissolved oxygen; and dissolved oxygen has a strong or moderate correlation with pH and water temperature. The historical data of these parameters and of the parameter to be predicted are weighted by their correlation coefficients and used as the input of the prediction model for training; the output of the model is the predicted data.
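The correlation-based weighting can be sketched as follows. The text does not spell out how the coefficients are applied, so this sketch assumes the simplest reading: each input column is scaled by the absolute value of its Pearson coefficient with the parameter to be predicted. The function names and the toy data are hypothetical.

```python
import numpy as np


def pearson_with_target(features, target):
    """Pearson r between every feature column and the target series."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.array([np.corrcoef(features[:, j], target)[0, 1]
                     for j in range(features.shape[1])])


def weight_by_correlation(features, target):
    """Scale each column by |r| (assumed weighting scheme, not from the paper)."""
    r = pearson_with_target(features, target)
    return np.asarray(features, dtype=float) * np.abs(r), r


# Toy data: column 0 rises with the target, column 1 falls with it,
# column 2 is uncorrelated with it.
target = np.array([1.0, 2.0, 3.0, 4.0])
features = np.column_stack([
    2.0 * target + 1.0,
    -target,
    np.array([1.0, -1.0, -1.0, 1.0]),
])
weighted, r = weight_by_correlation(features, target)
print(np.round(r, 3))
```

With this scheme a strongly correlated parameter keeps almost its full magnitude, while an uncorrelated one is suppressed. In practice the columns would normally be standardized first so that the weights are comparable across units.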

The 20,000 groups of preprocessed data are imported into the three parameter models, and each model is trained separately with RNN, LSTM, SRU and Bi-S-SRU. For the pH, water temperature and dissolved oxygen models, the input dimensions are set to 5, 6 and 3 respectively, the output dimension is 1 in every case, the learning rate is 0.005, and the time step is 20; all models are trained 1,000 times with cell=5. During each training process, the MAE, RMSE and MAPE between the output values of all output layers and the real values are recorded in Tables 7, 8, and 9.
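To make the role of the time step concrete, the sketch below slices a multivariate series into overlapping windows of 20 consecutive samples, each paired with the next value of the parameter to be predicted. The one-step-ahead target and the convention that the target sits in column 0 are assumptions made for illustration; `make_windows` is a hypothetical helper.

```python
import numpy as np


def make_windows(series, time_step=20):
    """Build (window, next-target) pairs from a series of shape (n, d).

    Assumes column 0 holds the parameter to be predicted.
    """
    series = np.asarray(series, dtype=float)
    n = series.shape[0] - time_step
    if n <= 0:
        raise ValueError("series must be longer than time_step")
    # Window i covers samples i .. i + time_step - 1.
    X = np.stack([series[i:i + time_step] for i in range(n)])
    # The target is the value of column 0 right after each window.
    y = series[time_step:, 0]
    return X, y


# Dummy series with 3 input dimensions, as in the dissolved oxygen model.
series = np.arange(300, dtype=float).reshape(100, 3)
X, y = make_windows(series, time_step=20)
print(X.shape, y.shape)
```

Each row of `X` then has shape (time step, input dimension), which is the layout a recurrent layer with time step 20 would consume.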

TABLE 7 Records of MAE/RMSE/MAPE When Training Process Reach Specific Times in Water Temperature Training Model
TABLE 8 Records of MAE/RMSE/MAPE When Training Process Reach Specific Times in pH Training Model
TABLE 9 Records of MAE/RMSE/MAPE When Training Process Reach Specific Times in Dissolved Oxygen Training Model

From Figs. 13~15 and Tables 7~9, it can be seen that the models are strongly affected by the random initial weights in the early stage of training, and that the MAE/RMSE/MAPE of the three prediction models tend to stabilize in the later stage. The three metrics of Bi-S-SRU are much smaller than those of the other models throughout the whole process. In addition, in the water temperature model the RNN has a lower RMSE after 500 training times than after 1,000, which is called over-fitting. It is due to a slightly inadequate learning rate and the high complexity of the RNN; RMSE (or the other metrics) may temporarily fall into a local minimum in the later training period. This can be addressed by adjusting the learning rate, simplifying the network structure, and performing adequate data preprocessing.

FIGURE 13. - Comparisons of RMSE and MAPE in water temperature prediction model training.

FIGURE 14. - Comparisons of RMSE and MAPE in pH prediction model training.

FIGURE 15. - Comparisons of RMSE and MAPE in dissolved oxygen prediction model training.

SECTION V.

Experimental Results and Discussions

A. Comparison of Prediction Effects for Different Parameters

We retrieve the models trained by the four neural networks for the three parameter prediction models, and predict the future pH, water temperature and dissolved oxygen data respectively. The results are shown in Fig. 16.

FIGURE 16. - The prediction effect of pH, water temperature and dissolved oxygen.

Low-pass filters are used to remove abnormal prediction results and local high-frequency fluctuations from the curves in Fig. 16. According to the actual situation, the four models predict 1,000 pH and dissolved oxygen values and 2,500 water temperature values at five-minute intervals.
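The type of low-pass filter is not specified here. As one possible illustration, the sketch below uses a centered moving average, which is a simple FIR low-pass filter; the window length of 5 samples and the edge padding are assumptions, and `moving_average_lowpass` is a hypothetical name.

```python
import numpy as np


def moving_average_lowpass(signal, window=5):
    """Smooth a 1-D series with a centered moving average of odd length."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    signal = np.asarray(signal, dtype=float)
    half = window // 2
    # Repeat the edge values so the output keeps the input length.
    padded = np.pad(signal, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


# A single abnormal spike in an otherwise flat prediction curve.
pred = np.zeros(11)
pred[5] = 1.0
smoothed = moving_average_lowpass(pred, window=5)
print(smoothed)
```

The spike is spread over the window and its peak is reduced, while a slowly varying curve passes through almost unchanged. A longer window removes more high-frequency content at the cost of blurring genuine rapid changes.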

The predicted data of all four models are close to the true data (on average, the pH deviation is less than 0.05, the water temperature deviation is less than 0.8 °C, and the dissolved oxygen deviation is less than 0.3 mg/L). The Bi-S-SRU fits the real values more closely, and the outputs of the Bi-S-SRU training model have much smaller MAE/RMSE/MAPE. Hence, the water quality parameter prediction with the proposed Bi-S-SRU model achieves better prediction results.

The accuracy and range ability of the sensors are shown in Table 10, where F.S., NTU, PSU denote Full Scale, Nephelometric Turbidity Unit, and Practical Salinity Unit, respectively.

TABLE 10 The Accuracy and Range Ability of the Sensors

B. Comparison of Training Time for RNN, LSTM, SRU and Bi-S-SRU

As shown in Fig. 17, the SRU model takes more time than RNN but less than LSTM, because SRU adds new unit states to the RNN to control the information flow in the network.

FIGURE 17. - Comparison of training time cost.

In the pH prediction model, training the RNN (cell=5) 1,000 times takes 5,860.128 s, while SRU (cell=5) takes 6,136.598 s, LSTM (cell=5) takes 6,754.23 s and Bi-S-SRU (layer=2, cell=5) takes 6,535.63 s. Thus, Bi-S-SRU consumes 10.33% more time than RNN and 6.11% more time than SRU. In the water temperature prediction model, the training time of RNN is 4,675.35 s, while LSTM takes 6,979.47 s, SRU takes 5,388.66 s and Bi-S-SRU takes 5,942.75 s. Bi-S-SRU takes 21.32% more time than RNN and 9.32% more time than SRU. In the dissolved oxygen prediction model, the training time of RNN is 4,763.81 s, while LSTM takes 6,085.53 s, SRU takes 5,878.16 s and Bi-S-SRU takes 6,199.34 s. Bi-S-SRU takes 26.01% more time than RNN and 5.21% more time than SRU.
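A relative time cost can be expressed against either of the two durations being compared, and the two conventions give different percentages. The short sketch below tabulates both for the training times quoted above; the function names are hypothetical, and the script only re-derives ratios from the reported durations.

```python
# Training times in seconds, copied from the text above.
times = {
    "pH": {"RNN": 5860.128, "SRU": 6136.598, "LSTM": 6754.23, "Bi-S-SRU": 6535.63},
    "water temperature": {"RNN": 4675.35, "SRU": 5388.66, "LSTM": 6979.47, "Bi-S-SRU": 5942.75},
    "dissolved oxygen": {"RNN": 4763.81, "SRU": 5878.16, "LSTM": 6085.53, "Bi-S-SRU": 6199.34},
}


def extra_vs_baseline(model_s, baseline_s):
    """Extra time of the model, as a percentage of the baseline's time."""
    return 100.0 * (model_s - baseline_s) / baseline_s


def extra_vs_model(model_s, baseline_s):
    """The same time difference, as a percentage of the model's own time."""
    return 100.0 * (model_s - baseline_s) / model_s


for name, t in times.items():
    for base in ("RNN", "SRU"):
        a = extra_vs_baseline(t["Bi-S-SRU"], t[base])
        b = extra_vs_model(t["Bi-S-SRU"], t[base])
        print(f"{name}: Bi-S-SRU vs {base}: {a:.2f}% of {base} time, "
              f"{b:.2f}% of Bi-S-SRU time")
```

Since the denominator differs, the first figure is always the larger of the two when the model is slower than the baseline, so it helps to state which convention a quoted percentage follows.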

It can be concluded that all models above take more time to train when there are more relevant parameters (such as in water temperature prediction). Under the same conditions, training SRU, LSTM and Bi-S-SRU is more time-consuming than training RNN. The prediction performance of RNN and Bi-S-SRU is shown in Fig. 16. The prediction accuracy of Bi-S-SRU is up to 94.42%, and the RMSE of Bi-S-SRU is only 52.1% to 81.1% of that of RNN. Compared with RNN, the Bi-S-SRU-based method also fits the actual data better in prediction. The Bi-S-SRU takes 8.99% to 23.51% more training time than RNN on average. In general, Bi-S-SRU has more practical significance for constructing a prediction model of key water quality parameters in marine aquaculture.

SECTION VI.

Conclusion and Future Work

In this paper, we propose a process and a model for the accurate prediction of key water quality parameters (pH, water temperature and dissolved oxygen) in smart mariculture. First, the collected water quality data are repaired and corrected by the improved preprocessing method, and then filtered and denoised by the wavelet transform method. After preprocessing, the data received by remote transmission are recovered well. Next, we construct the Bi-S-SRU (Bi-directional Stacked SRU) deep learning prediction model by importing the preprocessed data set weighted with the discovered correlation coefficients. The experimental results demonstrate that our proposed prediction model achieves higher prediction accuracy and stability than RNN-based and SRU-based prediction models. In the actual prediction, the average prediction time is 12.5 ms, and the prediction accuracy can reach 94.42% for the next 3~8 days. Therefore, in smart aquaculture, our proposed scheme can meet the requirements of accurate prediction for water quality parameters.

In the construction of prediction model, the deep Bi-S-SRU network used in the experiment is superior to most other neural networks in terms of prediction accuracy. The experimental results also show that the Bi-S-SRU-based prediction method is only slightly higher in time complexity than the traditional RNN-based or LSTM-based prediction method.

In order to make the water quality prediction model more robust and practical, our future work is mainly to optimize the existing deep neural network structure and combine more relevant prior knowledge for combined prediction to achieve higher prediction accuracy and further lower time cost.

Author Contributions

Juntao Liu: Methodology, Software, Writing - Part of Original Draft. Chuang Yu: Methodology, Software, Writing - Part of Original Draft. Zhuhua Hu: Conceptualization, Supervision, Writing - Reviewing and Editing. Yaochi Zhao: Investigation, Funding Acquisition. Yong Bai: Writing - Review and Editing. Mingshan Xie: Validation. Jian Luo: Data Curation.

Appendix

Supplementary data associated with this article can be found, in the online version, at https://github.com/huzhuhua/Dataset-of-Prediction-of-Water-Quality.

Conflict of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.
