Introduction
As an emerging industry, photovoltaic power has great economic potential and the ability to transform the energy sector. Accurate prediction of photovoltaic power capacity and market demand can help attract investment, promote industrial upgrading and technological innovation, optimize power grid dispatch, improve the adaptability of the power grid, and cope with the challenges brought by the volatility of photovoltaic power [1].
Existing photovoltaic power prediction methods can be divided into two types: physical model-based methods and statistical learning-based methods. The former describes the working principle of photovoltaic power generation by establishing a complex mathematical model, so as to achieve accurate prediction [2]. The latter collects historical data, uses machine learning and related methods to establish a prediction model, and then uses this model to forecast future photovoltaic power generation [3]. Common physical model-based methods include single-factor models, electrical models, thermoelectric models, and physical simulation models. Common statistical learning-based methods include regression analysis-based models, neural network-based models, support vector machine-based models, Bayesian network-based models, etc.
Among statistical learning-based methods, deep learning methods exhibit strong adaptive capabilities, enhancing model accuracy by increasing the number of network layers and optimizing training. In recent years, the global focus has shifted towards the widespread adoption of renewable energy sources such as wind power and photovoltaic systems. In reference [4], a long short-term memory network (LSTM) was utilized with improvements in spatial correlation; however, due to its architecture, the LSTM model tends to lose significant time-series information when processing long sequences, and its multi-step prediction relies heavily on previous predictions, leading to error accumulation and suboptimal forecasting results. Reference [5] introduced a prediction method based on extreme learning machines (ELM), but the ELM model demands high-quality input data and meticulous feature selection. In reference [6], the authors applied the Variational Mode Decomposition (VMD) method and spatio-temporal convolution to predict photovoltaic power series, addressing issues related to series volatility; nevertheless, the localized perception of the convolution operation limits its ability to capture long-term dependencies, reducing predictive performance. Reference [7] adopted an AdaBoost ensemble model based on the Markov model, achieving better prediction results through a voting weight mechanism, but it requires significant computational resources due to numerous iterations and prolonged training times. Reference [8] proposed the Informer model based on the Transformer, incorporating a Multi-head ProbSparse Self-attention mechanism and a distinctive encoder-decoder structure; this design keeps the model's spatial complexity low while retaining crucial feature information. Numerous scholars have confirmed that, compared to stepwise prediction models such as GRU and LSTM, the Informer model not only delivers precise predictions but also completes calculations rapidly, meeting the efficiency requirements of practical applications. However, effective solutions are still lacking for sequence volatility and the information leakage that arises in the stacking process.
In summary, this paper proposes a photovoltaic power prediction model based on VMD-Informer-DCC. First, Spearman's feature selector is used to screen the sequence features. Second, Variational Mode Decomposition of the input sequence is performed in the encoder of Informer. Then, the decomposed sequence undergoes attention stacking through a Multi-head ProbSparse Self-attention mechanism and dilated causal convolution before being passed into the decoder; this retains memory information about the key power output points of the long-term sequence, expands the receptive field, and guarantees the causality of the sequence. Finally, a generative structure is used in the decoder to greatly shorten the prediction decoding time [9]. The prediction results of the VMD-Informer-DCC model are compared with those of other algorithms to verify the effectiveness of the proposed method.
Related Algorithms
2.1 Spearman's Feature Selector
Spearman's rank correlation coefficient emphasizes nonlinear association when analyzing the relationship between sequence features. Compared with Pearson's correlation coefficient, Spearman's rank correlation coefficient can more accurately capture the latent relationship between features and the target without making linearity assumptions about the data, providing more reliable inputs for the subsequent model [10]. Spearman's rank correlation coefficient lies in the interval [−1, 1]: the closer it is to 1, the stronger the positive correlation between the feature and the target, and the closer it is to −1, the stronger the negative correlation. The calculation is shown in Formula (1).
\begin{equation*}\rho_{X, Y}=1-\frac{6\sum\nolimits_{i=1}^{n}(x_{i}-y_{i})^{2}}{n^{3}-n} \tag{1}\end{equation*}
In the formula: x_i and y_i are the ranks of the i-th observations of the feature and the target, respectively, and n is the number of samples.
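A minimal sketch of this screening step is given below, using scipy.stats.spearmanr; the DataFrame and column names are hypothetical, and the 0.25 threshold follows Section 3.2.

```python
# Hypothetical feature screening with Spearman's rank correlation coefficient.
import pandas as pd
from scipy.stats import spearmanr

def screen_features(df: pd.DataFrame, target: str = "Power", threshold: float = 0.25):
    """Keep features whose |Spearman rho| with the target exceeds the threshold."""
    kept = {}
    for col in df.drop(columns=[target]).columns:
        rho, _ = spearmanr(df[col], df[target])
        if abs(rho) > threshold:
            kept[col] = rho
    return kept
```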
2.2 Principle of Variational Mode Decomposition
Empirical Mode Decomposition (EMD) is a decomposition method based on signal locality, but it is prone to numerical instability during computation, such as the mode-mixing phenomenon [11]; the VMD algorithm was developed to address this. Variational Mode Decomposition (VMD) is an adaptive, non-recursive method for mode decomposition and feature extraction [12]. For completely irregular sequences, it decomposes the original data into Intrinsic Mode Functions (IMFs) reflecting different degrees of randomness and volatility, which avoids end effects in the attention calculation and reduces the complexity of the sequence.
The decomposition process of VMD can be viewed as the construction and solution of a variational problem [6]. First, a variational problem with specific constraints is constructed. Then, Lagrange multipliers and quadratic penalty factors are introduced to transform this constrained variational problem into an unconstrained one. Finally, the alternating direction method of multipliers (ADMM) is used to solve the problem.
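The sketch below shows how such a decomposition can be applied to a one-dimensional sequence in practice, assuming the third-party vmdpy package (pip install vmdpy); the signal and all parameter values are illustrative.

```python
# Hedged sketch: decompose a toy series with VMD via the vmdpy package.
import numpy as np
from vmdpy import VMD

t = np.linspace(0, 1, 1000)
f = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)  # toy signal

alpha = 2000   # quadratic penalty factor (bandwidth constraint)
tau = 0.0      # noise tolerance; 0 enforces exact reconstruction
K = 4          # number of modes (IMFs) to extract
DC = 0         # do not impose a DC mode
init = 1       # initialize center frequencies uniformly
tol = 1e-7     # convergence tolerance of the stopping criterion

u, u_hat, omega = VMD(f, alpha, tau, K, DC, init, tol)
# u: (K, len(f)) array of IMFs; omega: per-iteration center frequencies
```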
2.3 Informer Model
In the traditional Transformer model, the iteration of a large amount of information allows useless information to affect the final output sequence [13]. Zhou et al. proposed the Informer model to address inaccurate prediction for long sequence inputs [8]. Informer assigns higher weights to dominant features, greatly reducing the input time dimension and extracting the most critical time-series information. When dealing with complex, nonlinear interdependence between time steps and multiple variables, the generative structure of Informer avoids the error accumulation of multi-step prediction, showing particular advantages. The structure of the Informer is shown in Fig. 1 [8].
The Informer model is mainly composed of N encoders and M decoders [14]. In the Encoder, a Multi-head ProbSparse Self-attention mechanism replaces the traditional Self-attention mechanism of the Transformer, reducing data dimensions while retaining a wealth of information. In the Decoder, a generative decoding approach outputs all prediction results in a single step.
The traditional Self-attention mechanism of a Transformer requires quadratic dot-product computation over all query-key pairs, although only a few dominant queries contribute most of the attention. Informer therefore measures the sparsity of each query by the KL divergence between its attention distribution and the uniform distribution; the KL divergence is given in Formula (2) [8].
\begin{equation*}
KL(q \Vert p)=\ln\sum\limits_{j=1}^{L_{K}}e^{\frac{q_{i}k_{j}^{T}}{\sqrt{d}}}-\frac{1}{L_{K}}\sum\limits_{j=1}^{L_{K}}\frac{q_{i}k_{j}^{T}}{\sqrt{d}}-\ln L_{K} \tag{2}\end{equation*}
In the formula: q_i is the i-th query, k_j is the j-th key, d is the input dimension, and L_K is the length of the key sequence.
Informer uses Self-attention distilling to reduce time and space overhead: it extracts the dominant features of each layer, halving the sequence length passed to the next layer without losing important information. The extraction process of each layer is given in Formula (3).
\begin{equation*}
X_{j+1}^{t}=\text{MaxPool}\left(\text{ELU}\left(\text{Conv1d}\left(\left[X_{j}^{t}\right]_{AB}\right)\right)\right) \tag{3}\end{equation*}
In the formula: X_j^t denotes the output of the j-th layer at time t, [·]_AB denotes the attention block operation, Conv1d(·) performs a one-dimensional convolution over the time dimension, ELU(·) is the activation function, and MaxPool(·) downsamples the sequence with stride 2.
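A minimal PyTorch sketch of this distilling step is shown below, assuming kernel size 3 and stride-2 max pooling; it is an illustration rather than the authors' exact code.

```python
# Sketch of the self-attention distilling step in Formula (3):
# Conv1d -> ELU -> MaxPool roughly halves the sequence length between blocks.
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.act = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), the output of an attention block
        y = self.conv(x.transpose(1, 2))   # convolve along the time axis
        y = self.pool(self.act(y))         # halve the temporal dimension
        return y.transpose(1, 2)

x = torch.randn(8, 96, 512)
print(DistillingLayer(512)(x).shape)       # torch.Size([8, 48, 512])
```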
For the decoder part of Informer, a generative decoding structure is adopted, which removes the step-by-step forward propagation of the traditional Transformer decoder and adds a fully connected layer for prediction.
2.4 The Principle of Dilated Causal Convolution
Dilated causal convolution is an operation in convolutional neural networks commonly used to process sequential data such as time series and text. It introduces the concepts of dilation and causality into the convolution operation, capturing long-term dependencies in sequential data while preserving the temporal relationship [16].
Dilation works as follows: in a standard convolution, each element of the kernel is multiplied with the corresponding element of the input and the products are summed. In dilated convolution, the interval between the elements of the convolution kernel is expanded according to a set dilation factor. This enables the kernel to cover input data that are further apart, enlarging the receptive field of the convolution operation and widening the input range that influences each output value [17].
Causality refers to maintaining the temporal relationship: causal convolution imposes a causal constraint on the application of the kernel, so that future time steps do not affect the output at the current time step [18]. This means the kernel is applied only to input data from the current time step and earlier, never to future data.
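The sketch below illustrates both properties in PyTorch: the input is padded only before the sequence start by (kernel_size − 1) × dilation positions, so each output depends only on current and past inputs; the channel count, kernel size, and dilation factor are illustrative.

```python
# Minimal sketch of a dilated causal 1-D convolution with left-only padding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalConv1d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 2):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation   # past-side padding only
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len)
        x = F.pad(x, (self.left_pad, 0))   # zeros before the sequence start
        return self.conv(x)                # output length equals input length

x = torch.randn(8, 16, 96)
print(DilatedCausalConv1d(16)(x).shape)    # torch.Size([8, 16, 96])
```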
VMD-Informer-DCC Model
To achieve a good photovoltaic power prediction effect, this paper combines VMD, Informer, and dilated causal convolution for prediction, and designs a photovoltaic power prediction model based on VMD-Informer-DCC, as shown in Fig. 2.
In the VMD-Informer-DCC architecture, after embedding of the photovoltaic power sequence, a VMD layer (purple part) is added to the encoder to decompose the input sequence into variational modes. A dilated causal convolutional layer (yellow part) replaces Self-attention distilling and connects the output features of the Self-attention blocks, which expands the receptive field and ensures the causality of the sequence.
Model input: Consistent with the model input method of Informer, VMD-Informer-DCC divides the original time series data into three key parts for input: sequence features, sequence labels, and time features (e.g., minutes, hours, days, weeks, months, holidays, etc.).
Encoder: The encoder of VMD-Informer-DCC consists of N identical layers, each composed of a VMD layer, a Multi-head ProbSparse Self-attention layer, a dilated causal convolutional layer, and a feed-forward network layer.
The VMD layer of the encoder smooths the photovoltaic power sequence after the Embedding layer. VMD reduces the bandwidth of the photovoltaic power sequence [19]; the unconstrained variational model is shown in Formula (4):
\begin{align*}
L(\{u_{k}\}, \{\omega_{k}\}, \lambda) =& \alpha\sum\limits_{k}\left\Vert\partial_{t}\left[\left(\delta(t)+\frac{j}{\pi t}\right)\ast u_{k}(t)\right]\mathrm{e}^{-j\omega_{k}t}\right\Vert_{2}^{2}\\
&+ \left\Vert f(t)-\sum\limits_{k}u_{k}(t)\right\Vert_{2}^{2}+\left\langle\lambda(t), f(t)- \sum\limits_{k}u_{k}(t)\right\rangle \tag{4}
\end{align*}
In the formula: {u_k} is the set of modal components, {ω_k} is the set of their center frequencies, λ(t) is the Lagrange multiplier, α is the quadratic penalty factor, f(t) is the original signal, δ(t) is the Dirac function, and ∗ denotes convolution.
The augmented Lagrangian method is used to update the frequency-domain representation of each IMF according to Formula (5) [19].
\begin{equation*}\hat{u}_{k}^{n+1}(\omega)=\frac{\hat{f}(\omega)-\sum\nolimits_{i\neq k}\hat{u}_{i}(\omega)+\frac{\hat{\lambda}^{n}(\omega)}{2}}{1+2\alpha\left(\omega-\omega_{k}^{n}\right)^{2}} \tag{5}\end{equation*}
In the formula: the hat symbol denotes the Fourier transform, n is the iteration index, ω is the frequency, and ω_k^n is the center frequency of the k-th mode at iteration n.
Finally, each modal component and its center frequency are obtained upon convergence. The parameters are updated by Formulas (6) and (7); when the convergence condition in Formula (8) is met, the iteration terminates.
\begin{align*}
&\omega_{k}^{n+1} =\frac{\int\nolimits_{0}^{\infty}\omega\vert \hat{u}_{k}^{n+1}(\omega)\vert^{2}d\omega}{\int\nolimits_{0}^{\infty}\vert \hat{u}_{k}^{n+1}(\omega)\vert^{2}d\omega}\tag{6}\\
&\hat{\lambda}^{n+1}(\omega)=\hat{\lambda}^{n}(\omega)+\gamma\left[\hat{f}(\omega)-\sum\limits_{k=1}^{K}\hat{u}_{k}^{n+1}(\omega)\right]\tag{7}\\
&\sum\limits_{k=1}^{K}\left(\frac{\Vert\hat{u}_{k}^{n+1}-\hat{u}_{k}^{n}\Vert_{2}^{2}}{\Vert\hat{u}_{k}^{n}\Vert_{2}^{2}}\right) < \varepsilon\tag{8}\end{align*}
In the formula: γ is the update step of the Lagrange multiplier, K is the number of modal components, and ε is the convergence tolerance.
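A minimal numpy sketch of the iteration in Formulas (5)-(8) is given below, under the simplifying assumption that the Wiener-filter update of Formula (5) is applied symmetrically over the full FFT grid (production VMD implementations work on the positive half-spectrum of the analytic signal); all parameter values are illustrative.

```python
# Sketch of the VMD update loop: mode spectra (5), center frequencies (6),
# Lagrange multiplier (7), and the convergence check (8).
import numpy as np

def vmd_sketch(f, K=4, alpha=2000.0, gamma=0.01, eps=1e-7, max_iter=500):
    N = len(f)
    grid = np.abs(np.fft.fftfreq(N))            # |frequency| axis, symmetric
    f_hat = np.fft.fft(f)
    u_hat = np.zeros((K, N), dtype=complex)     # mode spectra
    omega = np.linspace(0.05, 0.45, K)          # initial center frequencies
    lam = np.zeros(N, dtype=complex)            # Lagrange multiplier spectrum

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam / 2) / (
                1 + 2 * alpha * (grid - omega[k]) ** 2)          # Formula (5)
            p = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(grid * p) / (np.sum(p) + 1e-12)    # Formula (6)
        lam = lam + gamma * (f_hat - u_hat.sum(axis=0))          # Formula (7)
        change = sum(np.linalg.norm(u_hat[k] - u_prev[k]) ** 2 /
                     (np.linalg.norm(u_prev[k]) ** 2 + 1e-12) for k in range(K))
        if change < eps:                                         # Formula (8)
            break
    return np.real(np.fft.ifft(u_hat, axis=1)), omega
```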
The Multi-head ProbSparse Self-attention layer of the encoder transforms Formula (2) to obtain the screening measurement of the queries Q, given in Formula (9) [8].
\begin{equation*}\bar{M}(q_{i},K)=\ln\sum\limits_{j=1}^{L_{K}}e^{\frac{q_{i}k_{j}^{T}}{\sqrt{d}}}-\frac{1}{L_{K}}\sum\limits_{j=1}^{L_{K}}\frac{q_{i}k_{j}^{T}}{\sqrt{d}} \tag{9}\end{equation*}
In the formula: M̄(q_i, K) is the sparsity measurement of the i-th query q_i against the key set; the larger its value, the more dominant the query.
Self-attention is then calculated as in Formula (10).
\begin{equation*}
A(Q, K, V)= \text{Softmax}\left(\frac{\bar{Q}K^{T}}{\sqrt{d}}\right)V \tag{10}\end{equation*}
In the formula: Q̄ is the sparse query matrix containing only the Top-u queries screened by Formula (9); K and V are the key and value matrices, and d is the input dimension.
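A minimal numpy sketch of Formulas (9)-(10) follows: every query is scored with the sparsity measurement, the Top-u dominant queries attend normally, and the remaining lazy queries fall back to the mean of V, following the Informer design [8]; the shapes and u are illustrative.

```python
# Sketch of ProbSparse attention: score queries (9), attend with Top-u (10).
import numpy as np

def probsparse_attention(Q, K, V, u=8):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                                 # (L_Q, L_K)
    M = np.log(np.exp(scores).sum(axis=1)) - scores.mean(axis=1)  # Formula (9)
    top = np.argsort(M)[-u:]                       # indices of dominant queries
    A = np.exp(scores[top])
    A /= A.sum(axis=1, keepdims=True)              # row-wise softmax
    out = np.tile(V.mean(axis=0), (Q.shape[0], 1)) # lazy queries -> mean(V)
    out[top] = A @ V                               # Formula (10) for Q-bar
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((96, 64)) for _ in range(3))
print(probsparse_attention(Q, K, V).shape)         # (96, 64)
```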
The dilated causal convolution of the encoder is shown in Formula (11).
\begin{equation*}
X_{j+1}^{t}= \text{MaxPool}\left(\text{ELU}\left(\text{Conv}1d^{\prime}\left(\left[X_{j}^{t}\right]_{AB}\right)\right)\right) \tag{11}\end{equation*}
In the formula: Conv1d′(·) denotes the dilated causal convolution that replaces the canonical Conv1d(·) of Formula (3); the other symbols are as in Formula (3).
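A hedged PyTorch sketch of Formula (11) is shown below, combining the two earlier sketches: the canonical convolution of the distilling step is replaced by a dilated causal convolution with left-only padding, while the ELU activation and max pooling are kept; kernel size and dilation are illustrative.

```python
# Sketch of the causal distilling step of Formula (11).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDistillingLayer(nn.Module):
    def __init__(self, d_model: int, kernel_size: int = 3, dilation: int = 2):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, dilation=dilation)
        self.act = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        y = F.pad(x.transpose(1, 2), (self.left_pad, 0))  # pad past side only
        y = self.pool(self.act(self.conv(y)))             # halve sequence length
        return y.transpose(1, 2)
```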
Figures 3 and 4 show networks of three Self-attention blocks stacked with Self-attention distilling and with the dilated causal convolution operation, respectively [20]. For ease of viewing, only the receptive field of the last input element is shown.
The traditional Informer model is limited in handling long sequences: its restricted receptive field leads to repetitive, low-value computation, and the canonical convolutional layer ignores the temporal order, which inevitably leaks future information in time-series prediction. Inspired by the literature [20], this solution replaces the canonical convolution with dilated causal convolution. For each feature map, padding is applied on one side only, before the start of the sequence, to preserve causality, rather than padding both ends simultaneously as in Informer; a larger dilation factor is used in the convolution to obtain a larger receptive field.
The feedforward network layer of the encoder consists of two linear transformations with a ReLU activation function in between [13].
Decoder: The decoder of VMD-Informer-DCC consists of M identical layers, each composed of a masked Multi-head ProbSparse Self-attention layer, a Multi-head attention layer over the encoder output, a dilated causal convolutional layer, a feed-forward network layer, and a fully connected layer.
The Multi-head ProbSparse Self-attention layer, dilated causal convolutional layer, and feed-forward network layer of the decoder perform similarly to those of the encoder.
The attention layer of the decoder enables it to focus on the sequence information in the encoder during prediction, mapping the input sequence to the target sequence.
The function of the fully connected layer in the decoder is to map the hidden representation of the decoder to the final target sequence to obtain the prediction result. The decoder's input, constructed by concatenating the start token with a placeholder for the target sequence, is given in Formula (12) [8].
\begin{equation*}
X_{\text{feed}\_\text{de}}= \text{concat}(X_{\text{token}}^{t},X_{0}^{t})\in \mathbb{R}^{(L_{\text{token}}+L_{y})\times d_{\text{model}}} \tag{12}\end{equation*}
In the formula: X_token^t is the start token sequence sampled from the tail of the encoder input, X_0^t is a placeholder (filled with zeros) for the target sequence, L_token and L_y are their respective lengths, and d_model is the model dimension.
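A small sketch of this decoder input construction is shown below; the lengths are illustrative.

```python
# Decoder input of Formula (12): known start token + zero placeholder.
import torch

L_token, L_y, d_model = 48, 24, 512
x_token = torch.randn(1, L_token, d_model)       # start token from the input tail
x_zero = torch.zeros(1, L_y, d_model)            # placeholder for the targets
x_feed_de = torch.cat([x_token, x_zero], dim=1)  # (1, L_token + L_y, d_model)
```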
3.1 Experimental Dataset
The dataset in this paper comprises 2021 data from a photovoltaic power plant in a city in Jilin Province, China. Each solar panel has a rated power of 200 W; with 50 panels in total, the overall installed capacity is 10 kW. Measurements are taken at hourly intervals and include Wind speed, Wind direction, Ambient temperature, Pressure, Irradiance, Humidity, Panel temperature, and Photovoltaic power, yielding 8760 records for the year. The dataset was divided into training, validation, and test sets in a 6:1:3 ratio for model training and testing. To verify the broad applicability of the model, five public datasets were used for comparison tests. The parameter settings of the VMD-Informer-DCC model are shown in Table 1.
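A hedged sketch of the 6:1:3 split is given below, assuming a chronological split as is standard for time-series data; the array is a stand-in for the real 8760-row hourly table.

```python
# Chronological 6:1:3 train/validation/test split (illustrative data).
import numpy as np

data = np.random.randn(8760, 8)      # placeholder for the hourly measurements
n_train = int(len(data) * 0.6)       # 60% training
n_val = int(len(data) * 0.1)         # 10% validation
train = data[:n_train]
val = data[n_train:n_train + n_val]
test = data[n_train + n_val:]        # remaining 30% testing
```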
3.2 Analysis of the Results of Spearman's Feature Selector
Wind speed, Wind direction, Ambient temperature, Pressure, Irradiance, Humidity, and Panel temperature are important features that affect photovoltaic power, but not all features have a positive impact on the Informer model, so feature screening is needed. Features whose Spearman's rank correlation coefficient with Power is greater than 0.25 or less than −0.25 are retained. Table 2 shows that the retained features are Panel temperature, Ambient temperature, Humidity, and Irradiance, among which Irradiance has the largest Spearman's rank correlation coefficient with the target Power.
3.3 Analysis of VMD Results
Figure 5 shows the original amplitude fluctuation of the feature Irradiance, and Fig. 6 shows its amplitude fluctuation after passing through the VMD layer of the encoder. The two figures show the fluctuation trend of Irradiance before and after the VMD transformation: Figure 5 exhibits sharp changes, while Fig. 6 shows only small changes. Only the bandwidth results for the Irradiance feature are shown here; the other features are processed in the same way. The prediction error of photovoltaic power is mainly determined by the prediction error of the high-frequency modes. Compared with directly predicting the original features, VMD reduces the influence of the non-stationarity of photovoltaic data on prediction accuracy, lowers the complexity of the photovoltaic power data to be predicted, and improves the prediction accuracy of the model.
3.4 Evaluation Index
Two indicators are used to evaluate the performance of the prediction model, namely Mean Square Error (MSE) and Mean Absolute Error (MAE) [21]; their calculation is shown in Formulas (13) and (14).
\begin{align*}
&MSE= \frac{1}{n}\sum\limits_{i=1}^{n}(\hat{y}_{i}-y_{i})^{2} \tag{13}\\
&MAE= \frac{1}{n}\sum\limits_{i=1}^{n}\vert \hat{y}_{i}-y_{i}\vert \tag{14}\end{align*}
In the formulas: ŷ_i is the predicted value, y_i is the true value, and n is the number of samples.
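Formulas (13) and (14) expressed as numpy helpers:

```python
# MSE and MAE evaluation metrics.
import numpy as np

def mse(y_hat, y):
    return np.mean((y_hat - y) ** 2)    # Formula (13)

def mae(y_hat, y):
    return np.mean(np.abs(y_hat - y))   # Formula (14)
```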
3.5 Contrast Test
In the following, comparative experiments are carried out from three aspects: prediction performance analysis, ablation experiment analysis, and dataset difference analysis.
3.5.1 Prediction Performance Analysis
Table 3 compares the prediction performance of VMD-Informer-DCC with ARIMA, LSTM, GRU, TCN, and Informer at different prediction step sizes. For prediction step sizes of 24, 48, 96, and 192, the VMD-Informer-DCC model outperforms the other models on both MSE and MAE, indicating better multi-step prediction performance. Compared with the traditional Informer, the VMD-Informer-DCC model achieves higher prediction accuracy, reducing MSE by 21.8% (24 steps), 16.0% (48 steps), 15.5% (96 steps), and 16.3% (192 steps); this shows that adding VMD and dilated causal convolution to Informer reduces data instability, enlarges the receptive field to contain more feature information, and improves the prediction effect of the power prediction model. Compared with ARIMA, LSTM, GRU, and TCN, the MAE of the VMD-Informer-DCC model is reduced by 28.8% (24 steps), 29.3% (48 steps), 26.5% (96 steps), and 20.1% (192 steps) on average.
Considering that the densely plotted long-horizon forecasts make it difficult to measure the differences between models precisely, Fig. 7 shows the power forecast results of the photovoltaic power station in Jilin on October 21 and 22, 2021, and Fig. 8 shows the temporal correlation of the errors between the true and predicted values. The figures show that the VMD-Informer-DCC model performs well in the power prediction of these two days, with higher accuracy and stability than the other prediction models.
3.5.2 Analysis of Ablation Experiments
To study the influence of each layer of the VMD-Informer-DCC model on photovoltaic power prediction performance, ablation experiments were carried out on the basis of the Informer model. The model with the VMD layer is called VMD-Informer, the model with the dilated causal convolutional layer is called Informer-DCC, and the model with both layers is VMD-Informer-DCC (the model in this paper). Table 4 compares the MSE and MAE of the Informer, VMD-Informer, Informer-DCC, and VMD-Informer-DCC models under different conditions.
Table 4 shows that both VMD-Informer and Informer-DCC improve photovoltaic power prediction performance compared with Informer. Adding VMD to Informer effectively remedies the instability of the photovoltaic power series and thus improves the prediction effect; adding dilated causal convolutional layers allows Informer to obtain more effective information and increases the receptive field of the model, yielding better prediction results. Nevertheless, with longer input sequences, an enlarged receptive field may lose finer details during information extraction: as shown in Table 4, when the input step size is 192, the MAE of Informer is better than that of Informer-DCC.
3.5.3 Experimental Analysis of Dataset Differences
To study the broad applicability of the VMD-Informer-DCC model, five public time-series forecasting datasets are used: an extreme abnormal event dataset (Hurricane), a household electricity consumption dataset (Electricity), a sales dataset (COVID-19), a power transformer temperature dataset (ETT), and a power dataset from the Godisara power plant in Tripura, India (Godisala). The proposed model is compared with the ARIMA, LSTM, GRU, TCN, and Informer models on these datasets. Table 5 reports the MSE with an input step size of 96 and a prediction step size of 48.
As shown in Table 5, the VMD-Informer-DCC model achieves excellent prediction results on all five datasets, demonstrating strong forecasting performance for abnormal situations caused by natural disasters, household electricity consumption, sales trends, electrical equipment temperature, and power plant output.
Conclusions
In this paper, we propose a photovoltaic power prediction model based on VMD-Informer-DCC. Informer is improved through sequence decomposition and information stacking to reduce the volatility of the sequence, expand the receptive field of information extraction, and ensure the causality of prediction. Experimental results show that the VMD-Informer-DCC model has high prediction accuracy, and it also achieves good results on other standard public datasets. The photovoltaic power prediction model in this study can help optimize the operation of solar power systems and promote the development of sustainable energy. In future work, the model will be further optimized to better extract the underlying features of time-series data, improve prediction accuracy, and reduce the computational complexity and memory usage of the Multi-head Self-attention mechanism.
ACKNOWLEDGMENTS
This work was supported by the Science and Technology Development Plan of Jilin City (20230303013).