Introduction
Partial differential equations (PDEs) play an important role in science and engineering, and many physical phenomena, such as sound propagation, fluid flow, and electromagnetic fields, are themselves described by PDEs. Numerical methods for solving PDEs have been developed over the past few decades and successfully applied to many real-world problems, e.g., aerodynamics and weather forecasting. These methods, the most representative of which is the finite difference method (FDM) [1], first discretize the solution region of the problem into a grid and then obtain an approximation of the exact solution at the discrete grid points. Nevertheless, as the size and complexity of the problem increase, the computational cost of these numerical methods becomes extremely large; reducing the computational cost of numerically solving PDEs has therefore become a hot research topic in recent years.
With the renaissance of neural networks, data-driven deep learning methods have made breakthroughs on a variety of tasks [2], [3], [4]. As a result, some approaches [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20] have considered using deep learning models to reduce the computational cost of solving PDEs and have achieved encouraging results. Some methods [21], [22], [23], [24], [25], [26], [27], [28], [29] directly output numerical solutions of PDEs by training an end-to-end deep learning model that takes the conditions of the PDEs as input. The computational cost of these methods depends entirely on the size of the network, but their accuracy and generalization are not guaranteed because the deep models are uninterpretable, so they are not applicable in fields where the accuracy of numerical solutions is strictly required.
To address this problem, some methods [30], [31], [32], [33], [34] are built directly on top of existing hand-designed iterative methods [35], [36], [37]. They inherit many qualities of these iterative methods, such as the ability to obtain numerical solutions with arbitrary accuracy and a certain degree of generalization, and some of them [30], [34] even provide theoretical proofs. Specifically, they train a deep network that corrects the current iteration value using the residual between the current and previous iteration values. However, using only the residual of two adjacent iterations to estimate the correction value is clearly not accurate enough. In this paper, we instead use the residuals generated by all pairs of adjacent iterations to estimate the correction value, which greatly improves the accuracy of the correction and thus substantially speeds up the convergence of the iterator. If we regard the residual as analogous to a gradient, previous methods resemble gradient descent, while our method resembles gradient descent with momentum [38].
At the operational level, our method merely adds a momentum trick to existing methods, but at the theoretical level the difference is more fundamental: most hand-designed and deep learning-based iterative methods are by nature linear iterators [34], while ours is not. A linear iterator has a fixed update matrix T and a fixed bias c, and the goal of the models trained by deep learning-based methods is to find a T and c that make the iterator converge faster. The drawback of this class of iterators is that the bias c cannot adapt as the iteration proceeds, which limits the flexibility of the iterator. Our iterator, in contrast, no longer has a fixed bias: because it incorporates residuals generated by historical iteration values, its bias changes continuously with the iterations, so it can supply a different bias at every step. This makes it more flexible and versatile and also yields faster convergence. We call our iterator an Unfixed Bias Iterator. In fact, the linear iterator is just a special case of an unfixed bias iterator.
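To make this distinction concrete, the following toy sketch (our own illustration, not an implementation from this paper) contrasts a linear iterator, whose bias is fixed, with an unfixed bias iterator, whose bias may change at every step while keeping the same update matrix:

```python
# Toy illustration (ours): a linear iterator uses one fixed bias c, whereas an
# unfixed bias iterator may use a different bias c_k at every step.
import numpy as np

T = np.array([[0.5, 0.1],
              [0.0, 0.4]])            # fixed update matrix with spectral radius < 1
c = np.array([1.0, 1.0])              # fixed bias of the linear iterator

def linear_step(u):
    return T @ u + c                  # u_{k+1} = T u_k + c

def unfixed_bias_step(u, c_k):
    return T @ u + c_k                # u_{k+1} = T u_k + c_k, with c_k varying

u_lin = u_ub = np.zeros(2)
for k in range(60):
    u_lin = linear_step(u_lin)
    u_ub = unfixed_bias_step(u_ub, c + 0.5 ** k * np.ones(2))  # biases that tend to c
# Both sequences approach the same fixed point u_* = (I - T)^{-1} c.
```

The linear iterator is recovered as the special case in which the bias sequence is constant, c_k = c for all k.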
Linear iterators are widely used in industry despite their drawbacks because there are sufficient theoretical guarantees to determine whether a new linear iterator converges and generalizes. This paper provides analogous theoretical guarantees for our proposed iterative format, the unfixed bias iterator, and analyzes its convergence and generalization. Finally, experimental results show that our method converges much faster than other methods.
Our contributions are summarized as follows:
We introduce a new iterative format, the unfixed bias iterator, provide sufficient theoretical guarantees for it, and prove theoretically that our iterator converges and possesses a degree of generalization.
We propose a novel iterator that estimates the correction to the current iteration value using the residuals generated by all adjacent iteration values. The correction estimated by our iterator is more accurate than that of other methods, which greatly improves the convergence speed of the iteration.
Experiments show that our iterator obtains SOTA performance in terms of convergence speed.
Related Work
A. End-to-End Methods
The end-to-end methods [21], [22], [23], [24], [25], [26], [27], [28], [29] usually train a neural network as a PDE solver, with different neural networks used for different PDEs. Reference [27] uses a fully convolutional Long Short-Term Memory (LSTM) [39] network to exploit the spatiotemporal dynamics of PDEs; the neural network enhances the finite-difference and finite-volume methods (FDM/FVM) commonly used to solve PDEs, allowing the method to maintain guarantees on the order of convergence. Meta-Auto-Decoder (MAD) [23] treats solving parametric PDEs as a meta-learning problem and utilizes the Auto-Decoder structure [40] to deal with different tasks/PDEs; physics-informed losses induced from the governing equations and boundary conditions are used as the training losses for the different tasks. The primary idea of [24] is to use a graph neural network (GNN) [41] to model the spatial domain and a Neural ODE to model the temporal domain; an attention mechanism [42] identifies important inputs/features and assigns them more weight, which enhances the performance of the proposed framework. Using conditional generative adversarial networks (cGAN) [43], reference [44] trains models that directly generate solutions to steady-state heat conduction and incompressible fluid flow purely from observations, without knowledge of the underlying governing equations. End-to-end algorithms mostly rely on uninterpretable neural networks, so their accuracy can only be verified empirically rather than theoretically, whereas our method comes with a theoretical explanation.
B. Iterative Methods
Iterative methods [30], [31], [32], [33], [34] are built on top of existing hand-designed iterative methods and thus inherit many of their advantages. Reference [30] proposes a neural solver that learns an optimal iterative scheme for a class of PDEs in a data-driven fashion, attaining this objective by modifying an iteration of an existing semi-implicit solver with a deep neural network. Multigrid Network (MgNet) [31] develops a unified model that simultaneously recovers some convolutional neural networks for image classification and multigrid (MG) [37] methods for solving discretized PDEs. Reference [32] learns a (single) mapping from a family of parameterized PDEs to prolongation operators, training one neural network for the entire class of PDEs with an efficient and unsupervised loss function. Reference [33] proposes a method using a reinforcement learning (RL) [45] agent based on graph neural networks [41], which learns to perform graph coarsening on small training graphs and can then be applied to large unstructured graphs in multigrid. Reference [34] learns a fast iterative solver tailored to a specific domain by learning to modify the updates of an existing solver with a deep neural network. Iterative methods that use nonlinear neural networks cannot guarantee accuracy, just like the end-to-end methods; methods that use only linear neural networks can guarantee accuracy in theory, but they are essentially linear iterators, which limits their performance.
Methodology
A. Notations
The purpose of a linear PDE solver is to find a function ${\mathscr {u}}$ that satisfies \begin{align*} \begin{cases} {\mathcal {A}}{\mathscr {u}}(x)={\mathscr {f}}(x), x\in {\mathcal {G}} \\ {\mathscr {u}}(x) = {\mathscr {b}}(x), x \in \partial {\mathcal {G}} \\ \end{cases} \tag {1}\end{align*} where ${\mathcal {A}}$ is a linear differential operator, ${\mathcal {G}}$ is the solution domain with boundary $\partial {\mathcal {G}}$, ${\mathscr {f}}$ is the source term, and ${\mathscr {b}}$ prescribes the boundary values.
After discretizing the problem on a grid of size $n$, let $A$ denote the discretized operator, $u$, $f$ and $b$ the discretized counterparts of ${\mathscr {u}}$, ${\mathscr {f}}$ and ${\mathscr {b}}$, and $G$ the diagonal matrix that selects the interior grid points. The discrete problem then reads \begin{align*} \begin{cases} G(Au) & = Gf \\ (1-G)u & = (1-G)b \end{cases} \tag {2}\end{align*}
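As a concrete (and deliberately simplified) instance of (2), the sketch below assembles the standard five-point discretization of the 2-D Poisson equation on an $n\times n$ grid; the roles of $A$, $G$, $f$ and $b$ match (2), while the stencil, the tiny grid size and the omitted $h^{2}$ scaling are simplifying assumptions of ours:

```python
# Simplified instance (ours) of the discrete system (2): five-point Laplacian on an
# n x n grid, with G selecting the interior points and b holding the boundary values.
import numpy as np

n = 8                                    # grid points per side (illustrative)
N = n * n
A = np.zeros((N, N))                     # discretized operator (here: unscaled -Laplacian)
G = np.zeros((N, N))                     # diagonal mask of interior points
idx = lambda i, j: i * n + j

for i in range(n):
    for j in range(n):
        k = idx(i, j)
        if 0 < i < n - 1 and 0 < j < n - 1:          # interior point
            G[k, k] = 1.0
            A[k, k] = 4.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                A[k, idx(i + di, j + dj)] = -1.0

f = np.random.randn(N)                   # right-hand side
b = np.zeros(N)                          # Dirichlet boundary values

# Eq. (2): G A u = G f on the interior and (I - G) u = (I - G) b on the boundary.
# Stacking the two conditions into one linear system and solving it recovers u.
I = np.eye(N)
u = np.linalg.solve(G @ A + (I - G), G @ f + (I - G) @ b)
```

A direct solve like the last line becomes impractical for large grids, which is exactly why the iterative methods discussed next are used.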
B. Iterator and Training
At the operational level, the difference between our iterator and other iterators is that we use the residuals generated by all pairs of adjacent iterations for each estimated correction; this idea is inspired by the great success of momentum gradient descent in the field of deep model optimization. Specifically, for a fixed PDE problem class $A$, let $\varphi (\,\cdot\,;G,f,b,n)$ be an existing hand-designed iterator and $H$ a learned linear operator. Starting from $z_{0}=0$, our iterator $\psi _{H}$ updates the current iterate $u_{i}$ as \begin{align*} \begin{cases} w_{i+1} & = \varphi (u_{i};G,f,b,n) - u_{i} \\ z_{i+1} & = \theta GHw_{i+1} + (1 - \theta )z_{i} \\ u_{i+1} & = \varphi (u_{i};G,f,b,n) + z_{i+1} \\ \end{cases} \tag {3}\end{align*} where $w_{i+1}$ is the residual of the base iterator, $z_{i+1}$ accumulates the residuals of all adjacent iterations with momentum coefficient $\theta $, and $G$ restricts the correction to the interior points.
We train our iterator $\psi _{H}$ by minimizing the expected distance between the result of $l$ iterations and the ground-truth solution $u_{*}$:\begin{equation*} \min _{H} \mathbb {E} ||\psi _{H}^{l}(u_{0};G,f,b,n)-u_{*}||^{2}_{2}, \tag {4}\end{equation*} where $\psi _{H}^{l}$ denotes $l$ applications of the iterator starting from the initial value $u_{0}$ and the expectation is taken over the training problems.
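The PyTorch sketch below illustrates how the update rule (3) can be unrolled for $l$ steps and trained with the objective (4). The Jacobi smoother used as $\varphi$, the single-convolution $H$, the way $u_{*}$ is obtained, and all hyper-parameter values are illustrative assumptions of ours, not the exact configuration used in the paper:

```python
# Sketch (ours) of Eq. (3) unrolled for l steps and trained with the loss in Eq. (4).
import torch
import torch.nn.functional as F

jacobi_kernel = torch.tensor([[[[0.00, 0.25, 0.00],
                                [0.25, 0.00, 0.25],
                                [0.00, 0.25, 0.00]]]])

def phi(u, G, f, b, h2):
    """One Jacobi sweep for -Laplace(u) = f with boundary values b outside the mask G."""
    u_new = F.conv2d(u, jacobi_kernel, padding=1) + 0.25 * h2 * f
    return G * u_new + (1 - G) * b

H = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)   # learned linear H
opt = torch.optim.Adam(H.parameters(), lr=1e-3)
theta, l, n = 0.5, 20, 16
h2 = (1.0 / (n - 1)) ** 2

for step in range(1000):
    # Toy training problems: random f, zero boundary, square interior mask G, and a
    # reference u_* obtained by running plain Jacobi to (near) convergence.
    f = torch.randn(8, 1, n, n)
    b = torch.zeros_like(f)
    G = torch.zeros_like(f)
    G[:, :, 1:-1, 1:-1] = 1.0
    with torch.no_grad():
        u_star = torch.zeros_like(f)
        for _ in range(2000):
            u_star = phi(u_star, G, f, b, h2)

    u = torch.zeros_like(f)            # u_0
    z = torch.zeros_like(f)            # z_0 = 0
    for _ in range(l):                 # unroll l applications of Eq. (3)
        v = phi(u, G, f, b, h2)
        w = v - u                      # residual w_{i+1}
        z = theta * G * H(w) + (1 - theta) * z
        u = v + z
    loss = F.mse_loss(u, u_star)       # Monte-Carlo estimate of Eq. (4) over the batch
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice $H$ is deeper than a single convolution; see the implementations of $H$ described in the experimental section.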
C. Theoretical Analysis
Since our iterator is no longer a linear iterator, we first define a new iterator format, the unfixed bias iterator, and give sufficient conditions for its convergence; we then discuss the accuracy and generalization of our iterator at the theoretical level.
1) Unfixed Bias Iterator
As before, denote the underlying hand-designed linear iterator by $\varphi (u_{i}) = Tu_{i} + c$. Our iterator can then be expanded as \begin{align*} \psi _{H}(u_{i}) & = \varphi (u_{i}) + z_{i+1} \\ & = Tu_{i} + c + \theta GH(Tu_{i}+c-u_{i}) + (1-\theta )z_{i} \\ & = (T+\theta GHT - \theta GH)u_{i} + c + \theta GHc \\ & \quad + \theta \sum _{j=1}^{i}(1-\theta )^{i-j}Hw_{j}. \tag {5}\end{align*} Thus $\psi _{H}$ has a fixed update matrix $T+\theta GHT-\theta GH$ but a bias term that changes with the iteration index $i$, which motivates the following definition.
Definition 1 (Unfixed Bias Iterator):
An unfixed bias iterator is a function $\phi$ of the form \begin{equation*} u_{i+1}=\phi (u_{i})=Tu_{i}+c_{i}, \tag {6}\end{equation*} where the update matrix $T$ is fixed but the bias $c_{i}$ is allowed to change at every iteration.
Whether an unfixed bias iterator converges depends on $T$ and on the bias sequence $\{c_{i}\}$, as the following theorem shows.
Theorem 1:
An unfixed bias iterator (6) converges to $u_{*}$ for any initial value $u_{0}$ if and only if $\rho (T)\lt 1$ and $\lim _{n\to \infty }c_{n}=c_{*}$, where $c_{*}=(I-T)u_{*}$ and $\rho (\cdot)$ denotes the spectral radius.
Proof:
Necessity: If the iterator converges to $u_{*}$ for every initial value $u_{0}$, note that $c_{*}=(I-T)u_{*}$ gives $u_{*}=Tu_{*}+c_{*}$. Subtracting this fixed-point relation from (6) and unrolling the recursion yields \begin{align*} u_{k+1} - u_{*} & = T(u_{k} - u_{*}) + (c_{k} - c_{*}) \\ & = T^{2}(u_{k-1} - u_{*}) + T(c_{k-1}-c_{*}) + (c_{k} - c_{*}) \\ & \vdots \\ & =T^{k+1}(u_{0}-u_{*}) + \sum _{i=0}^{k}T^{k-i}(c_{i}-c_{*}) \tag {7}\end{align*}
Comparing the iterates obtained from two different initial values shows that $T^{n}(u_{0}-u_{0}')\to 0$ for all $u_{0},u_{0}'$, hence \begin{equation*} \lim _{n\to \infty }T^{n}=O \Rightarrow \rho (T)\lt 1. \tag {8}\end{equation*}
Moreover, taking limits in the one-step relation $u_{k+1}-u_{*}=T(u_{k}-u_{*})+(c_{k}-c_{*})$ yields \begin{equation*} \lim _{n\to \infty }(c_{n}-c_{*})=0 \Rightarrow \lim _{n\to \infty }c_{n} = c_{*}. \tag {9}\end{equation*}
Sufficiency: Conversely, if $\rho (T)\lt 1$ and $\lim _{n\to \infty }c_{n}=c_{*}$, then $T^{n}\to O$ and, by Lemma 2 in the Appendix, the accumulated bias term in (7) also vanishes, so \begin{align*} \lim _{n\to \infty }(u_{n} - u_{*}) & = \lim _{n\to \infty }T^{n}(u_{0}-u_{*}) \\ & \quad + \lim _{n\to \infty }\sum _{i=0}^{n-1}T^{n-1-i}(c_{i}-c_{*}) \\ & =0 + 0 = 0 \\ & \Rightarrow \lim _{n\to \infty }u_{n} = u_{*}. \tag {10}\end{align*}
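As an informal numerical sanity check of Theorem 1 (ours, not part of the proof), one can draw a random $T$ with $\rho (T)\lt 1$ and a bias sequence $c_{n}$ converging to some $c_{*}$, and observe that the iterates of (6) approach $u_{*}=(I-T)^{-1}c_{*}$:

```python
# Numerical check (ours) of Theorem 1: rho(T) < 1 and c_n -> c_* imply u_n -> u_*.
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))
T *= 0.9 / max(abs(np.linalg.eigvals(T)))        # rescale so that rho(T) = 0.9 < 1
c_star = rng.standard_normal(5)
u_star = np.linalg.solve(np.eye(5) - T, c_star)  # fixed point: u_* = T u_* + c_*

u = np.zeros(5)
for k in range(300):
    c_k = c_star + 0.8 ** k * rng.standard_normal(5)  # unfixed bias with c_k -> c_*
    u = T @ u + c_k                                   # one step of Eq. (6)
print(np.linalg.norm(u - u_star))                     # prints a value near zero
```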
2) Accuracy
The correct PDE solution $u_{*}$ is a fixed point of the hand-designed iterator $\varphi $. The following theorem shows that whenever our iterator converges, it converges precisely to $u_{*}$.
Theorem 2:
For any PDE problem $(G,f,b,n)$, if our iterator $\psi _{H}$ converges, then it converges to the correct solution $u_{*}$.
Proof:
When
When
When \begin{equation*} \lim _{n\to \infty }w_{n}=0. \tag {11}\end{equation*}
Since $H$ is a linear (hence continuous) operator, \begin{equation*} \lim _{n\to \infty }Hw_{n}=0. \tag {12}\end{equation*}
Unrolling the momentum recursion in (3) gives \begin{align*} z_{n} & = \theta Hw_{n}+(1-\theta )z_{n-1} \\ & = \theta Hw_{n}+(1-\theta )(\theta Hw_{n-1}+(1-\theta )z_{n-2}) \\ & \vdots \\ & = \theta \sum _{i=0}^{n}(1-\theta )^{n-i}Hw_{i} \tag {13}\end{align*}
By Lemma 4 in the Appendix, \begin{equation*} \lim _{n\to \infty }z_{n} = \theta \lim _{n\to \infty }\sum _{i=0}^{n} (1-\theta )^{n-i}Hw_{i} = 0. \tag {14}\end{equation*}
Therefore, \begin{align*} \lim _{n\to \infty }\psi ^{n}(u_{0}) & = \lim _{n\to \infty }(\varphi ^{n}(u_{0}) + z_{n+1}) \\ & = \lim _{n\to \infty }\varphi ^{n}(u_{0}) + \lim _{n\to \infty }z_{n} \\ & = u_{*} + 0 = u_{*} \tag {15}\end{align*}
For any PDE problem, Theorem 2 states that once our iterator converges, the fixed point it reaches must be the accurate solution. This means that our iterator can obtain solutions with arbitrary precision, just like the hand-designed iterators.
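The mechanism behind Theorem 2 can also be seen directly in a few lines: if the iterate already equals the true solution, the residual $w$ is zero, so the momentum term $z$ stays at zero and the iterator does not move, whatever $H$ is. Below is a tiny sketch of ours, with a generic linear $\varphi$, a random $H$, and the geometry mask dropped for brevity:

```python
# Sketch (ours) of the idea behind Theorem 2: at u = u_*, the residual w vanishes,
# so z stays zero and psi_H(u_*) = u_* for any choice of H.
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4))
T *= 0.9 / max(abs(np.linalg.eigvals(T)))      # base iterator with rho(T) < 1
c = rng.standard_normal(4)
u_star = np.linalg.solve(np.eye(4) - T, c)     # fixed point of phi(u) = T u + c
H = rng.standard_normal((4, 4))                # arbitrary "learned" linear operator
theta = 0.5

u, z = u_star.copy(), np.zeros(4)
for _ in range(10):                            # Eq. (3), started exactly at u_*
    w = (T @ u + c) - u                        # residual: (numerically) zero
    z = theta * H @ w + (1 - theta) * z
    u = (T @ u + c) + z
print(np.linalg.norm(u - u_star))              # stays at u_* up to round-off
```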
3) Generalization
For the PDE problem $(A,G,f,b,n)$, our iterator is trained only on particular choices of $f$ and $b$; we therefore need to ask whether it remains accurate, i.e., converges to the correct solution, when $f$ and $b$ change.
Theorem 3:
For fixed $A$, $G$ and $n$, if our iterator $\psi _{H}$ is accurate for some $f$ and $b$, then it is accurate for any $f$ and $b$.
Proof:
From Theorem 1 and Theorem 2, our iterator is accurate if and only if the following conditions hold:\begin{align*} \begin{cases} \rho (T+\theta GHT - \theta GH)\lt 1 \\ \lim _{n\to \infty }\left ({{c + \theta GHc + \theta \sum _{i=1}^{n}H(1-\theta )^{n-i}w_{i}}}\right )=c_{*} \end{cases}\tag {16}\end{align*}
For fixed $G$ and $n$, the update matrix $T+\theta GHT-\theta GH$ does not depend on $f$ or $b$, so the first condition in (16) is unaffected by them. For the second condition, since the residuals $w_{i}$ vanish in the limit (cf. the proof of Theorem 2), \begin{align*} & \lim _{n\to \infty }\left ({{c + \theta GHc + \theta \sum _{i=1}^{n}H(1-\theta )^{n-i}w_{i}}}\right ) \\ & = c + \theta GHc + \lim _{n\to \infty }\left ({{\theta \sum _{i=1}^{n}H(1-\theta )^{n-i}w_{i}}}\right ) \\ & = c + \theta GHc + 0 = c + \theta GHc. \tag {17}\end{align*} By the fixed-point relation $(I-T)u_{*}=c$, this limit equals $c_{*}=(I-(T+\theta GHT-\theta GH))u_{*}=c+\theta GHc$, so the second condition is satisfied regardless of $f$ and $b$.
In summary, the accuracy of the iterator is independent of $f$ and $b$. Thus, if the iterator is accurate for one choice of $f$ and $b$, it is accurate for any $f$ and $b$.
Theorem 3 states that our iterator generalizes freely to different $f$ and $b$. There is no guarantee that it generalizes to different $G$ and $n$; this has to be verified empirically. In our experiments, the learned iterator converges to the correct solution for a variety of grid sizes $n$ and geometries $G$, and we have not observed any setting in which it fails to converge.
Even if some $G$ or $n$ did cause our iterator to diverge, such divergence is easy to detect at run time, and one can simply fall back to the standard hand-designed solver.
Experiments
A. Experimental Setting
For fair comparison, we follow [34] to prepare our experimental setting, including data set, evaluation criterion and the implementation of H.
1) Data Set
To reemphasize, our goal is to train a model on simple domains where the ground truth solutions can be easily obtained, and then evaluate its performance on more complex geometries and boundary conditions. For training, we select the simplest homogeneous equation on a 2-dimensional square domain.
For testing, we use larger grid sizes $n$ than in training, as well as more complex geometries and boundary conditions.
2) Implementation of $H$
We use convolution to realize $A$. Similarly, we use either a stack of convolutional layers or a U-Net [47] with only linear operations to implement $H$, and the different implementations use different data sets. We call the implementation based on stacked convolutional layers the Unfixed-Bias-ConvX model, where X is the number of convolutional layers; it is trained on the square domain.
We call the U-Net-based implementation the Unfixed-Bias-U-NetX model; it is likewise trained on the square domain.
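A minimal PyTorch sketch of a purely linear $H$ of this kind is given below; the depth, channel width and class name are illustrative choices of ours, not the exact architecture of the Unfixed-Bias-ConvX models:

```python
# Sketch (ours) of a purely linear H: a stack of 3x3 convolutions with no bias and
# no activation functions, so that H remains a linear operator as the theory requires.
import torch.nn as nn

class LinearConvStack(nn.Module):
    def __init__(self, depth: int = 2, width: int = 8):
        super().__init__()
        layers = [nn.Conv2d(1, width, 3, padding=1, bias=False)]
        for _ in range(max(depth - 2, 0)):
            layers.append(nn.Conv2d(width, width, 3, padding=1, bias=False))
        layers.append(nn.Conv2d(width, 1, 3, padding=1, bias=False))
        self.net = nn.Sequential(*layers)      # no nonlinearities anywhere

    def forward(self, w):                      # w: residual from Eq. (3), shape (B, 1, n, n)
        return self.net(w)
```

A linear U-Net variant would follow the same principle, keeping the down-sampling, up-sampling and convolution operations but removing every activation function.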
3) Evaluation Criterion
For the Unfixed-Bias-Conv models, we compare against the Jacobi method. On GPU, the Jacobi iterator and our model can both be efficiently implemented as convolutional layers, so we measure the computational cost by the number of convolutional layers. Suppose the Jacobi method converges after N iterations and our model converges after M iterations. The evaluation criterion layers is then calculated as follows:\begin{equation*} layers = \frac {M(1+L)}{N}, \tag {18}\end{equation*} where $L$ is the number of convolutional layers in $H$.
On CPU, one Jacobi iteration requires 4 multiply-add operations per grid point, while a $3\times 3$ convolution requires 9 and the momentum update in (3) contributes 2 more, so the corresponding criterion is \begin{equation*} ops = \frac {M(4+2+9L)}{4N}. \tag {19}\end{equation*}
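For instance, with hypothetical values of $M$, $N$ and $L$ (the numbers below are ours, purely to show how (18) and (19) are evaluated):

```python
# Hypothetical example (ours) of evaluating the cost criteria (18) and (19).
M, N, L = 100, 1000, 2                 # our iterations, Jacobi iterations, conv layers in H
layers = M * (1 + L) / N               # Eq. (18): 100 * 3 / 1000 = 0.3
ops = M * (4 + 2 + 9 * L) / (4 * N)    # Eq. (19): 100 * 24 / 4000 = 0.6
# Values below 1 mean our model needs less computation than Jacobi to converge.
```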
For the Unfixed-Bias-U-Net models, we compare against the Multigrid method with the same number of sub-sampling and smoothing layers. Our models therefore have the same number of convolutional layers and roughly the same computational cost per iteration as Multigrid.
B. Numerical Experiments
We have validated the effectiveness of our iterator in solving different PDEs.
1) Poisson Equation
The Poisson equation is written as:\begin{equation*} \nabla ^{2}{\mathscr {u}} = {\mathscr {f}} \tag {20}\end{equation*}
The Unfixed-Bias-U-NetX models also converge to the correct solution and require less computation than Multigrid in all settings. Our Unfixed-Bias-U-Net3 is
According to our theoretical analysis, if our iterator converges for a geometry, then it is guaranteed to converge to the correct solution for any $f$ and boundary values $b$. The experimental results show that our model not only converges but also converges faster than the standard solver for a variety of grid sizes $n$ and geometries $G$. Empirically, this also demonstrates that our method has strong generalization.
2) Helmholtz Equation
The Helmholtz Equation is written as:\begin{equation*} \nabla ^{2}{\mathscr {u}} + a^{2} {\mathscr {u}}={\mathscr {f}} \tag {21}\end{equation*}
3) Heat Conduction Equation
The Heat Conduction Equation is written as:\begin{equation*} \frac {\partial {\mathscr {u}}}{\partial t} - a\nabla ^{2}{\mathscr {u}} = {\mathscr {f}}. \tag {22}\end{equation*}
In essence, using the Poisson, Helmholtz and heat conduction equations to verify the effectiveness of our method amounts to changing $A$. Although a model must be retrained for each different $A$, this shows that our method can easily be extended to different PDEs and still obtain ideal performance. In the future, we will consider training a single model that generalizes across different $A$.
C. The Sensitivity of Hyper-Parameters
In this section, we discuss the influence of the hyper-parameter $\theta $ and of the number of iterations $l$.
1) Hyper-Parameter $\theta $
Taking the models Unfixed-Bias-Conv2 and Unfixed-Bias-U-Net3 as examples, we test their sensitivity to $\theta $.
2) Number of Iterations $l$
When we change the value range of the iteration number $k$, no obvious change in the convergence speed of the model is observed; only some extreme settings have a noticeable effect.
D. Errors
Figure 2 shows how the errors decrease with iteration under the different iterators. Our iterator reduces the high-frequency errors very quickly, and once the errors have been reduced to a certain extent, the Jacobi iterator reduces the remaining low-frequency errors far more slowly than ours does. The speed of error reduction explains why our method converges much faster than the Jacobi iterator, which further illustrates the superiority of our iterator.
E. Visualization
We design some challenging geometries to test the generalization of our models: (i) same geometry but larger grid, (ii) L-shape geometry, (iii) Cylinders geometry, and (iv) PDEs in the same geometry, but
Conclusion
We build a learned iterator on top of an existing standard iterative solver; it modifies the current iteration result with the help of the historical iteration results so as to accelerate convergence. At the theoretical level, because of the introduction of the historical iteration results, our iterator constitutes a new iterative format, the unfixed bias iterator; we provide sufficient theoretical guarantees for it and prove that it converges and possesses a degree of generalization. Experimental results show that even though our solver is trained only on simple domains, it generalizes to different grid sizes, geometries and boundary conditions. Moreover, the strong generalization of our method is further illustrated by solving different PDEs. Last but not least, our iterator greatly exceeds existing iterators in convergence speed.
Appendix
Lemmas and Proofs
This appendix gives the lemmas we use and their proofs.
Lemma 1:
Proof:
Since \begin{align*} \lim _{n\to \infty }\sum _{i=0}^{n}B^{n-i}a_{i} & = \lim _{n\to \infty }\left ({{\sum _{i=0}^{n-1}B^{n-i}a_{i} + a_{n}}}\right ) = 0 \\ & \Rightarrow \lim _{n\to \infty }\sum _{i=0}^{n-1}B^{n-i}a_{i} = -\lim _{n\to \infty }a_{n} \tag {23}\end{align*}
\begin{align*} a_{n} & = -\sum _{i=0}^{n-1}B^{n-i}a_{i}, \tag {24}\\ a_{n+1} & = -\sum _{i=0}^{n}B^{n+1-i}a_{i} \\ & = -\sum _{i=0}^{n-1}B^{n+1-i}a_{i} - Ba_{n} \\ & = -\sum _{i=0}^{n-1}B^{n+1-i}a_{i} + B\sum _{i=0}^{n-1}B^{n-i}a_{i} \\ & = -\sum _{i=0}^{n-1}B^{n+1-i}a_{i} + \sum _{i=0}^{n-1}B^{n+1-i}a_{i} = 0 \\ & \Rightarrow \lim _{n\to \infty }a_{n}=0. \tag {25}\end{align*}
Lemma 2:
Let $B$ be a square matrix with $\rho (B)\lt 1$ and let $\{a_{i}\}$ be a sequence of vectors with $\lim _{i\to \infty }a_{i}=0$. Then $\lim _{n\to \infty }\sum _{i=0}^{n}B^{n-i}a_{i}=0$.
Proof:
Let $\epsilon \gt 0$. Since $\lim _{i\to \infty }a_{i}=0$, there exists $N_{1}$ such that for all $m \gt N_{1}$, \begin{equation*} ||a_{m}||\lt \epsilon. \tag {26}\end{equation*}
For $n \gt N_{1}$, split the sum as \begin{equation*} ||\sum _{i=0}^{n}B^{n-i}a_{i}|| \leq ||\sum _{i=0}^{N_{1}}B^{n-i}a_{i}|| + ||\sum _{i=N_{1}}^{n}B^{n-i}a_{i}||. \tag {27}\end{equation*}
By (26), the second term satisfies \begin{align*} ||\sum _{i=N_{1}}^{n}B^{n-i}a_{i}|| & \lt \epsilon ||\sum _{i=N_{1}}^{n}B^{n-i}|| = \epsilon ||\sum _{i=0}^{n-N_{1}-1}B^{i}|| \\ & = \epsilon ||(I-B)^{-1}(I-B^{n-N_{1}})|| \\ & \leq \epsilon ||(I-B)^{-1}|| \tag {28}\end{align*}
while the first term is bounded by \begin{equation*} ||\sum _{i=0}^{N_{1}}B^{n-i}a_{i}|| \leq ||\sum _{i=0}^{N_{1}}B^{n-i}a_{*}|| \lt N_{1}||B^{n-N_{1}}a_{*}||. \tag {29}\end{equation*}
Since $\rho (B)\lt 1$, there exists $N_{2} \gt N_{1}$ such that for all $m \gt N_{2}$, \begin{equation*} N_{1}||B^{m-N_{1}}a_{*}|| \lt \epsilon. \tag {30}\end{equation*}
Combining the two bounds, for all $m \gt N_{2}$, \begin{equation*} |\sum _{i=0}^{m}B^{m-i}a_{i}| \lt \epsilon (||(I-B)^{-1}|| + 1). \tag {31}\end{equation*}
Lemma 3:
Let $\beta $ be a scalar with $0 \lt \beta \lt 1$ and let $\{\alpha _{i}\}$ be a sequence of scalars with $\lim _{i\to \infty }\alpha _{i}=0$. Then $\lim _{n\to \infty }\sum _{i=0}^{n}\beta ^{n-i}\alpha _{i}=0$.
Proof:
Let $\epsilon \gt 0$. Since $\lim _{i\to \infty }\alpha _{i}=0$, there exists $N_{1}$ such that for all $m \gt N_{1}$, \begin{equation*} |\alpha _{m}|\lt \epsilon. \tag {32}\end{equation*}
For $n \gt N_{1}$, split the sum as \begin{equation*} |\sum _{i=0}^{n}\beta ^{n-i}\alpha _{i}| \leq |\sum _{i=0}^{N_{1}}\beta ^{n-i}\alpha _{i}| + |\sum _{i=N_{1}}^{n}\beta ^{n-i}\alpha _{i}|. \tag {33}\end{equation*}
By (32), the second term satisfies \begin{align*} |\sum _{i=N_{1}}^{n}\beta ^{n-i}\alpha _{i}| & \lt \epsilon |\sum _{i=N_{1}}^{n}\beta ^{n-i}| = \epsilon |\sum _{i=0}^{n-N_{1}-1}\beta ^{i}| \\ & = \epsilon \frac {1-\beta ^{n-N_{1}-1}}{1-\beta } \lt \frac {\epsilon }{1-\beta }. \tag {34}\end{align*}
while the first term is bounded by \begin{equation*} |\sum _{i=0}^{N_{1}}\beta ^{n-i}\alpha _{i}| \leq |\alpha _{*}\sum _{i=0}^{N_{1}}\beta ^{n-i}| \lt |\alpha _{*}N_{1}\beta ^{n-N_{1}}|. \tag {35}\end{equation*}
Since $0 \lt \beta \lt 1$, there exists $N_{2} \gt N_{1}$ such that for all $m \gt N_{2}$, \begin{equation*} |\alpha _{*}N_{1}\beta ^{m-N_{1}}| \lt \epsilon. \tag {36}\end{equation*}
Combining the two bounds, for all $m \gt N_{2}$, \begin{equation*} |\sum _{i=0}^{m}\beta ^{m-i}\alpha _{i}| \lt \frac {\epsilon }{1-\beta } + \epsilon. \tag {37}\end{equation*}
Lemma 4:
Let $\beta $ be a scalar with $0 \lt \beta \lt 1$ and let $\{a_{i}\}$ be a sequence of vectors with $\lim _{i\to \infty }a_{i}=0$. Then $\lim _{n\to \infty }\sum _{i=0}^{n}\beta ^{n-i}a_{i}=0$.
Proof:
Since each $a_{i}$ is a vector in $\mathbb {R}^{k}$, write its components as \begin{equation*} a_{i} = (a_{i}^{(1)}, a_{i}^{(2)}, \cdots, a_{i}^{(k)}). \tag {38}\end{equation*}
Then $\sum _{i=0}^{n}\beta ^{n-i}a_{i}$ can be written componentwise as \begin{equation*} \left ({{\sum _{i=0}^{n}\beta ^{n-i}a_{i}^{(1)}, \sum _{i=0}^{n}\beta ^{n-i}a_{i}^{(2)}, \cdots, \sum _{i=0}^{n}\beta ^{n-i}a_{i}^{(k)}}}\right ) \tag {39}\end{equation*}
Therefore, applying Lemma 3 to each component gives $\lim _{n\to \infty }\sum _{i=0}^{n}\beta ^{n-i}a_{i}=0$.