Introduction
Partial differential equations (PDEs) play an important role in science and engineering, and many physical phenomena, such as sound propagation, fluid flow, and electromagnetic fields, are themselves described by PDEs. Numerical methods for solving PDEs have been developed over the past few decades and successfully applied to many real-world problems, e.g., aerodynamics and weather forecasting. These methods, the most representative of which is the finite difference method (FDM) [1], first discretize the solution region of the problem into a grid and then obtain an approximation of the exact solution at the discrete grid points. Nevertheless, as the size and complexity of the problem increase, the computational cost of these numerical methods becomes extremely large; reducing the computational cost of numerically solving PDEs has therefore become a hot research topic in recent years.
With the renaissance of neural networks, data-driven deep learning methods have made breakthroughs on a variety of tasks [2], [3], [4]. As a result, some approaches [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20] have considered using deep learning models to reduce the computational cost of solving PDEs and have achieved encouraging results. Some methods [21], [22], [23], [24], [25], [26], [27], [28], [29] directly output numerical solutions of PDEs by training an end-to-end deep learning model that takes the conditions of the PDEs as input. The computational cost of these methods depends entirely on the size of the network, but their accuracy and generalization are not guaranteed because the deep models are uninterpretable, so they are not applicable in fields where the accuracy of numerical solutions is strictly required.
To address this problem, some methods [30], [31], [32], [33], [34] are built directly on top of existing hand-designed iterative methods [35], [36], [37]. They inherit many qualities of these iterative methods, such as the ability to obtain numerical solutions with arbitrary accuracy and a certain degree of generalization, and some of them [30], [34] even provide theoretical proofs. Specifically, they train a deep network that corrects the current iteration value using the residual between the current and previous iteration values. However, using only the residual of two adjacent iterations to estimate the correction value is clearly not accurate enough. In this paper, we instead use the residuals generated by all pairs of adjacent iterations to estimate the correction value, which greatly improves the accuracy of the correction and thus substantially speeds up the convergence of the iterator. If we regard the residual as analogous to a gradient, previous methods resemble gradient descent, while our method resembles gradient descent with momentum [38].
At the operational level, our method merely adds a momentum trick to existing methods, but at the theoretical level the difference is more fundamental: most hand-designed and deep learning-based iterative methods are by nature linear iterators [34], while ours is not. A linear iterator has a fixed update matrix T and a fixed bias c, and the goal of the models trained by deep learning-based methods is to find a T and c that make the iterator converge faster. The drawback of this class of iterators is that the bias c cannot adapt as the iteration proceeds, which limits the flexibility of the iterator. Our iterator, in contrast, no longer has a fixed bias: because it incorporates residuals generated by historical iteration values, its bias changes continuously with the iterations, so it can supply a different bias at every step. This makes it more flexible and versatile and also yields faster convergence. We call our iterator an Unfixed Bias Iterator. In fact, the linear iterator is just a special case of an unfixed bias iterator.
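To make this distinction concrete, the following toy sketch (our own illustration, not an implementation from this paper) contrasts a linear iterator, whose bias is fixed, with an unfixed bias iterator, whose bias may change at every step while keeping the same update matrix:

```python
# Toy illustration (ours): a linear iterator uses one fixed bias c, whereas an
# unfixed bias iterator may use a different bias c_k at every step.
import numpy as np

T = np.array([[0.5, 0.1],
              [0.0, 0.4]])            # fixed update matrix with spectral radius < 1
c = np.array([1.0, 1.0])              # fixed bias of the linear iterator

def linear_step(u):
    return T @ u + c                  # u_{k+1} = T u_k + c

def unfixed_bias_step(u, c_k):
    return T @ u + c_k                # u_{k+1} = T u_k + c_k, with c_k varying

u_lin = u_ub = np.zeros(2)
for k in range(60):
    u_lin = linear_step(u_lin)
    u_ub = unfixed_bias_step(u_ub, c + 0.5 ** k * np.ones(2))  # biases that tend to c
# Both sequences approach the same fixed point u_* = (I - T)^{-1} c.
```

The linear iterator is recovered as the special case in which the bias sequence is constant, c_k = c for all k.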
Linear iterators are widely used in industry despite their drawbacks because there are sufficient theoretical guarantees to determine whether a new linear iterator converges and generalizes. This paper provides analogous theoretical guarantees for our proposed iterative format, the unfixed bias iterator, and analyzes its convergence and generalization. Finally, experimental results show that our method converges much faster than other methods.
Our contributions are summarized as follows:
We introduce a new iterative format, the unfixed bias iterator, provide sufficient theoretical guarantees for it, and prove theoretically that our iterator converges and possesses a degree of generalization.
We propose a novel iterator that estimates the correction to the current iteration value using the residuals generated by all adjacent iteration values. The correction estimated by our iterator is more accurate than that of other methods, which greatly improves the convergence speed of the iteration.
Experiments show that our iterator obtains SOTA performance in terms of convergence speed.
Related Work
A. End-to-End Methods
The end-to-end methods [21], [22], [23], [24], [25], [26], [27], [28], [29] usually train a neural network as a PDE solver, with different neural networks used for different PDEs. Reference [27] uses a fully convolutional Long Short-Term Memory (LSTM) [39] network to exploit the spatiotemporal dynamics of PDEs; the neural network enhances the finite-difference and finite-volume methods (FDM/FVM) commonly used to solve PDEs, allowing the method to maintain guarantees on the order of convergence. Meta-Auto-Decoder (MAD) [23] treats solving parametric PDEs as a meta-learning problem and utilizes the Auto-Decoder structure [40] to deal with different tasks/PDEs; physics-informed losses induced from the governing equations and boundary conditions are used as the training losses for the different tasks. The primary idea of [24] is to use a graph neural network (GNN) [41] to model the spatial domain and a Neural ODE to model the temporal domain; an attention mechanism [42] identifies important inputs/features and assigns them more weight, which enhances the performance of the proposed framework. Using conditional generative adversarial networks (cGAN) [43], reference [44] trains models that directly generate solutions to steady-state heat conduction and incompressible fluid flow purely from observations, without knowledge of the underlying governing equations. End-to-end algorithms mostly rely on uninterpretable neural networks, so their accuracy can only be verified empirically rather than theoretically, whereas our method comes with a theoretical explanation.
B. Iterative Methods
Iterative methods [30], [31], [32], [33], [34] are built on top of existing hand-designed iterative methods and thus inherit many of their advantages. Reference [30] proposes a neural solver that learns an optimal iterative scheme for a class of PDEs in a data-driven fashion, attaining this objective by modifying an iteration of an existing semi-implicit solver with a deep neural network. Multigrid Network (MgNet) [31] develops a unified model that simultaneously recovers some convolutional neural networks for image classification and multigrid (MG) [37] methods for solving discretized PDEs. Reference [32] learns a (single) mapping from a family of parameterized PDEs to prolongation operators, training one neural network for the entire class of PDEs with an efficient and unsupervised loss function. Reference [33] proposes a method using a reinforcement learning (RL) [45] agent based on graph neural networks [41], which learns to perform graph coarsening on small training graphs and can then be applied to large unstructured graphs in multigrid. Reference [34] learns a fast iterative solver tailored to a specific domain by learning to modify the updates of an existing solver with a deep neural network. Iterative methods that use nonlinear neural networks cannot guarantee accuracy, just like the end-to-end methods; methods that use only linear neural networks can guarantee accuracy in theory, but they are essentially linear iterators, which limits their performance.
Methodology
A. Notations
The purpose of a linear PDE solver is to find a function ${\mathscr {u}}$ that satisfies \begin{align*} \begin{cases} {\mathcal {A}}{\mathscr {u}}(x)={\mathscr {f}}(x), x\in {\mathcal {G}} \\ {\mathscr {u}}(x) = {\mathscr {b}}(x), x \in \partial {\mathcal {G}} \\ \end{cases} \tag {1}\end{align*} where ${\mathcal {A}}$ is a linear differential operator, ${\mathcal {G}}$ is the solution domain with boundary $\partial {\mathcal {G}}$, ${\mathscr {f}}$ is the source term, and ${\mathscr {b}}$ prescribes the boundary values.
After discretizing the problem on a grid of size $n$, let $A$ denote the discretized operator, $u$, $f$ and $b$ the discretized counterparts of ${\mathscr {u}}$, ${\mathscr {f}}$ and ${\mathscr {b}}$, and $G$ the diagonal matrix that selects the interior grid points. The discrete problem then reads \begin{align*} \begin{cases} G(Au) & = Gf \\ (1-G)u & = (1-G)b \end{cases} \tag {2}\end{align*}
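As a concrete (and deliberately simplified) instance of (2), the sketch below assembles the standard five-point discretization of the 2-D Poisson equation on an $n\times n$ grid; the roles of $A$, $G$, $f$ and $b$ match (2), while the stencil, the tiny grid size and the omitted $h^{2}$ scaling are simplifying assumptions of ours:

```python
# Simplified instance (ours) of the discrete system (2): five-point Laplacian on an
# n x n grid, with G selecting the interior points and b holding the boundary values.
import numpy as np

n = 8                                    # grid points per side (illustrative)
N = n * n
A = np.zeros((N, N))                     # discretized operator (here: unscaled -Laplacian)
G = np.zeros((N, N))                     # diagonal mask of interior points
idx = lambda i, j: i * n + j

for i in range(n):
    for j in range(n):
        k = idx(i, j)
        if 0 < i < n - 1 and 0 < j < n - 1:          # interior point
            G[k, k] = 1.0
            A[k, k] = 4.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                A[k, idx(i + di, j + dj)] = -1.0

f = np.random.randn(N)                   # right-hand side
b = np.zeros(N)                          # Dirichlet boundary values

# Eq. (2): G A u = G f on the interior and (I - G) u = (I - G) b on the boundary.
# Stacking the two conditions into one linear system and solving it recovers u.
I = np.eye(N)
u = np.linalg.solve(G @ A + (I - G), G @ f + (I - G) @ b)
```

A direct solve like the last line becomes impractical for large grids, which is exactly why the iterative methods discussed next are used.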
B. Iterator and Training
At the operational level, the difference between our iterator and other iterators is that we use the residuals generated by all pairs of adjacent iterations for each estimated correction; this idea is inspired by the great success of momentum gradient descent in the field of deep model optimization. Specifically, for a fixed PDE problem class $A$, let $\varphi (\,\cdot\,;G,f,b,n)$ be an existing hand-designed iterator and $H$ a learned linear operator. Starting from $z_{0}=0$, our iterator $\psi _{H}$ updates the current iterate $u_{i}$ as \begin{align*} \begin{cases} w_{i+1} & = \varphi (u_{i};G,f,b,n) - u_{i} \\ z_{i+1} & = \theta GHw_{i+1} + (1 - \theta )z_{i} \\ u_{i+1} & = \varphi (u_{i};G,f,b,n) + z_{i+1} \\ \end{cases} \tag {3}\end{align*} where $w_{i+1}$ is the residual of the base iterator, $z_{i+1}$ accumulates the residuals of all adjacent iterations with momentum coefficient $\theta $, and $G$ restricts the correction to the interior points.
We train our iterator $\psi _{H}$ by minimizing the expected distance between the result of $l$ iterations and the ground-truth solution $u_{*}$:\begin{equation*} \min _{H} \mathbb {E} ||\psi _{H}^{l}(u_{0};G,f,b,n)-u_{*}||^{2}_{2}, \tag {4}\end{equation*} where $\psi _{H}^{l}$ denotes $l$ applications of the iterator starting from the initial value $u_{0}$ and the expectation is taken over the training problems.
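The PyTorch sketch below illustrates how the update rule (3) can be unrolled for $l$ steps and trained with the objective (4). The Jacobi smoother used as $\varphi$, the single-convolution $H$, the way $u_{*}$ is obtained, and all hyper-parameter values are illustrative assumptions of ours, not the exact configuration used in the paper:

```python
# Sketch (ours) of Eq. (3) unrolled for l steps and trained with the loss in Eq. (4).
import torch
import torch.nn.functional as F

jacobi_kernel = torch.tensor([[[[0.00, 0.25, 0.00],
                                [0.25, 0.00, 0.25],
                                [0.00, 0.25, 0.00]]]])

def phi(u, G, f, b, h2):
    """One Jacobi sweep for -Laplace(u) = f with boundary values b outside the mask G."""
    u_new = F.conv2d(u, jacobi_kernel, padding=1) + 0.25 * h2 * f
    return G * u_new + (1 - G) * b

H = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)   # learned linear H
opt = torch.optim.Adam(H.parameters(), lr=1e-3)
theta, l, n = 0.5, 20, 16
h2 = (1.0 / (n - 1)) ** 2

for step in range(1000):
    # Toy training problems: random f, zero boundary, square interior mask G, and a
    # reference u_* obtained by running plain Jacobi to (near) convergence.
    f = torch.randn(8, 1, n, n)
    b = torch.zeros_like(f)
    G = torch.zeros_like(f)
    G[:, :, 1:-1, 1:-1] = 1.0
    with torch.no_grad():
        u_star = torch.zeros_like(f)
        for _ in range(2000):
            u_star = phi(u_star, G, f, b, h2)

    u = torch.zeros_like(f)            # u_0
    z = torch.zeros_like(f)            # z_0 = 0
    for _ in range(l):                 # unroll l applications of Eq. (3)
        v = phi(u, G, f, b, h2)
        w = v - u                      # residual w_{i+1}
        z = theta * G * H(w) + (1 - theta) * z
        u = v + z
    loss = F.mse_loss(u, u_star)       # Monte-Carlo estimate of Eq. (4) over the batch
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice $H$ is deeper than a single convolution; see the implementations of $H$ described in the experimental section.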
C. Theoretical Analysis
Since our iterator is no longer a linear iterator, we first define a new iterator format, the unfixed bias iterator, and give sufficient conditions for its convergence; we then discuss the accuracy and generalization of our iterator at the theoretical level.
1) Unfixed Bias Iterator
As before, denote the underlying hand-designed linear iterator by $\varphi (u_{i}) = Tu_{i} + c$. Our iterator can then be expanded as \begin{align*} \psi _{H}(u_{i}) & = \varphi (u_{i}) + z_{i+1} \\ & = Tu_{i} + c + \theta GH(Tu_{i}+c-u_{i}) + (1-\theta )z_{i} \\ & = (T+\theta GHT - \theta GH)u_{i} + c + \theta GHc \\ & \quad + \theta \sum _{j=1}^{i}(1-\theta )^{i-j}Hw_{j}. \tag {5}\end{align*} Thus $\psi _{H}$ has a fixed update matrix $T+\theta GHT-\theta GH$ but a bias term that changes with the iteration index $i$, which motivates the following definition.
Definition 1 (Unfixed Bias Iterator):
An unfixed bias iterator is a function $\phi$ of the form \begin{equation*} u_{i+1}=\phi (u_{i})=Tu_{i}+c_{i}, \tag {6}\end{equation*} where the update matrix $T$ is fixed but the bias $c_{i}$ is allowed to change at every iteration.
Whether an unfixed bias iterator converges depends on $T$ and on the bias sequence $\{c_{i}\}$, as the following theorem shows.
Theorem 1:
An unfixed bias iterator (6) converges to $u_{*}$ for any initial value $u_{0}$ if and only if $\rho (T)\lt 1$ and $\lim _{n\to \infty }c_{n}=c_{*}$, where $c_{*}=(I-T)u_{*}$ and $\rho (\cdot)$ denotes the spectral radius.
Proof:
Necessity: If the iterator converges to $u_{*}$ for every initial value $u_{0}$, note that $c_{*}=(I-T)u_{*}$ gives $u_{*}=Tu_{*}+c_{*}$. Subtracting this fixed-point relation from (6) and unrolling the recursion yields \begin{align*} u_{k+1} - u_{*} & = T(u_{k} - u_{*}) + (c_{k} - c_{*}) \\ & = T^{2}(u_{k-1} - u_{*}) + T(c_{k-1}-c_{*}) + (c_{k} - c_{*}) \\ & \vdots \\ & =T^{k+1}(u_{0}-u_{*}) + \sum _{i=0}^{k}T^{k-i}(c_{i}-c_{*}) \tag {7}\end{align*}
Comparing the iterates obtained from two different initial values shows that $T^{n}(u_{0}-u_{0}')\to 0$ for all $u_{0},u_{0}'$, hence \begin{equation*} \lim _{n\to \infty }T^{n}=O \Rightarrow \rho (T)\lt 1. \tag {8}\end{equation*}
Moreover, taking limits in the one-step relation $u_{k+1}-u_{*}=T(u_{k}-u_{*})+(c_{k}-c_{*})$ yields \begin{equation*} \lim _{n\to \infty }(c_{n}-c_{*})=0 \Rightarrow \lim _{n\to \infty }c_{n} = c_{*}. \tag {9}\end{equation*}
Sufficiency: Conversely, if $\rho (T)\lt 1$ and $\lim _{n\to \infty }c_{n}=c_{*}$, then $T^{n}\to O$ and, by Lemma 2 in the Appendix, the accumulated bias term in (7) also vanishes, so \begin{align*} \lim _{n\to \infty }(u_{n} - u_{*}) & = \lim _{n\to \infty }T^{n}(u_{0}-u_{*}) \\ & \quad + \lim _{n\to \infty }\sum _{i=0}^{n-1}T^{n-1-i}(c_{i}-c_{*}) \\ & =0 + 0 = 0 \\ & \Rightarrow \lim _{n\to \infty }u_{n} = u_{*}. \tag {10}\end{align*}
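As an informal numerical sanity check of Theorem 1 (ours, not part of the proof), one can draw a random $T$ with $\rho (T)\lt 1$ and a bias sequence $c_{n}$ converging to some $c_{*}$, and observe that the iterates of (6) approach $u_{*}=(I-T)^{-1}c_{*}$:

```python
# Numerical check (ours) of Theorem 1: rho(T) < 1 and c_n -> c_* imply u_n -> u_*.
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))
T *= 0.9 / max(abs(np.linalg.eigvals(T)))        # rescale so that rho(T) = 0.9 < 1
c_star = rng.standard_normal(5)
u_star = np.linalg.solve(np.eye(5) - T, c_star)  # fixed point: u_* = T u_* + c_*

u = np.zeros(5)
for k in range(300):
    c_k = c_star + 0.8 ** k * rng.standard_normal(5)  # unfixed bias with c_k -> c_*
    u = T @ u + c_k                                   # one step of Eq. (6)
print(np.linalg.norm(u - u_star))                     # prints a value near zero
```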
2) Accuracy
The correct PDE solution $u_{*}$ is a fixed point of the hand-designed iterator $\varphi $. The following theorem shows that whenever our iterator converges, it converges precisely to $u_{*}$.
Theorem 2:
For any PDE problem $(G,f,b,n)$, if our iterator $\psi _{H}$ converges, then it converges to the correct solution $u_{*}$.
Proof:
When
When
When \begin{equation*} \lim _{n\to \infty }w_{n}=0. \tag {11}\end{equation*}
Since $H$ is a linear (hence continuous) operator, \begin{equation*} \lim _{n\to \infty }Hw_{n}=0. \tag {12}\end{equation*}
Unrolling the momentum recursion in (3) gives \begin{align*} z_{n} & = \theta Hw_{n}+(1-\theta )z_{n-1} \\ & = \theta Hw_{n}+(1-\theta )(\theta Hw_{n-1}+(1-\theta )z_{n-2}) \\ & \vdots \\ & = \theta \sum _{i=0}^{n}(1-\theta )^{n-i}Hw_{i} \tag {13}\end{align*}
By Lemma 4 in the Appendix, \begin{equation*} \lim _{n\to \infty }z_{n} = \theta \lim _{n\to \infty }\sum _{i=0}^{n} (1-\theta )^{n-i}Hw_{i} = 0. \tag {14}\end{equation*}
Therefore, \begin{align*} \lim _{n\to \infty }\psi ^{n}(u_{0}) & = \lim _{n\to \infty }(\varphi ^{n}(u_{0}) + z_{n+1}) \\ & = \lim _{n\to \infty }\varphi ^{n}(u_{0}) + \lim _{n\to \infty }z_{n} \\ & = u_{*} + 0 = u_{*} \tag {15}\end{align*}
For any PDE problem, Theorem 2 states that once our iterator converges, the fixed point it reaches must be the accurate solution. This means that our iterator can obtain solutions with arbitrary precision, just like the hand-designed iterators.
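The mechanism behind Theorem 2 can also be seen directly in a few lines: if the iterate already equals the true solution, the residual $w$ is zero, so the momentum term $z$ stays at zero and the iterator does not move, whatever $H$ is. Below is a tiny sketch of ours, with a generic linear $\varphi$, a random $H$, and the geometry mask dropped for brevity:

```python
# Sketch (ours) of the idea behind Theorem 2: at u = u_*, the residual w vanishes,
# so z stays zero and psi_H(u_*) = u_* for any choice of H.
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4))
T *= 0.9 / max(abs(np.linalg.eigvals(T)))      # base iterator with rho(T) < 1
c = rng.standard_normal(4)
u_star = np.linalg.solve(np.eye(4) - T, c)     # fixed point of phi(u) = T u + c
H = rng.standard_normal((4, 4))                # arbitrary "learned" linear operator
theta = 0.5

u, z = u_star.copy(), np.zeros(4)
for _ in range(10):                            # Eq. (3), started exactly at u_*
    w = (T @ u + c) - u                        # residual: (numerically) zero
    z = theta * H @ w + (1 - theta) * z
    u = (T @ u + c) + z
print(np.linalg.norm(u - u_star))              # stays at u_* up to round-off
```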
3) Generalization
For the PDE problem $(A,G,f,b,n)$, our iterator is trained only on particular choices of $f$ and $b$; we therefore need to ask whether it remains accurate, i.e., converges to the correct solution, when $f$ and $b$ change.
Theorem 3:
For fixed $A$, $G$ and $n$, if our iterator $\psi _{H}$ is accurate for some $f$ and $b$, then it is accurate for any $f$ and $b$.
Proof:
From Theorem 1 and Theorem 2, our iterator is accurate if and only if the following conditions hold:\begin{align*} \begin{cases} \rho (T+\theta GHT - \theta GH)\lt 1 \\ \lim _{n\to \infty }\left ({{c + \theta GHc + \theta \sum _{i=1}^{n}H(1-\theta )^{n-i}w_{i}}}\right )=c_{*} \end{cases}\tag {16}\end{align*}
For fixed $G$ and $n$, the update matrix $T+\theta GHT-\theta GH$ does not depend on $f$ or $b$, so the first condition in (16) is unaffected by them. For the second condition, since the residuals $w_{i}$ vanish in the limit (cf. the proof of Theorem 2), \begin{align*} & \lim _{n\to \infty }\left ({{c + \theta GHc + \theta \sum _{i=1}^{n}H(1-\theta )^{n-i}w_{i}}}\right ) \\ & = c + \theta GHc + \lim _{n\to \infty }\left ({{\theta \sum _{i=1}^{n}H(1-\theta )^{n-i}w_{i}}}\right ) \\ & = c + \theta GHc + 0 = c + \theta GHc. \tag {17}\end{align*} By the fixed-point relation $(I-T)u_{*}=c$, this limit equals $c_{*}=(I-(T+\theta GHT-\theta GH))u_{*}=c+\theta GHc$, so the second condition is satisfied regardless of $f$ and $b$.
In summary, the accuracy of the iterator is independent of $f$ and $b$. Thus, if the iterator is accurate for one choice of $f$ and $b$, it is accurate for any $f$ and $b$.
Theorem 3 states that our iterator generalizes freely to different $f$ and $b$. There is no guarantee that it generalizes to different $G$ and $n$; this has to be verified empirically. In our experiments, the learned iterator converges to the correct solution for a variety of grid sizes $n$ and geometries $G$, and we have not observed any setting in which it fails to converge.
Even if some $G$ or $n$ did cause our iterator to diverge, such divergence is easy to detect at run time, and one can simply fall back to the standard hand-designed solver.
Experiments
A. Experimental Setting
For fair comparison, we follow [34] to prepare our experimental setting, including data set, evaluation criterion and the implementation of H.
1) Data Set
To reemphasize, our goal is to train a model on simple domains where the ground truth solutions can be easily obtained, and then evaluate its performance on more complex geometries and boundary conditions. For training, we select the simplest homogeneous equation on a 2-dimensional square domain.
For testing, we use larger grid sizes $n$ than in training, as well as more complex geometries and boundary conditions.
2) Implementation of $H$
We use convolution to realize $A$. Similarly, we use either a stack of convolutional layers or a U-Net [47] with only linear operations to implement $H$, and the different implementations use different data sets. We call the implementation based on stacked convolutional layers the Unfixed-Bias-ConvX model, where X is the number of convolutional layers; it is trained on the square domain.
We call the U-Net-based implementation the Unfixed-Bias-U-NetX model; it is likewise trained on the square domain.
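A minimal PyTorch sketch of a purely linear $H$ of this kind is given below; the depth, channel width and class name are illustrative choices of ours, not the exact architecture of the Unfixed-Bias-ConvX models:

```python
# Sketch (ours) of a purely linear H: a stack of 3x3 convolutions with no bias and
# no activation functions, so that H remains a linear operator as the theory requires.
import torch.nn as nn

class LinearConvStack(nn.Module):
    def __init__(self, depth: int = 2, width: int = 8):
        super().__init__()
        layers = [nn.Conv2d(1, width, 3, padding=1, bias=False)]
        for _ in range(max(depth - 2, 0)):
            layers.append(nn.Conv2d(width, width, 3, padding=1, bias=False))
        layers.append(nn.Conv2d(width, 1, 3, padding=1, bias=False))
        self.net = nn.Sequential(*layers)      # no nonlinearities anywhere

    def forward(self, w):                      # w: residual from Eq. (3), shape (B, 1, n, n)
        return self.net(w)
```

A linear U-Net variant would follow the same principle, keeping the down-sampling, up-sampling and convolution operations but removing every activation function.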
3) Evaluation Criterion
For the Unfixed-Bias-Conv models, we compare against the Jacobi method. On GPU, the Jacobi iterator and our model can both be efficiently implemented as convolutional layers, so we measure the computational cost by the number of convolutional layers. Suppose the Jacobi method converges after N iterations and our model converges after M iterations. The evaluation criterion layers is then calculated as follows:\begin{equation*} layers = \frac {M(1+L)}{N}, \tag {18}\end{equation*} where $L$ is the number of convolutional layers in $H$.
On CPU, one Jacobi iteration requires 4 multiply-add operations per grid point, while a $3\times 3$ convolution requires 9 and the momentum update in (3) contributes 2 more, so the corresponding criterion is \begin{equation*} ops = \frac {M(4+2+9L)}{4N}. \tag {19}\end{equation*}
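For instance, with hypothetical values of $M$, $N$ and $L$ (the numbers below are ours, purely to show how (18) and (19) are evaluated):

```python
# Hypothetical example (ours) of evaluating the cost criteria (18) and (19).
M, N, L = 100, 1000, 2                 # our iterations, Jacobi iterations, conv layers in H
layers = M * (1 + L) / N               # Eq. (18): 100 * 3 / 1000 = 0.3
ops = M * (4 + 2 + 9 * L) / (4 * N)    # Eq. (19): 100 * 24 / 4000 = 0.6
# Values below 1 mean our model needs less computation than Jacobi to converge.
```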
For the Unfixed-Bias-U-Net models, we compare against the Multigrid method with the same number of sub-sampling and smoothing layers. Our models therefore have the same number of convolutional layers and roughly the same computational cost per iteration as Multigrid.
B. Numerical Experiments
We have validated the effectiveness of our iterator in solving different PDEs.
1) Poisson Equation
The Poisson equation is written as:\begin{equation*} \nabla ^{2}{\mathscr {u}} = {\mathscr {f}} \tag {20}\end{equation*}
The Unfixed-Bias-U-NetX models also converge to the correct solution and require less computation than Multigrid in all settings. Our Unfixed-Bias-U-Net3 is
According to our theoretical analysis, if our iterator converges for a geometry, then it is guaranteed to converge to the correct solution for any $f$ and boundary values $b$. The experimental results show that our model not only converges but also converges faster than the standard solver for a variety of grid sizes $n$ and geometries $G$. Empirically, this also demonstrates that our method has strong generalization.
2) Helmholtz Equation
The Helmholtz Equation is written as:\begin{equation*} \nabla ^{2}{\mathscr {u}} + a^{2} {\mathscr {u}}={\mathscr {f}} \tag {21}\end{equation*}
3) Heat Conduction Equation
The Heat Conduction Equation is written as:\begin{equation*} \frac {\partial {\mathscr {u}}}{\partial t} - a\nabla ^{2}{\mathscr {u}} = {\mathscr {f}}. \tag {22}\end{equation*}
In essence, using the Poisson, Helmholtz and heat conduction equations to verify the effectiveness of our method amounts to changing $A$. Although a model must be retrained for each different $A$, this shows that our method can easily be extended to different PDEs and still obtain ideal performance. In the future, we will consider training a single model that generalizes across different $A$.
C. The Sensitivity of Hyper-Parameters
In this section, we discuss the influence of the hyper-parameter $\theta $ and of the number of iterations $l$.
1) Hyper-Parameter $\theta $
Taking the models Unfixed-Bias-Conv2 and Unfixed-Bias-U-Net3 as examples, we test their sensitivity to $\theta $.
2) Number of Iterations $l$
When we change the value range of the iteration number $k$, no obvious change in the convergence speed of the model is observed; only some extreme settings have a noticeable effect.
D. Errors
Figure 2 shows how the errors decrease with iteration under the different iterators. Our iterator reduces the high-frequency errors very quickly, and once the errors have been reduced to a certain extent, the Jacobi iterator reduces the remaining low-frequency errors far more slowly than ours does. The speed of error reduction explains why our method converges much faster than the Jacobi iterator, which further illustrates the superiority of our iterator.
E. Visualization
We design some challenging geometries to test the generalization of our models: (i) same geometry but larger grid, (ii) L-shape geometry, (iii) Cylinders geometry, and (iv) PDEs in the same geometry, but
Conclusion
We build a learned iterator on top of an existing standard iterative solver; it modifies the current iteration result with the help of the historical iteration results so as to accelerate convergence. At the theoretical level, because of the introduction of the historical iteration results, our iterator constitutes a new iterative format, the unfixed bias iterator; we provide sufficient theoretical guarantees for it and prove that it converges and possesses a degree of generalization. Experimental results show that even though our solver is trained only on simple domains, it generalizes to different grid sizes, geometries and boundary conditions. Moreover, the strong generalization of our method is further illustrated by solving different PDEs. Last but not least, our iterator greatly exceeds existing iterators in convergence speed.
Appendix
Lemmas and Proofs
This appendix gives the lemmas we use and their proofs.
Lemma 1:
Proof:
Since \begin{align*} \lim _{n\to \infty }\sum _{i=0}^{n}B^{n-i}a_{i} & = \lim _{n\to \infty }\left ({{\sum _{i=0}^{n-1}B^{n-i}a_{i} + a_{n}}}\right ) = 0 \\ & \Rightarrow \lim _{n\to \infty }\sum _{i=0}^{n-1}B^{n-i}a_{i} = -\lim _{n\to \infty }a_{n} \tag {23}\end{align*}
\begin{align*} a_{n} & = -\sum _{i=0}^{n-1}B^{n-i}a_{i}, \tag {24}\\ a_{n+1} & = -\sum _{i=0}^{n}B^{n+1-i}a_{i} \\ & = -\sum _{i=0}^{n-1}B^{n+1-i}a_{i} - Ba_{n} \\ & = -\sum _{i=0}^{n-1}B^{n+1-i}a_{i} + B\sum _{i=0}^{n-1}B^{n-i}a_{i} \\ & = -\sum _{i=0}^{n-1}B^{n+1-i}a_{i} + \sum _{i=0}^{n-1}B^{n+1-i}a_{i} = 0 \\ & \Rightarrow \lim _{n\to \infty }a_{n}=0. \tag {25}\end{align*}
Lemma 2:
Let $B$ be a square matrix with $\rho (B)\lt 1$ and let $\{a_{i}\}$ be a sequence of vectors with $\lim _{i\to \infty }a_{i}=0$. Then $\lim _{n\to \infty }\sum _{i=0}^{n}B^{n-i}a_{i}=0$.
Proof:
Let $\epsilon \gt 0$. Since $\lim _{i\to \infty }a_{i}=0$, there exists $N_{1}$ such that for all $m \gt N_{1}$, \begin{equation*} ||a_{m}||\lt \epsilon. \tag {26}\end{equation*}
For $n \gt N_{1}$, split the sum as \begin{equation*} ||\sum _{i=0}^{n}B^{n-i}a_{i}|| \leq ||\sum _{i=0}^{N_{1}}B^{n-i}a_{i}|| + ||\sum _{i=N_{1}}^{n}B^{n-i}a_{i}||. \tag {27}\end{equation*}
By (26), the second term satisfies \begin{align*} ||\sum _{i=N_{1}}^{n}B^{n-i}a_{i}|| & \lt \epsilon ||\sum _{i=N_{1}}^{n}B^{n-i}|| = \epsilon ||\sum _{i=0}^{n-N_{1}-1}B^{i}|| \\ & = \epsilon ||(I-B)^{-1}(I-B^{n-N_{1}})|| \\ & \leq \epsilon ||(I-B)^{-1}|| \tag {28}\end{align*}
while the first term is bounded by \begin{equation*} ||\sum _{i=0}^{N_{1}}B^{n-i}a_{i}|| \leq ||\sum _{i=0}^{N_{1}}B^{n-i}a_{*}|| \lt N_{1}||B^{n-N_{1}}a_{*}||. \tag {29}\end{equation*}
Since $\rho (B)\lt 1$, there exists $N_{2} \gt N_{1}$ such that for all $m \gt N_{2}$, \begin{equation*} N_{1}||B^{m-N_{1}}a_{*}|| \lt \epsilon. \tag {30}\end{equation*}
Combining the two bounds, for all $m \gt N_{2}$, \begin{equation*} |\sum _{i=0}^{m}B^{m-i}a_{i}| \lt \epsilon (||(I-B)^{-1}|| + 1). \tag {31}\end{equation*}
Lemma 3:
Let $\beta $ be a scalar with $0 \lt \beta \lt 1$ and let $\{\alpha _{i}\}$ be a sequence of scalars with $\lim _{i\to \infty }\alpha _{i}=0$. Then $\lim _{n\to \infty }\sum _{i=0}^{n}\beta ^{n-i}\alpha _{i}=0$.
Proof:
Let $\epsilon \gt 0$. Since $\lim _{i\to \infty }\alpha _{i}=0$, there exists $N_{1}$ such that for all $m \gt N_{1}$, \begin{equation*} |\alpha _{m}|\lt \epsilon. \tag {32}\end{equation*}
For $n \gt N_{1}$, split the sum as \begin{equation*} |\sum _{i=0}^{n}\beta ^{n-i}\alpha _{i}| \leq |\sum _{i=0}^{N_{1}}\beta ^{n-i}\alpha _{i}| + |\sum _{i=N_{1}}^{n}\beta ^{n-i}\alpha _{i}|. \tag {33}\end{equation*}
By (32), the second term satisfies \begin{align*} |\sum _{i=N_{1}}^{n}\beta ^{n-i}\alpha _{i}| & \lt \epsilon |\sum _{i=N_{1}}^{n}\beta ^{n-i}| = \epsilon |\sum _{i=0}^{n-N_{1}-1}\beta ^{i}| \\ & = \epsilon \frac {1-\beta ^{n-N_{1}-1}}{1-\beta } \lt \frac {\epsilon }{1-\beta }. \tag {34}\end{align*}
while the first term is bounded by \begin{equation*} |\sum _{i=0}^{N_{1}}\beta ^{n-i}\alpha _{i}| \leq |\alpha _{*}\sum _{i=0}^{N_{1}}\beta ^{n-i}| \lt |\alpha _{*}N_{1}\beta ^{n-N_{1}}|. \tag {35}\end{equation*}
Since $0 \lt \beta \lt 1$, there exists $N_{2} \gt N_{1}$ such that for all $m \gt N_{2}$, \begin{equation*} |\alpha _{*}N_{1}\beta ^{m-N_{1}}| \lt \epsilon. \tag {36}\end{equation*}
Combining the two bounds, for all $m \gt N_{2}$, \begin{equation*} |\sum _{i=0}^{m}\beta ^{m-i}\alpha _{i}| \lt \frac {\epsilon }{1-\beta } + \epsilon. \tag {37}\end{equation*}
Lemma 4:
Let $\beta $ be a scalar with $0 \lt \beta \lt 1$ and let $\{a_{i}\}$ be a sequence of vectors with $\lim _{i\to \infty }a_{i}=0$. Then $\lim _{n\to \infty }\sum _{i=0}^{n}\beta ^{n-i}a_{i}=0$.
Proof:
Since each $a_{i}$ is a vector in $\mathbb {R}^{k}$, write its components as \begin{equation*} a_{i} = (a_{i}^{(1)}, a_{i}^{(2)}, \cdots, a_{i}^{(k)}). \tag {38}\end{equation*}
Then $\sum _{i=0}^{n}\beta ^{n-i}a_{i}$ can be written componentwise as \begin{equation*} \left ({{\sum _{i=0}^{n}\beta ^{n-i}a_{i}^{(1)}, \sum _{i=0}^{n}\beta ^{n-i}a_{i}^{(2)}, \cdots, \sum _{i=0}^{n}\beta ^{n-i}a_{i}^{(k)}}}\right ) \tag {39}\end{equation*}
Therefore, applying Lemma 3 to each component gives $\lim _{n\to \infty }\sum _{i=0}^{n}\beta ^{n-i}a_{i}=0$.