Introduction

Since the invention of calculus and Newton’s Principia, ODEs and PDEs (Ordinary / Partial Differential Equations) have been the most widely used class of models for natural phenomena. The laws of Newton and Kepler are ODEs; fluid mechanics and electromagnetism are modeled using PDEs. So are elasticity, plasticity, thermodynamics, quantum mechanics, and climate models. The examples are innumerable and can be found across all of science and engineering, and they have driven the development and construction of ever more powerful supercomputers. The efficient solution of DEs is key to the advancement of science and its successful application in engineering.

The two problems

The application of neural networks to DEs has a long history: years after seminal works like [Dis94N] and [Lag98A, Lag00N], interest has recently been rekindled, motivated by the successes of NNs in computer vision, natural language processing and other fields. We can distinguish two main tasks to solve:

In the direct problem, given a PDE $L(u)=f$ with known coefficients, domain and initial and boundary conditions, one wishes to compute or approximate a solution. The simplest approach, which we consider below, is to directly minimise the MSE $||L(\hat u) - f||_2^2$, with some additional terms.

In the inverse problem, a parametrised PDE is given, and some values of the solution are observed. The task (system identification) is then to infer the parameters of the equation governing the system [Rai18H, Rai19P]. A related line of research learns the structure of the equations instead of just the coefficients. Some approaches perform a form of symbolic regression defined by activation functions [Sah18L]. [Lon18P] show how differential operators can be approximated by convolution filters and learn those. Some techniques are meta-heuristic, others use gradient-based approaches [Bru16D, Qin19D].

In this note, we focus on the direct problem and a family of methods tackling it.

The curse of dimensionality

The dimension of the solution for most DEs grows quickly with the size of the system. For example, in ab-initio molecular dynamics, every atom is represented as a point described by 6 numbers (3 position coordinates, 3 velocity coordinates). Consequently, the dimension of the system grows like $6N$, where $N$ is the number of atoms. In this field, the simulation of a small drop of water is a traditional benchmark, since water acts as a solvent in many systems, and one can quickly verify how such a “simple problem” becomes intractable, with storage requirements in the zettabyte range and other impossible demands.

To comprehend the extent of the problem, consider the following back-of-the-napkin calculation. The molar mass of water is 18 g/mol, so one small drop (roughly 0.01 g) contains around $5 \cdot 10^{-4}$ mol of water, or $3.34 \cdot 10^{20}$ molecules. Each molecule is made of 3 atoms, 2 H and 1 O, so there will be about $N = 10^{21}$ atoms. Making the optimistic assumption that the position and velocity vectors of each atom are enough to describe the system (disregarding interactions, among other things), we need $6 \cdot N$ numbers, which, using 32-bit precision, amounts to roughly $2.4 \cdot 10^{22}$ bytes. That is, roughly 24 billion terabytes are needed to store a single snapshot of the system. Of course, there are macroscopic descriptions of the behaviour of water, but that is beside the point. For one thing, when using ab-initio techniques, some of the properties of water (hydrogen bonds, capillarity, etc.) emerge naturally. The important point, however, is that the number of molecules is huge. Its lower range is in the thousands and its higher range is around Avogadro’s number ($\approx 6 \cdot 10^{23}$). Furthermore, most molecules of interest have many more atoms than water, e.g. penicillin is made of 41 atoms and proteins can have thousands of them.
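The back-of-the-napkin estimate can be reproduced in a few lines of Python. This is only a sketch: the drop mass of 0.01 g is an assumption chosen to match the mole count quoted above, and all variable names are illustrative.

```python
AVOGADRO = 6.022e23        # molecules per mole
MOLAR_MASS_WATER = 18.0    # g/mol
DROP_MASS = 0.01           # g; assumed mass of a small drop

moles = DROP_MASS / MOLAR_MASS_WATER   # about 5.6e-4 mol
molecules = moles * AVOGADRO           # about 3.3e20 molecules
atoms = 3 * molecules                  # 2 H + 1 O per molecule
numbers = 6 * atoms                    # 3 position + 3 velocity coordinates
bytes_needed = 4 * numbers             # 32-bit floats take 4 bytes each

print(f"atoms:     {atoms:.2e}")
print(f"bytes:     {bytes_needed:.2e}")
print(f"terabytes: {bytes_needed / 1e12:.2e}")
```

Even under these generous assumptions, a single snapshot needs on the order of $10^{22}$ bytes.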

As an example, the figure below shows how the dimension of the solution of Schrödinger’s equation grows with the size of the molecule being simulated.

[Figure: Dimension scaling of Schrödinger's equation]
This plot is for just one molecule of each compound. Most simulations of practical relevance will have billions of molecules.

This is further aggravated by the grids that methods like the Finite Element Method (FEM) or Finite Differences (FD) use. Due to the curse of dimensionality, the number of grid points grows exponentially with the dimension of the domain. In some cases, solution quality can be traded for computational requirements by using coarse or adaptive grids, but even that tradeoff has limits. Some systems just grow too fast.

One advantage of approximating the solutions as a NN instead of as values on a grid is that we can dispense with the grid altogether. This, at least in theory, could open the door to dealing with high-dimensional problems that were previously intractable.
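To make the exponential growth of grid points concrete, here is a small sketch that counts the nodes of a regular grid. The resolution of 100 points per axis and the storage of one 32-bit value per node are assumptions made purely for illustration.

```python
def grid_points(points_per_axis: int, dim: int) -> int:
    """Number of nodes in a regular grid with the same resolution along every axis."""
    return points_per_axis ** dim


POINTS_PER_AXIS = 100   # assumed resolution per axis
BYTES_PER_VALUE = 4     # one 32-bit float stored per node

for dim in (1, 2, 3, 6, 12):
    n = grid_points(POINTS_PER_AXIS, dim)
    print(f"d = {dim:2d}: {n:.1e} nodes, {n * BYTES_PER_VALUE:.1e} bytes")
```

Each additional dimension multiplies the node count by the per-axis resolution, which is why even modest dimensions are out of reach for uniform grids.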

Solving the direct problem

There are a number of approaches to incorporating NNs in the solution procedures of DEs. [Dea18E] introduce an end-to-end differentiable physics simulator that allows full back-propagation and can improve the performance of algorithms that use it. [Um21S] solve equations using traditional, grid-based solvers on a coarse grid and use a NN to add a fine correction term to the coarse solution. As specifying the grid upon which to solve the equation is a non-trivial task, [Bar19L] learn it instead. [Rai18H] use not neural networks but a Gaussian Process prior on the solutions. [Lad15D] use regression forests to simulate fluids and [Li20F] attempt to learn the solution operator.

We will examine Physics-Informed Neural Networks (PINNs) [Rai19P] and Deep Galerkin Methods (DGM) [Sir18D] in detail, a family of methods that, instead of using linear combinations of basis functions as in FEM, represent the solution as a NN. This has been an active area of research recently [Han18S, Kha19V, Shi20C, Lu21L]. In order to find the best approximation to the solution, one has to solve a non-linear optimization problem. Recall that a solution to a PDE is a function that fulfills all of the equation’s conditions, i.e. the equation proper, the boundary conditions, and the initial conditions. A loss function that measures how much the NN violates the equation constraints can be specified and then minimized, as we can numerically evaluate the differential operator applied to the network. If we add initial and boundary conditions, then achieving a zero loss means obtaining a solution. Of course, in practice zero is never perfectly attained, but a network with low enough loss can be a satisfactory solution.

Avoiding the curse of dimensionality: the PINN / DGM method

Even though it has appeared independently under two names in the literature, PINN and DGM, the underlying method is essentially the same. The main component of PINN [Rai19P] and DGM [Sir18D] is an appropriately crafted loss function that measures how far the network is from the solution to the problem. All the information we need is contained in the equation itself, and the additional conditions that a solution has to fulfill. We just need to measure how far the current network is from fulfilling them. Therefore, we construct a composite loss

$$L = L_s +L_b + L_i$$

where the subscript $s$ stands for structural, $b$ for boundary and $i$ for initial. Let us use the heat equation, a canonical example, to illustrate our definitions. Let $\Omega \subset \mathbb{R}$ be an open, bounded subinterval of the real line and let $t \in \mathbb{R}$ denote time. The 1-dimensional heat equation is given by

$$\Delta u (t, x) = \frac{\partial u(t, x)}{\partial t}, t \in \mathbb{R}, x \in \Omega.$$

Now, each of the sub-losses is defined as:

$$\displaylines{L_s = \frac{1}{N_r} \sum \left(\Delta u - \frac{\partial u}{\partial t}\right)^2 \\ L_b = \frac{1}{N_b} \sum (u - g)^2 \\ L_i = \frac{1}{N_i} \sum (u - u_0)^2}$$

In the above, $g(t, x)$ is a given boundary condition, defined for all $t\in\mathbb{R}$ and $x \in \partial \Omega$, the boundary of $\Omega$, and $u_0(x)$ is the initial condition, given over all of $\Omega$.

Note that we measure each loss only within its domain. That is, the initial loss only over points with $t = 0$, the boundary loss only on the boundary (i.e. $ x \in \partial \Omega$) and the structural loss on the rest of the domain. The points can be sampled in multiple ways. Both fixed and adaptive grids could be used, but the main advantage of these methods is that the domain can be randomly (and potentially sparsely) sampled. That means that we can do without a grid that becomes unmanageable in high dimensional domains. Another advantage is that for some problems and domains, the computation of the grid itself can be a major issue.
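The composite loss and the random sampling described above can be sketched as follows. This is a minimal illustration assuming PyTorch, with $\Omega = (0, 1)$, zero boundary values and a sine initial profile chosen only as example data; the function names, network size and sample counts are hypothetical and not taken from [Rai19P] or [Sir18D]. Derivatives of the network are obtained by automatic differentiation.

```python
import math
import torch


def heat_residual(u_fn, t, x):
    """Residual u_xx - u_t of the 1D heat equation at the points (t, x)."""
    t = t.detach().clone().requires_grad_(True)
    x = x.detach().clone().requires_grad_(True)
    u = u_fn(t, x)
    # create_graph=True keeps the derivatives differentiable, so the loss can
    # later be backpropagated to the network parameters.
    u_t, u_x = torch.autograd.grad(
        u, (t, x), grad_outputs=torch.ones_like(u), create_graph=True
    )
    u_xx = torch.autograd.grad(
        u_x, x, grad_outputs=torch.ones_like(u_x), create_graph=True
    )[0]
    return u_xx - u_t


def composite_loss(u_fn, g, u0, n_r=256, n_b=64, n_i=64, t_max=1.0):
    """L = L_s + L_b + L_i on randomly sampled points of [0, t_max] x [0, 1]."""
    # Structural loss: random points in the space-time domain (no grid).
    t_r, x_r = t_max * torch.rand(n_r, 1), torch.rand(n_r, 1)
    # Boundary loss: random times, x drawn from the two end points {0, 1}.
    t_b = t_max * torch.rand(n_b, 1)
    x_b = torch.randint(0, 2, (n_b, 1)).to(t_b.dtype)
    # Initial loss: random positions at t = 0.
    x_i = torch.rand(n_i, 1)
    t_i = torch.zeros_like(x_i)

    loss_s = heat_residual(u_fn, t_r, x_r).pow(2).mean()
    loss_b = (u_fn(t_b, x_b) - g(t_b, x_b)).pow(2).mean()
    loss_i = (u_fn(t_i, x_i) - u0(x_i)).pow(2).mean()
    return loss_s + loss_b + loss_i


# Assumed example data: u = 0 on the boundary, u(0, x) = sin(pi x).
g = lambda t, x: torch.zeros_like(x)
u0 = lambda x: torch.sin(math.pi * x)

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
u_net = lambda t, x: model(torch.cat([t, x], dim=1))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(5):  # a real run needs many more iterations
    optimizer.zero_grad()
    loss = composite_loss(u_net, g, u0)
    loss.backward()
    optimizer.step()
print(f"loss after 5 steps: {loss.item():.4f}")
```

A useful sanity check for such a loss is to evaluate it on a known solution: for this example data, $u(t, x) = e^{-\pi^2 t} \sin(\pi x)$ solves the problem exactly, so its loss should vanish up to rounding.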

The fact that backpropagation is at the core of the training procedures of NNs has the positive side effect that modern NN libraries implement very efficient automatic differentiation algorithms, which, unlike numerical differentiation, do not introduce discretization errors.

For example, the time derivative in the heat equation, $\frac{\partial u}{\partial t}$, can be implemented as follows (syntax might change for your version): `u_t = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True)[0]` in PyTorch, or `u_t = tf.gradients(u, t, unconnected_gradients="zero")[0]` in TensorFlow (graph mode).

Finally, we need only minimize the defined loss using a modern optimization algorithm, like Adam, and we are done. But are we?

Gridless and high dimensional

As mentioned above, the main appeal of approximating the solution with a NN instead of on a grid is that the exponential growth in the number of grid points (the curse of dimensionality) can be avoided. The most straightforward, and often sufficient, way to do this is to sample uniformly across the domain, and this is what most papers do. There are, however, situations where this approach is not enough. A higher sampling density might be required, for example, in regions where the dynamics are complicated (in space) or where the system undergoes qualitative changes. How to find those regions is a whole research domain, one already well studied in adaptive grid methods.
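One simple way to bias the sampling towards difficult regions is to draw uniform candidate points and resample them with probability proportional to the current residual. The sketch below is only an illustrative heuristic, not a method taken from the papers cited here; `resample_by_residual` and the toy residual function are hypothetical.

```python
import random


def resample_by_residual(candidates, residual_fn, n_samples, rng=random):
    """Draw n_samples candidates with probability proportional to |residual|."""
    # The small constant keeps all weights positive so the draw is well defined.
    weights = [abs(residual_fn(p)) + 1e-12 for p in candidates]
    return rng.choices(candidates, weights=weights, k=n_samples)


rng = random.Random(0)
# Uniform candidate points in the 1D domain (0, 1).
candidates = [rng.random() for _ in range(10_000)]
# Toy residual standing in for the PDE residual of a partially trained network:
# it is large only in a narrow band around x = 0.5.
toy_residual = lambda x: 1.0 if abs(x - 0.5) < 0.1 else 0.0

points = resample_by_residual(candidates, toy_residual, n_samples=1_000, rng=rng)
inside = sum(abs(p - 0.5) < 0.1 for p in points)
print(f"{inside} of {len(points)} samples fall in the high-residual band")
```

In practice the residual would be re-evaluated periodically during training, so that the sampling density follows the regions where the network currently violates the equation most.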

A promising evolution of the methods above has shown the potential to solve very high-dimensional equations (200+ dimensions) in [Han18S, Al-18S, Rai18F, Zha20F, E17D]. These papers rely on a reformulation of the PDE in terms of an SDE (Stochastic Differential Equation). The reformulation itself is a very active research area [Par90A, Che07S] and looks extremely promising as of now, as it allows one to cheaply solve equations that were not (cheaply) solvable before. Sadly, not all equations can be rewritten in the necessary form, and among those that cannot are some of the most used ones.

Why it probably won’t work for you (right now)

So should you forget about FEniCS, Abaqus, Comsol and other well-established tools and embrace NNs as the answer to all your problems? Not quite. NN-based solvers are hard to set up and quite brittle. After all, you are training a NN and all the usual black magic still applies. Making sure everything is wired correctly is especially tricky because of the differential operators. This means an expensive expert will have to spend more of their time debugging the model than they would on a non-NN method.

Solving the equation means that the optimizer has to converge to a good minimum. Ideally this is the global minimum, which we hopefully know to exist, but a sufficiently good local minimum might also be fine. The definition of “sufficiently good” will depend on the context. The biggest problem these methods currently have is that achieving convergence is rather difficult, even when the model is properly set up. Why? Because the gradient descent dynamics induced by the loss function are radically different from those arising from losses in more traditional domains, such as Computer Vision or NLP, and most modern optimizers are effective within the meta-class of traditional problems [Wan20W, Fuk20L, Wan20U]. This problem can be further aggravated by complicated dynamic evolution, such as turbulent flow or shockwaves [Fuk20L]. We have found an instance of such pathological loss gradients when attempting to apply the method to the Schrödinger Equation.

[Figure: Absolute value of a solution (d orbital) for the radial 2D Schrödinger’s equation obtained using PINN.]

The zero function (a function that is zero everywhere) is a strong attractor, even though it is not a solution of the full problem, and the network tends to converge towards it. As the zero function fulfills the constraint of the equation proper, it minimizes the structural loss. The magnitude of the gradient of the structural loss is greater than those of the initial and boundary losses, meaning it will dominate the gradient descent procedure, especially as one tries to compute the solution farther along in time (thus enlarging the time × space cylinder).
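The attraction towards the zero function can be written out in the notation of the heat equation example used earlier. This is only an illustration of the mechanism for a homogeneous linear equation; the experiment described here used Schrödinger’s equation. For $u \equiv 0$ the sub-losses become

$$\displaylines{L_s = \frac{1}{N_r} \sum \left(\Delta 0 - \frac{\partial 0}{\partial t}\right)^2 = 0 \\ L_b = \frac{1}{N_b} \sum g^2 \\ L_i = \frac{1}{N_i} \sum u_0^2}$$

so the structural term is exactly minimized, while the boundary and initial terms stay positive whenever $g$ or $u_0$ is non-zero at some sampled point. If the gradient of $L_s$ dominates the update, the optimizer is pulled towards a function that satisfies the equation but ignores the data of the problem.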

Less of an issue in low-dimensional systems, the above holds especially true for high dimensional equations (3D+) with complicated dynamics. Achieving convergence in low-dimensional systems is easier as they are less sensitive to hyperparameters and pathological gradient dynamics. Still, the method remains slow and expensive to run. Readily available frameworks such as FEniCS will compute a solution to the 1D Burgers equation on a laptop in a few seconds while NN-based methods will require, on a powerful (and expensive) GPU, a minute or more. Potential optimizations like reduced precision, smarter domain sampling, or better architectures do not seem to fundamentally change this.

Additionally, computing the Hessian (two backpropagations) and then using it in the loss function (another one) is memory intensive. This is another tradeoff, where we replace the issue with grid points by the need to compute expensive higher-order derivatives. We have found in our experiments with Schrödinger’s equation that memory requirements grow linearly for these computations, as shown in the image below (keep in mind that 32 dimensions is still less than a water molecule), and [Gro18P] found that the number of parameters needed by the NN grows at most at a polynomial rate in the dimension. This is good news, but in practice, if the system has around 10 to 20 dimensions and second-order derivatives, in FP32 it will use all the memory of a Tesla V100 (32GB). This makes the search for more memory-efficient methods necessary.
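To illustrate where the cost of second-order derivatives comes from, the following sketch computes the Laplacian of a scalar field with PyTorch autograd. It is one common way of doing this, not necessarily the implementation used in the experiments above, and the function name `laplacian` is hypothetical. Note that one extra backward pass is needed per input dimension, and that `create_graph=True` keeps every intermediate graph in memory.

```python
import torch


def laplacian(u_fn, x):
    """Sum of second derivatives of u_fn at each row of x, with x of shape (N, d)."""
    x = x.detach().clone().requires_grad_(True)
    u = u_fn(x)  # shape (N,)
    # First backward pass: gradient of u with respect to all d inputs.
    # Rows are independent, so summing u gives the per-row gradients.
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]  # (N, d)
    lap = torch.zeros(x.shape[0], dtype=x.dtype)
    for k in range(x.shape[1]):
        # One additional backward pass per dimension for d^2 u / dx_k^2.
        second = torch.autograd.grad(grad_u[:, k].sum(), x, create_graph=True)[0]
        lap = lap + second[:, k]
    return lap


# Check on u(x) = |x|^2, whose Laplacian is 2 d everywhere.
d = 8
x = torch.rand(5, d)
print(laplacian(lambda z: (z ** 2).sum(dim=1), x))
```

The loop makes the scaling visible: both the number of backward passes and the amount of retained graph grow with the dimension $d$.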

Finally, although theoretical guarantees are being studied [Shi20C, Mar21P], when training a NN-based solver the practitioner is in the dark as to the quality of the solution. Without prior knowledge about it, which is the very thing one is trying to obtain, or at least a very good idea of what it should look like, a great deal of trial and error is required and in the end there is no guarantee (of course this is true of many non-convex problems, and other methods might have the same issue). The same loss value can correspond to either a good or a bad solution depending on the problem at hand, and the loss curves can behave in unexpected ways (in light of standard NN know-how). At the end of the day, this means that an expert needs to work on the problem for a while to become sufficiently familiar with it in order to obtain a satisfactory solution, and that the assessment of the quality of the solution will probably be based on heuristics. This is in stark contrast to most classical methods, which come with theoretical guarantees and don’t rely on heuristics as NNs often do, and which also benefit from decades of development [Qua94N] and highly optimized implementations. This is particularly obvious for linear problems, but one would of course not try to use NNs for those except for benchmarking purposes or to overcome the problem of high dimensions.

Conclusion

Although the problems of complicated gradient dynamics and the failure to converge in harder cases are being addressed by some researchers, and the success of the SDE-based approaches is certainly impressive, implementing such a solution will be expensive and time-consuming for most practical applications. This application domain of NNs suffers from the usual difficulties in obtaining convergence and does not yet have an established “conventional wisdom” to guide the practitioner.

In a nutshell: using NNs to solve PDEs can require more resources and time than traditional methods and lacks theoretical guarantees, and it is often hard both to obtain a solution and to validate it. If your problem is low-dimensional and not too complex, do not waste your time. For now, NNs will probably only offer an advantage as a solver for high-dimensional systems that can be reformulated as a forward-backward SDE.

The future

A more thorough study of how the performance of the method scales to higher dimensions, in terms of convergence rate, convergence time, and memory requirement is needed. Specialized optimizers that can handle the particular loss dynamics also need to be researched, better memory scaling achieved, and convergence time reduced before these methods can be used independently as plug and play in the real world and compete with the well established solvers. Also, as the field matures, the lacking “conventional wisdom” will hopefully emerge.

PINNs, DGM, and other NN-based approaches to DE solving suffer from problems that are unique in the sense that they don’t appear when dealing with traditional solvers or when using NNs in other domains. While those problems are deal-breaking in most cases for now, a lot of research is going into solving them and making NN-based approaches a mature technology. Arguably, the most pressing issue for a practitioner is the lack of established libraries that can be used out of the box. This is being addressed by [Lu21L] in DeepXDE.

Further, efforts are being made to use known reformulations of DEs in the hope of finding a form of the problem that is easier to minimize using the existing tools. [E18D] use the Ritz method, that is, they reformulate the DE problem as an energy minimization one. This new loss function has the benefit that it can be minimized directly, without the need to incorporate it into any additional MSE. With some luck, the new loss will have gradient descent dynamics that are less problematic than those of the composite loss described above. Finally, [Yan18P] combine PINNs and GANs, but these come with their own plethora of optimization issues.
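As an illustration of such an energy reformulation, consider the Poisson problem, a standard textbook example chosen here for concreteness rather than taken from the discussion above. By the Dirichlet principle, the solution of $-\Delta u = f$ in $\Omega$ with $u = 0$ on $\partial \Omega$ is the minimizer, among sufficiently regular functions vanishing on the boundary, of the energy

$$I(u) = \int_\Omega \left( \frac{1}{2} |\nabla u(x)|^2 - f(x)\, u(x) \right) dx,$$

which can be estimated by averaging the integrand over $N$ randomly sampled points $x_j \in \Omega$:

$$I(u) \approx \frac{|\Omega|}{N} \sum_{j=1}^{N} \left( \frac{1}{2} |\nabla u(x_j)|^2 - f(x_j)\, u(x_j) \right).$$

Note that this loss only involves first derivatives of $u$, whereas the structural loss of the composite formulation requires second derivatives.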

References

[Al-18S]

Solving Nonlinear and High-Dimensional Partial Differential Equations via Deep Learning, Ali Al-Aradi, Adolfo Correia, Danilo Naiff, Gabriel Jardim, Yuri Saporito.

Nov 2018

In this work we apply the Deep Galerkin Method (DGM) described in Sirignano and Spiliopoulos (2018) to solve a number of partial differential equations that arise in quantitative finance applications including option pricing, optimal execution, mean field games, etc. The main idea behind DGM is to represent the unknown function of interest using a deep neural network. A key feature of this …

[Bar19L]

Learning data-driven discretizations for partial differential equations, Yohai Bar-Sinai, Stephan Hoyer, Jason Hickey, Michael P. Brenner.

Jul 2019

The numerical solution of partial differential equations (PDEs) is challenging because of the need to resolve spatiotemporal features over wide length- and timescales. Often, it is computationally intractable to resolve the finest features in the solution. The only recourse is to use approximate coarse-grained representations, which aim to accurately represent long-wavelength dynamics while …

[Ber18U]

A unified deep artificial neural network approach to partial differential equations in complex geometries, Jens Berg, Kaj Nyström.

Nov 2018

In this paper, we use deep feedforward artificial neural networks to approximate solutions to partial differential equations in complex geometries. We show how to modify the backpropagation algorithm to compute the partial derivatives of the network output with respect to the space variables which is needed to approximate the differential operator. The method is based on an ansatz for the solution …

[Bru16D]

Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Steven L. Brunton, Joshua L. Proctor, J. Nathan Kutz.

Apr 2016

Extracting governing equations from data is a central challenge in many diverse areas of science and engineering. Data are abundant whereas models often remain elusive, as in climate science, neuroscience, ecology, finance, and epidemiology, to name only a few examples. In this work, we combine sparsity-promoting techniques and machine learning with nonlinear dynamical systems to discover …

[Che07S]

Second-order backward stochastic differential equations and fully nonlinear parabolic PDEs, Patrick Cheridito, H. Mete Soner, Nizar Touzi, Nicolas Victoir.

2007

For a d-dimensional diffusion of the form $dX_t = \mu(X_t)dt + \sigma(X_t)dW_t$ and continuous functions f and g, we study the existence and uniqueness of adapted processes Y, Z, Γ, and A solving the second-order backward stochastic differential equation (2BSDE) $$dY_t = f(t,X_t, Y_t, Z_t, \Gamma_t) dt + Z_t'\circ dX_t, \quad t \in [0,T),$$ $$dZ_t = A_t dt + \Gamma_tdX_t, \quad t \in [0,T),$$ $$Y_T = g(X_T).$$ …

[Dea18E]

End-to-End Differentiable Physics for Learning and Control, Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, J. Zico Kolter.

2018

[Dis94N]

Neural-network-based approximations for solving partial differential equations, M. W. M. G. Dissanayake, N. Phan‐Thien.

1994

A numerical method, based on neural-network-based functions, for solving partial differential equations is reported in the paper. Using a ‘universal approximator’ based on a neural network and point collocation, the numerical problem of solving the partial differential equation is transformed to an unconstrained minimization problem. The method is extremely easy to implement and is suitable for …

[E17D]

Deep Learning-Based Numerical Methods for High-Dimensional Parabolic Partial Differential Equations and Backward Stochastic Differential Equations, Weinan E, Jiequn Han, Arnulf Jentzen.

Dec 2017

We study a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, which is based on an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the …

[E18D]

The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems, Weinan E, Bing Yu.

Mar 2018

We propose a deep learning based method, the Deep Ritz Method, for numerically solving variational problems, particularly the ones that arise from partial differential equations. The Deep Ritz method is naturally nonlinear, naturally adaptive and has the potential to work in rather high dimensions. The framework is quite simple and fits well with the stochastic gradient descent method used in deep …

[Fuk20L]

Limitations of physics informed machine learning for nonlinear two-phase transport in porous media, Olga Fuks, Hamdi A. Tchelepi.

2020

[Gro18P]

A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations, Philipp Grohs, Fabian Hornung, Arnulf Jentzen, Philippe von Wurstemberger.

Sep 2018

Artificial neural networks (ANNs) have very successfully been used in numerical simulations for a series of computational problems ranging from image classification/image recognition, speech recognition, time series analysis, game intelligence, and computational advertising to numerical approximations of partial differential equations (PDEs). Such numerical simulations suggest that ANNs have the …

[Han18S]

Solving high-dimensional partial differential equations using deep learning, Jiequn Han, Arnulf Jentzen, Weinan E.

Aug 2018

Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the "curse of dimensionality". This paper introduces a deep learning-based approach that can handle general high-dimensional parabolic PDEs. To this end, the PDEs are reformulated using backward stochastic …

[Kha19V]

Variational Physics-Informed Neural Networks For Solving Partial Differential Equations, E. Kharazmi, Z. Zhang, G. E. Karniadakis.

Nov 2019

Physics-informed neural networks (PINNs) use automatic differentiation to solve partial differential equations (PDEs) by penalizing the PDE in the loss function at a random set of points in the domain of interest. Here, we develop a Petrov-Galerkin version of PINNs based on the nonlinear approximation of deep neural networks (DNNs) by selecting the *trial space* to be the space of neural networks …

[Lad15D]

Data-driven fluid simulations using regression forests, L'ubor Ladický, SoHyeon Jeong, Barbara Solenthaler, Marc Pollefeys, Markus Gross.

Oct 2015

Traditional fluid simulations require large computational resources even for an average sized scene with the main bottleneck being a very small time step size, required to guarantee the stability of the solution. Despite a large progress in parallel computing and efficient algorithms for pressure computation in the recent years, realtime fluid simulations have been possible only under very …

[Lag98A]

Artificial neural networks for solving ordinary and partial differential equations, I. E. Lagaris, A. Likas, D. I. Fotiadis.

Sep 1998

We present a method to solve initial and boundary value problems using artificial neural networks. A trial solution of the differential equation is written as a sum of two parts. The first part satisfies the initial/boundary conditions and contains no adjustable parameters. The second part is constructed so as not to affect the initial/boundary conditions. This part involves a feedforward neural …

[Lag00N]

Neural-network methods for boundary value problems with irregular boundaries, I. E. Lagaris, A. C. Likas, D. G. Papageorgiou.

Sep 2000

Partial differential equations (PDEs) with boundary conditions (Dirichlet or Neumann) defined on boundaries with simple geometry have been successfully treated using sigmoidal multilayer perceptrons in previous works. The article deals with the case of complex boundary geometry, where the boundary is determined by a number of points that belong to it and are closely located, so as to offer a …

[Li20F]

Fourier Neural Operator for Parametric Partial Differential Equations, Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar.

Oct 2020

The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces. Recently, this has been generalized to neural operators that learn mappings between function spaces. For partial differential equations (PDEs), neural operators directly learn the mapping from any functional parametric dependence to the solution. Thus, they learn an …

[Lon18P]

PDE-Net: Learning PDEs from Data, Zichao Long, Yiping Lu, Xianzhong Ma, Bin Dong.

Jul 2018

Partial differential equations (PDEs) play a prominent role in many disciplines of science and engineering. PDEs are commonly derived based on empirical observations. However, with the rapid develo...

[Lu21L]

lululxvi/deepxde, Lu Lu.

Jan 2021

Deep learning library for solving differential equations and more

[Mar21P]

Parametric Complexity Bounds for Approximating PDEs with Neural Networks, Tanya Marwah, Zachary C. Lipton, Andrej Risteski.

Mar 2021

Recent empirical results show that deep networks can approximate solutions to high dimensional PDEs, seemingly escaping the curse of dimensionality. However many open questions remain regarding the theoretical basis for such approximations, including the number of parameters required. In this paper, we investigate the representational power of neural networks for approximating solutions to linear …

[Par90A]

Adapted solution of a backward stochastic differential equation, E. Pardoux, S. G. Peng.

Jan 1990

Let $W_t$, $t \in [0, 1]$, be a standard $k$-dimensional Wiener process defined on a probability space $(\Omega, \mathcal{F}, P)$, and let $\mathcal{F}_t$ denote its natural filtration. Given an $\mathcal{F}_1$-measurable $d$-dimensional random vector $X$, we look for an adapted pair of processes $\{x(t), y(t); t \in [0, 1]\}$ with values in $\mathbb{R}^d$ and $\mathbb{R}^{d \times k}$ respectively, which solves an equation of the form: $x(t) + \int_t^1 f(s, x(s), y(s))\, ds + \int_t^1 [g(s, x(s)) + y(s)]$ …

[Qin19D]

Data driven governing equations approximation using deep neural networks, Tong Qin, Kailiang Wu, Dongbin Xiu.

Oct 2019

We present a numerical framework for approximating unknown governing equations using observation data and deep neural networks (DNN). In particular, we propose to use residual network (ResNet) as the basic building block for equation approximation. We demonstrate that the ResNet block can be considered as a one-step method that is exact in temporal integration. We then present two multi-step …

[Qua94N]

Numerical Approximation of Partial Differential Equations, Alfio Quarteroni, Alberto Valli.

1994

This book deals with the numerical approximation of partial differential equations. Its scope is to provide a thorough illustration of numerical methods, carry out their stability and convergence analysis, derive error bounds, and discuss the algorithmic aspects relative to their implementation. A sound balancing of theoretical analysis, description of algorithms and discussion of applications is …

[Rai18H]

Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations, Maziar Raissi, George Em Karniadakis.

Mar 2018

While there is currently a lot of enthusiasm about "big data", useful data is usually "small" and expensive to acquire. In this paper, we present a new paradigm of learning partial differential equations from small data. In particular, we introduce hidden physics models, which are essentially data-efficient learning machines capable of leveraging the underlying laws of physics, …

[Rai18F]

Forward-Backward Stochastic Neural Networks: Deep Learning of High-dimensional Partial Differential Equations, Maziar Raissi.

Apr 2018

Classical numerical methods for solving partial differential equations suffer from the curse of dimensionality mainly due to their reliance on meticulously generated spatio-temporal grids. Inspired by modern deep learning based techniques for solving forward and inverse problems associated with partial differential equations, we circumvent the tyranny of numerical discretization by devising an …

[Rai19P]

Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, M. Raissi, P. Perdikaris, G. E. Karniadakis.

Feb 2019

We introduce physics-informed neural networks – neural networks that are trained to solve supervised learning tasks while respecting any given laws of physics described by general nonlinear partial differential equations. In this work, we present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential …

[Sah18L]

Learning Equations for Extrapolation and Control, Subham S. Sahoo, Christoph H. Lampert, Georg Martius.

Jun 2018

We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to …

[Shi20C]

On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs, Yeonjong Shin, Jerome Darbon, George Em Karniadakis.

Jun 2020

Physics informed neural networks (PINNs) are deep learning based techniques for solving partial differential equations (PDEs) encountered in computational science and engineering. Guided by data and physical laws, PINNs find a neural network that approximates the solution to a system of PDEs. Such a neural network is obtained by minimizing a loss function in which any prior knowledge of PDEs and …

[Sir18D]

DGM: A deep learning algorithm for solving partial differential equations, Justin Sirignano, Konstantinos Spiliopoulos.

Dec 2018

High-dimensional PDEs have been a longstanding computational challenge. We propose to solve high-dimensional PDEs by approximating the solution with a deep neural network which is trained to satisfy the differential operator, initial condition, and boundary conditions. Our algorithm is meshfree, which is key since meshes become infeasible in higher dimensions. Instead of forming a mesh, the neural …

[Um21S]

Solver-in-the-Loop: Learning from Differentiable Physics to Interact with Iterative PDE-Solvers, Kiwon Um, Robert Brand, Yun Fei, Philipp Holl, Nils Thuerey.

Jan 2021

Finding accurate solutions to partial differential equations (PDEs) is a crucial task in all scientific and engineering disciplines. It has recently been shown that machine learning methods can improve the solution accuracy by correcting for effects not captured by the discretized PDE. We target the problem of reducing numerical errors of iterative PDE solvers and compare different learning …

[Wan20U]

Understanding and mitigating gradient pathologies in physics-informed neural networks, Sifan Wang, Yujun Teng, Paris Perdikaris.

Jan 2020

The widespread use of neural networks across different scientific domains often involves constraining them to satisfy certain symmetries, conservation laws, or other domain knowledge. Such constraints are often imposed as soft penalties during model training and effectively act as domain-specific regularizers of the empirical risk loss. Physics-informed neural networks is an example of this …

[Wan20W]

When and why PINNs fail to train: A neural tangent kernel perspective, Sifan Wang, Xinling Yu, Paris Perdikaris.

Jul 2020

Physics-informed neural networks (PINNs) have lately received great attention thanks to their flexibility in tackling a wide range of forward and inverse problems involving partial differential equations. However, despite their noticeable empirical success, little is known about how such constrained neural networks behave during their training via gradient descent. More importantly, even less is …

[Yan18P]

Physics-informed generative adversarial networks for stochastic differential equations, Liu Yang, Dongkun Zhang, George Em Karniadakis.