Learning nonlinear operators: the DeepONet architecture

The universal approximation theorem for operators suggests a neural network architecture for learning continuous nonlinear operators, called DeepONet. It can be used to learn explicit function operators, such as integrals, as well as implicit ones, such as the solution operator of a differential equation.

It is widely known that neural networks with a single hidden layer are universal approximators of continuous functions. A lesser-known but powerful result is that neural networks can also accurately approximate function operators, that is, mappings from one space of functions to another.

In a seminal paper, Lu et al. [Lu21L] propose a neural network architecture that is capable of learning arbitrary nonlinear continuous operators with small generalization error, namely the deep operator network (DeepONet). Approximating operators with neural networks goes beyond universal function approximation and is significant because DeepONets can “learn to solve” a problem, with the potential to dramatically speed up the solution of, for example, parameterised differential equations.

As a guiding example, consider a (differential) equation of the form $$ L s = u $$ where $L$ is an arbitrary function operator that maps a function $s$ to a function $u$. If, for a given $u$, we want to find the solution $s$ of this equation, the interesting but non-trivial object is the solution operator $$ L^{-1}: u \mapsto s $$ that maps $u$ to the solution $s$ of the equation. DeepONets are able to learn precisely this implicit operator $L^{-1}$.
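A simple concrete instance, and a standard introductory example in the DeepONet literature, is the antiderivative operator: take $L = \mathrm{d}/\mathrm{d}x$ on $[0, 1]$ with the initial condition $s(0) = 0$. The corresponding solution operator is

$$ L^{-1}: u \mapsto s, \qquad s(y) = \int_0^y u(x)\, \mathrm{d}x, $$

i.e. it maps every input function $u$ to its antiderivative, and a DeepONet can be trained to represent exactly this mapping.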

This is a paradigm shift in the context of physics-informed neural networks: solving a differential equation no longer corresponds to training a network, but to evaluating the (trained) solution operator of the equation. Given a representation of the input function $u$ (for instance, its values at pre-defined positions), a single forward pass through the operator network returns a representation of the solution $s$.

The DeepONet architecture is based on the universal approximation theorem for operators [Che95U], which is suggestive of the structure and potential of deep neural networks in learning continuous operators. Similar to the well-known universal approximation theorem for functions, the corresponding theorem for operators (Theorem 1 in [Lu21L]) states that two fully connected neural networks with a single hidden layer, combined by a dot product of their outputs, are able to approximate any continuous nonlinear operator to arbitrary accuracy if the hidden layers are sufficiently wide.
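Spelled out (following the notation of Theorem 1 in [Lu21L], up to minor details), the approximation takes the form

$$ \left| G(u)(y) - \sum_{k=1}^{p} \underbrace{\sum_{i=1}^{n} c_i^k\, \sigma\Big( \sum_{j=1}^{m} \xi_{ij}^k\, u(x_j) + \theta_i^k \Big)}_{\text{branch}}\; \underbrace{\sigma\big( w_k \cdot y + \zeta_k \big)}_{\text{trunk}} \right| < \epsilon, $$

where $\sigma$ is a continuous, non-polynomial activation function and $c_i^k, \xi_{ij}^k, \theta_i^k, w_k, \zeta_k$ are parameters. The inner sums form the $p$ outputs of the branch part, the factors $\sigma(w_k \cdot y + \zeta_k)$ the $p$ outputs of the trunk part, and their dot product yields the approximation of $G(u)(y)$.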

Figure 1 [Lu21L]: Illustrations of the problem set-up and new architectures of DeepONets that lead to good generalization. A, For the network to learn an operator $G: u \mapsto G(u)$ it takes two inputs $[u(x_1), u(x_2), \dots, u(x_m)]$ and $y.$ B, Illustration of the training data. For each input function $u,$ we require that we have the same number of evaluations at the same scattered sensors $x_1, x_2, \dots, x_m.$ However, we do not enforce any constraints on the number or locations for the evaluation of output functions. C, The stacked DeepONet is inspired by Theorem 1, and has one trunk network and $p$ stacked branch networks. The network constructed in Theorem 1 is a stacked DeepONet formed by choosing the trunk net as a one-layer network of width $p$ and each branch net as a one-hidden-layer network of width $n.$ D, The unstacked DeepONet is inspired by Theorem 2, and has one trunk network and one branch network. An unstacked DeepONet can be viewed as a stacked DeepONet with all the branch nets sharing the same set of parameters.
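To make the training-data format described in panel B concrete, the following sketch generates triplets $\big([u(x_1), \dots, u(x_m)],\, y,\, G(u)(y)\big)$ for the antiderivative operator from the example above. The choice of random input functions and all names are illustrative, not taken from [Lu21L].

```python
import numpy as np

rng = np.random.default_rng(0)

m = 100                            # number of fixed sensor locations for the input u
x_sensors = np.linspace(0, 1, m)   # the same sensors are used for every input function

def random_input_function():
    """Sample a random input u, here a superposition of a few sine modes (illustrative choice)."""
    coeffs = rng.normal(size=3)
    return lambda x: sum(c * np.sin((k + 1) * np.pi * x) for k, c in enumerate(coeffs))

def make_triplets(n_functions=1000, n_queries=10):
    """Build (u(x_1..x_m), y, G(u)(y)) triplets for the antiderivative operator G(u)(y) = ∫_0^y u."""
    branch_in, trunk_in, targets = [], [], []
    for _ in range(n_functions):
        u = random_input_function()
        u_sensors = u(x_sensors)                  # fixed-size representation of u
        for _ in range(n_queries):
            y = rng.uniform(0, 1)                 # output locations may be scattered freely
            grid = np.linspace(0, y, 200)
            s_y = np.trapz(u(grid), grid)         # numerical antiderivative via the trapezoidal rule
            branch_in.append(u_sensors)
            trunk_in.append([y])
            targets.append([s_y])
    return np.array(branch_in), np.array(trunk_in), np.array(targets)

branch_in, trunk_in, targets = make_triplets()
print(branch_in.shape, trunk_in.shape, targets.shape)  # (10000, 100) (10000, 1) (10000, 1)
```

Every input function is evaluated at the same $m$ sensors, while the output locations $y$ may be scattered freely, exactly as described in panel B.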

Inspired directly by this theoretical result, the DeepONet architecture (Figure 1) consists of two neural networks: a branch network encodes the discretised input function and a trunk network encodes the domain of the output functions. In general, DeepONets can be constructed by choosing a branch net $\mathbf{g}: \mathbb{R}^m \to \mathbb{R}^p$ and a trunk net $\mathbf{f}: \mathbb{R}^d \to \mathbb{R}^p$ from diverse classes of neural networks that satisfy the classical universal approximation theorem for functions. A generalized version of the approximation theorem for operators (Theorem 2 in [Lu21L]) states that for any nonlinear continuous operator $G: u \mapsto G(u)$ and any $\epsilon > 0$ the inequality

$$ \Big\lvert\, G(u)(y) - \big\langle \mathbf{g}\big(u(x_1), u(x_2), \dots, u(x_m)\big),\ \mathbf{f}(y) \big\rangle \,\Big\rvert < \epsilon $$

holds for all admissible input functions $u$ and all $y \in \mathbb{R}^d$ in the domain of the output functions, where $x_1, x_2, \dots, x_m$ are sufficiently many evaluation points of $u$ and $\langle\cdot, \cdot\rangle$ denotes the dot product in $\mathbb{R}^p$. Note that the input function $u$ is represented here by $m$ point evaluations, but it could equally be projected onto a finite set of basis-function coefficients.
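Put into code, a minimal sketch of an unstacked DeepONet could look as follows (PyTorch; class and parameter names as well as the hyperparameters are our own illustrative choices, not an official implementation): the branch net maps the $m$ sensor values of $u$ to $p$ coefficients, the trunk net maps a query point $y$ to $p$ basis values, and the output is their dot product.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal unstacked DeepONet: G(u)(y) ≈ <branch(u(x_1), ..., u(x_m)), trunk(y)> + bias."""

    def __init__(self, num_sensors: int = 100, dim_y: int = 1, p: int = 32, width: int = 64):
        super().__init__()
        # Branch net g: R^m -> R^p, encodes the input function via its sensor values.
        self.branch = nn.Sequential(
            nn.Linear(num_sensors, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p),
        )
        # Trunk net f: R^d -> R^p, encodes the query location y in the output domain.
        self.trunk = nn.Sequential(
            nn.Linear(dim_y, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p), nn.Tanh(),
        )
        self.bias = nn.Parameter(torch.zeros(1))  # optional output bias

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u_sensors: (batch, num_sensors), y: (batch, dim_y) -> output: (batch, 1)
        b = self.branch(u_sensors)  # (batch, p)
        t = self.trunk(y)           # (batch, p)
        return (b * t).sum(dim=-1, keepdim=True) + self.bias
```

Trained with a plain mean-squared-error loss on triplets like the ones generated above, a forward pass `model(u_sensors, y)` evaluates the learned operator at arbitrary query points $y$, which is exactly the “evaluate instead of solve” workflow described earlier.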

In the paper, several examples demonstrate that DeepONets can learn various explicit operators, such as integrals and fractional Laplacians, as well as implicit operators representing deterministic and stochastic differential equations, showing that DeepONets can be applied reliably to a wide range of function operators.

Moreover, DeepONets can also be trained in the spirit of physics-informed neural networks, where a partial differential equation (PDE) is incorporated into the loss function. It is reported that predicting the solution of various types of parametric PDEs in this way is up to three orders of magnitude faster than with conventional PDE solvers [Wan21L].
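To illustrate the idea (a sketch in the spirit of physics-informed DeepONets, not the exact formulation of [Wan21L]), consider once more the antiderivative example, i.e. the ODE $\mathrm{d}s/\mathrm{d}y = u(y)$ with $s(0) = 0$. The residual of the equation can be computed by differentiating the network output with respect to the trunk input via automatic differentiation; `model` refers to the DeepONet sketched above, and `u_sensors`, `y`, `u_at_y` are assumed to be given batches of sensor values, collocation points and input-function values at those points.

```python
import torch

def physics_informed_loss(model, u_sensors, y, u_at_y):
    """Residual loss for ds/dy = u(y) with s(0) = 0, differentiating through the trunk input y."""
    y = y.clone().requires_grad_(True)
    s = model(u_sensors, y)  # (batch, 1), predicted solution values at the collocation points
    # ds/dy via automatic differentiation through the trunk input
    ds_dy = torch.autograd.grad(s, y, grad_outputs=torch.ones_like(s), create_graph=True)[0]
    residual = ds_dy - u_at_y                   # ODE residual at the collocation points
    ic = model(u_sensors, torch.zeros_like(y))  # initial condition s(0) = 0
    return (residual ** 2).mean() + (ic ** 2).mean()
```

No target values $G(u)(y)$ are needed for this part of the loss; the differential equation itself supervises the network.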

Although the DeepONet architecture is, in principle, sufficient for approximating any continuous operator, the universal approximation theorem does not tell us how to learn such operators efficiently. In practice, carefully constructed network architectures will likely be more efficient for specific problems, and subsequent works therefore propose improvements to the DeepONet architecture. Other architectures include, for instance, the general class of Neural Operators [Kov23N].

If you are interested in trying out DeepONets and other neural operators in a generalized framework, check out continuiti, our Python package for learning function operators with neural networks.

References

[Che95U] T. Chen and H. Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995.

[Kov23N] N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and A. Anandkumar. Neural Operator: Learning Maps Between Function Spaces With Applications to PDEs. Journal of Machine Learning Research, 24(89):1–97, 2023.

[Lu21L] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3:218–229, 2021.

[Wan21L] S. Wang, H. Wang, and P. Perdikaris. Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Science Advances, 7(40), 2021.
