Neural Fields in Visual Computing and Beyond

A great entry-level review of coordinate-based neural networks (a.k.a. neural fields), a technique that has recently shown widespread success in tasks ranging from 3D reconstruction to generative modelling and robotics.

In physics, a field is a quantity defined for all spatial and temporal coordinates: given a position $x$ and a time $t$, a field is a mapping $f(x, t)$ that returns a scalar or a vector. Any static or dynamic real-world scene can be thought of as a field in the same way, with positions (and times) as inputs and RGB values as outputs. An analytical description of a real-world scene is normally out of reach, but there are several techniques to approximate or learn its functional form. Recently, much progress has been made by parameterizing such fields with neural networks, an approach so prolific that it has earned its own name: “neural fields”.
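To make the idea of learning the functional form of a field concrete, here is a minimal sketch in Python with NumPy, assuming a toy 1D “scene” (a sum of two sinusoids) instead of a real 3D one. It uses a deliberately simplified coordinate network: a frozen first layer of random Fourier features followed by a linear output layer fitted by least squares, rather than a deep MLP trained by gradient descent. The helper names (`fourier_features`, `fit_field`) and the hyperparameters are illustrative choices, not taken from the review.

```python
import numpy as np

def fourier_features(x, B):
    """Map coordinates x of shape (N, d) to [sin(2*pi*xB), cos(2*pi*xB)]."""
    proj = 2.0 * np.pi * x @ B  # (N, m) random projections of the coordinates
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)  # (N, 2m)

def fit_field(coords, values, num_features=64, scale=5.0, seed=0):
    """Fit a field f: R^d -> R^k using a frozen random first layer and a
    linear output layer solved by least squares. Returns a callable field."""
    rng = np.random.default_rng(seed)
    B = rng.normal(0.0, scale, size=(coords.shape[1], num_features))
    phi = fourier_features(coords, B)
    W, *_ = np.linalg.lstsq(phi, values, rcond=None)
    return lambda x: fourier_features(x, B) @ W

# Toy "scene": a 1D signal sampled at 200 coordinates in [0, 1].
xs = np.linspace(0.0, 1.0, 200)[:, None]
signal = np.sin(6 * np.pi * xs) + 0.5 * np.cos(2 * np.pi * xs)
field = fit_field(xs, signal)

# The fitted field can be queried anywhere, including between the samples.
x_query = np.array([[0.1234], [0.5678]])
print(field(x_query))
```

Once fitted, the field is a continuous function of its input coordinates rather than a grid of stored values; encoding the input with sinusoids is a common trick for helping coordinate networks represent high-frequency detail.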

The review [Xie22N] gives a good introduction to this new and exciting technique.

Among the parts I found most interesting, section 2 of the paper describes how learned priors can be incorporated into 3D reconstruction. Problems of this type arise, for example, when reconstructing 3D street scenes from 2D images and lidar point clouds for self-driving vehicles. Without prior knowledge about the smoothness or sparsity of street objects, a neural network will struggle to generalize to unseen environments. Neural fields offer several elegant ways to include latent variables that encode such prior knowledge.
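One simple way to condition a neural field on a latent variable, sketched below under the assumption of a tiny randomly initialised NumPy MLP, is to concatenate a per-scene latent code $z$ with each query coordinate, so that a single shared network $f(x, z)$ can represent many scenes. The class `ConditionalField` and its layer sizes are hypothetical; in a real system the weights and latent codes would be learned from data, and concatenation is only one of several possible conditioning strategies.

```python
import numpy as np

class ConditionalField:
    """A tiny MLP f(x, z) -> value. The latent code z is concatenated with
    every coordinate x, so one set of weights can represent many scenes."""

    def __init__(self, coord_dim=3, latent_dim=8, hidden=32, out_dim=1, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = coord_dim + latent_dim
        # Random initialisation only; a real model would train these weights.
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x, z):
        """x: (N, coord_dim) query points; z: (latent_dim,) code of one scene."""
        z_tiled = np.broadcast_to(z, (x.shape[0], z.shape[0]))  # repeat z per point
        h = np.concatenate([x, z_tiled], axis=-1)
        h = np.maximum(h @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.W2 + self.b2

field = ConditionalField()
rng = np.random.default_rng(1)
points = rng.uniform(-1.0, 1.0, size=(5, 3))
z_scene_a = rng.normal(size=8)  # hypothetical latent code for scene A
z_scene_b = rng.normal(size=8)  # hypothetical latent code for scene B
print(field(points, z_scene_a).ravel())
print(field(points, z_scene_b).ravel())
```

With trained weights, the regularities captured by the shared network act as a prior across scenes, and fitting a new scene can then reduce to searching for a suitable $z$.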

Section 4 is concerned with applications where the reconstruction domain differs from the sensor domain. This arises, for example, when building the 3D shape of a real object from just a few 2D images: some viewpoints will always be missing from the input dataset. A priori, it is often unclear whether the 3D shape can be reconstructed at all, especially considering the complexity of predicting lighting and reflections on object surfaces. Nevertheless, great progress has recently been made through the use of Neural Radiance Fields (NeRFs): some of the most striking results can be seen in this blog post by NVIDIA (by now, many videos have also appeared on YouTube and social media, thanks to how easily NeRFs can be set up).
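To give a flavour of how a NeRF turns a field into an image, the following sketch implements the discrete volume rendering rule used in the original NeRF paper: for samples $i$ along a camera ray with density $\sigma_i$, colour $c_i$ and spacing $\delta_i$, the pixel colour is $\hat{C} = \sum_i T_i \left(1 - e^{-\sigma_i \delta_i}\right) c_i$ with transmittance $T_i = \exp\left(-\sum_{j<i} \sigma_j \delta_j\right)$. The densities and colours below are made-up inputs rather than outputs of a trained network, and `render_ray` is a hypothetical name.

```python
import numpy as np

def render_ray(sigmas, colors, t_vals, background=None):
    """Composite S samples along one ray into a single RGB value.

    sigmas: (S,) non-negative densities, colors: (S, 3) RGB values,
    t_vals: (S,) increasing sample distances along the ray.
    """
    if background is None:
        background = np.zeros(3)
    # delta_i: spacing between adjacent samples; the last interval is made
    # effectively infinite, a common convention in NeRF implementations.
    deltas = np.append(np.diff(t_vals), 1e10)
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each segment
    # T_i = prod_{j<i} (1 - alpha_j) = exp(-sum_{j<i} sigma_j * delta_j)
    trans = np.cumprod(np.append(1.0, 1.0 - alphas[:-1]))
    weights = trans * alphas
    # Whatever light is not absorbed along the ray comes from the background.
    rgb = weights @ colors + (1.0 - weights.sum()) * background
    return rgb, weights

# Hypothetical ray: empty space except for an opaque red slab around t = 4.
t = np.linspace(2.0, 6.0, 64)
sigmas = np.where(np.abs(t - 4.0) < 0.2, 50.0, 0.0)
colors = np.tile([1.0, 0.0, 0.0], (64, 1))
rgb, weights = render_ray(sigmas, colors, t, background=np.ones(3))
print(rgb)  # close to pure red, since the slab hides the white background
```

Because every step is differentiable, the same computation lets a NeRF be trained by comparing rendered pixels against the input photographs.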

From section 7 onwards, the paper outlines real-world applications that already use neural fields. Any problem involving the reconstruction of a scene, be it calculating distances, planning a path, localizing or grasping an object, or simply estimating the viewpoint of a camera in a scene, can benefit enormously from the compact representation and efficient reconstruction that neural fields provide.
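As an illustration of how a field can answer distance queries, here is a sketch of sphere tracing, a standard way of finding where a camera ray first hits a surface described by a signed distance function (SDF). To keep it self-contained, an analytic sphere SDF stands in for a learned neural SDF; the function names are hypothetical, and with an imperfect learned SDF one may need to take smaller steps than the raw distance value.

```python
import numpy as np

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to a sphere: negative inside, positive
    outside. Stand-in for a learned neural SDF."""
    return np.linalg.norm(np.asarray(p) - np.asarray(center)) - radius

def sphere_trace(sdf, origin, direction, max_steps=128, eps=1e-6, t_max=100.0):
    """March along origin + t * direction, stepping by the SDF value, which is
    the largest step guaranteed not to skip past the surface.
    Returns the hit distance t, or None if the ray misses."""
    direction = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:
            return t
        t += d
        if t > t_max:
            break
    return None

origin = np.array([0.0, 0.0, -5.0])
hit = sphere_trace(sphere_sdf, origin, np.array([0.0, 0.0, 1.0]))
miss = sphere_trace(sphere_sdf, origin, np.array([0.0, 1.0, 0.0]))
print(hit, miss)  # prints: 4.0 None
```

The head-on ray reaches the unit sphere after travelling 4 units, while the ray pointing away from it escapes past `t_max`.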

Considering how important 3D perception is for navigating and understanding the real world, I believe neural fields will gain more and more popularity in the coming years and become one of the key building blocks of future AI products.

References