Scalable Bayesian Deep Learning with Modern Laplace Approximations

Training Bayesian neural networks using the Laplace method for Gaussian posterior approximation.

The Laplace approximation, like Gaussian variational inference, approximates the posterior with a Gaussian. More precisely, it uses the MAP estimate as the mean of the Gaussian and derives the covariance matrix from a second-order expansion of the loss function around the MAP estimate. This usually provides posterior approximations superior to a mean-field Gaussian approximation, which is known to systematically underestimate the posterior variance. However, the Laplace approximation is challenging to apply in Bayesian deep learning because the full covariance matrix has quadratically many entries in the number of parameters.
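
In symbols (restating the description above, with $L$ denoting the negative log posterior, i.e. the regularized training loss), a second-order Taylor expansion of $L$ around the MAP estimate yields the Gaussian approximation

$$
p(\theta \mid \mathcal{D}) \approx \mathcal{N}\big(\theta;\, \theta_{\text{MAP}},\, H^{-1}\big),
\qquad
H = \nabla^2_\theta \, L(\theta)\,\big|_{\theta = \theta_{\text{MAP}}},
\qquad
L(\theta) = -\log p(\mathcal{D} \mid \theta) - \log p(\theta).
$$

For a network with $P$ parameters, $H$ and the covariance $H^{-1}$ have $P^2$ entries, and inverting $H$ in general costs $O(P^3)$, which is what makes the full approximation intractable for large networks.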

[Dax22L] combines Laplace approximations with sub-network selection guided by the Wasserstein distance: a full-covariance Laplace approximation is applied only to a small, carefully chosen sub-network, while the remaining weights are kept at their MAP values. This improves the uncertainty estimates while maintaining essentially the same predictive accuracy as the original network.
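
Below is a minimal, self-contained sketch of the idea in PyTorch. It is not the authors' implementation: the toy data, the tiny model, and the choice of the last layer's weights as the sub-network are purely illustrative assumptions, and the sub-network is picked by hand rather than by the Wasserstein-based selection of the paper.

```python
# Sketch: full-covariance Laplace approximation over a hand-picked sub-network.
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)

# Toy data and model; sizes and names are illustrative assumptions.
X, y = torch.randn(128, 2), torch.randint(0, 2, (128,))
model = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 2))

# 1) MAP estimate: ordinary training, with weight decay acting as a Gaussian prior.
opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-3)
for _ in range(200):
    opt.zero_grad()
    nn.functional.cross_entropy(model(X), y).backward()
    opt.step()

# 2) Choose a sub-network: here simply the weights of the last layer
#    (the paper instead selects it via a Wasserstein-distance criterion).
params = {k: v.detach() for k, v in model.named_parameters()}
sub_name = "2.weight"
theta_map = params[sub_name].clone().flatten()

def neg_log_posterior(theta_flat):
    # Negative log-likelihood plus Gaussian prior term for the sub-network only;
    # all other weights stay fixed at their MAP values.
    p = dict(params)
    p[sub_name] = theta_flat.view_as(params[sub_name])
    logits = functional_call(model, p, (X,))
    nll = nn.functional.cross_entropy(logits, y, reduction="sum")
    return nll + 0.5 * 1e-3 * theta_flat.pow(2).sum()

# 3) Covariance of the Gaussian posterior = inverse Hessian at the MAP estimate.
H = torch.autograd.functional.hessian(neg_log_posterior, theta_map)
H = 0.5 * (H + H.T) + 1e-3 * torch.eye(H.shape[0])  # symmetrise, add jitter
cov = torch.linalg.inv(H)
cov = 0.5 * (cov + cov.T)

# 4) Predict by averaging over posterior samples of the sub-network weights.
posterior = torch.distributions.MultivariateNormal(theta_map, covariance_matrix=cov)
with torch.no_grad():
    probs = torch.zeros(len(X), 2)
    for _ in range(30):
        p = dict(params)
        p[sub_name] = posterior.sample().view_as(params[sub_name])
        probs += torch.softmax(functional_call(model, p, (X,)), dim=-1)
print(probs / 30)
```

The point of the sub-network restriction shows up in step 3: the Hessian here is only 32×32, whereas over all weights it would have quadratically many entries in the total parameter count.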

A few computational challenges remain open and currently have to be tackled with heuristics, most prominently the sub-network selection itself.

There is also a good talk by one of the authors.

References
