Denoising Diffusion and Score Based Generative Models | TransferLab

In some of our latest paper pills we have summarised recent developments in denoising generative models (DGMs), with particular emphasis on score-based techniques. In Score-Based Generative Modeling through Stochastic Differential Equations we saw how DGMs can be studied with the formalism of stochastic differential equations, while Score-Based Generative Modeling with Critically-Damped Langevin Diffusion presents the recent state of the art in image generation.

In this pill we take a step back and briefly go through some of the key publications that, over the past 7 years, have led to the success of DGMs.

In 2015, the seminal paper [Soh15D] showed that it is possible to generate samples (e.g. images or audio) by learning a variational decoder that reverses a discrete diffusion process, which gradually perturbs data with noise. Models of this type later became known as denoising diffusion probabilistic models (DDPMs). Independently, and without awareness of this work, score-based generative models (SGMs) were being developed, with a different motivation and a different mathematical formalism. In 2019, [Son19G] showed that the empirical performance of SGMs could rival that of other widely acclaimed generative methods (GANs and VAEs).
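To make the forward "perturb data with noise" process concrete, here is a minimal NumPy sketch of a discrete Gaussian diffusion. It assumes the linear variance schedule later used in [Ho20D] (β growing from 1e-4 to 0.02 over T = 1000 steps) rather than the exact setup of [Soh15D], and a toy one-dimensional dataset. Running the chain step by step and jumping straight to the last step with the closed form q(x_t | x_0) = N(√ᾱ_t x_0, (1 − ᾱ_t) I) should both give nearly standard normal samples.

```python
import numpy as np

# Assumed schedule: linear betas over T = 1000 steps, the values used in [Ho20D].
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)

def forward_step(x_prev, beta, rng):
    """One step of the discrete forward process q(x_t | x_{t-1})."""
    return np.sqrt(1.0 - beta) * x_prev + np.sqrt(beta) * rng.normal(size=x_prev.shape)

def q_sample(x0, t, rng):
    """Jump directly to step t: q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I)."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.normal(3.0, 0.1, size=50_000)  # toy "data", far away from the N(0, 1) prior

x = x0.copy()
for beta in betas:  # simulate the whole chain, one noising step at a time
    x = forward_step(x, beta, rng)
x_direct = q_sample(x0, T - 1, rng)  # same marginal, obtained in one shot

print(x.mean(), x.std(), x_direct.mean(), x_direct.std())  # both close to mean 0, std 1
```

The closed form is what makes training cheap: a random step t can be sampled directly for each training example, without simulating the chain.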

At first glance, the connection between SGMs and DDPMs seemed superficial: the former are trained by score matching and sampled with Langevin dynamics, while the latter are trained by maximizing the evidence lower bound (ELBO) and sampled with a learned decoder. However, in 2020 the paper “Denoising Diffusion Probabilistic Models” [Ho20D] (for which both the original code and a PyTorch implementation are available) showed that the ELBO used to train diffusion probabilistic models is essentially equivalent to a weighted combination of the denoising score matching objectives used in score-based generative modeling.
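The equivalence can be illustrated on a toy problem. In the ε-parameterization of [Ho20D], a model ε_θ(x_t) is trained to predict the noise that was added to x_0. The minimizer of E‖ε − ε_θ(x_t)‖² is E[ε | x_t] = −√(1 − ᾱ_t) ∇ log q(x_t), so rescaling a noise predictor gives a score estimate. The sketch below assumes one-dimensional Gaussian data and a single hypothetical noise level ᾱ_t = 0.3; because the optimal predictor is affine in this case, a least-squares fit stands in for the neural network.

```python
import numpy as np

rng = np.random.default_rng(0)
MU, S = 2.0, 0.5  # toy data distribution N(MU, S^2), chosen so the score has a closed form
abar = 0.3        # hypothetical value of abar_t at one fixed step t

# Forward-noise the data as in DDPM training: x_t = sqrt(abar) x_0 + sqrt(1 - abar) eps.
x0 = rng.normal(MU, S, size=200_000)
eps = rng.normal(size=x0.shape)
xt = np.sqrt(abar) * x0 + np.sqrt(1 - abar) * eps

# "Train" eps_theta(x_t) = a * x_t + b by minimizing the squared noise-prediction error.
A = np.stack([xt, np.ones_like(xt)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, eps, rcond=None)

def score_from_eps(x):
    """Convert the noise predictor into a score: s(x_t) = -eps_theta(x_t) / sqrt(1 - abar)."""
    return -(a * x + b) / np.sqrt(1 - abar)

def true_score(x):
    """Exact score of the noisy marginal q(x_t) = N(sqrt(abar) MU, abar S^2 + 1 - abar)."""
    return -(x - np.sqrt(abar) * MU) / (abar * S**2 + 1 - abar)

grid = np.linspace(-2.0, 4.0, 7)
print(np.max(np.abs(score_from_eps(grid) - true_score(grid))))  # small: both views agree
```

Repeating this fit at every noise level and summing the weighted losses is, roughly, what the simplified DDPM objective does with a single shared network.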

Inspired by that work, the aforementioned [Son21S] further investigated the relationship between diffusion models and score-based generative models, and showed that not only the training process but also the sampling method of DDPMs can be combined with the annealed Langevin dynamics of score-based models. The result is a unified and more powerful sampler, the Predictor-Corrector sampler, in which each numerical step of the reverse-time SDE (the predictor) is followed by one or more Langevin MCMC steps (the corrector).
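A minimal sketch of the Predictor-Corrector idea follows. It assumes the variance-exploding (VE) SDE of [Son21S], a geometric noise schedule, and a toy 1-D Gaussian data distribution whose perturbed score is known in closed form; in practice a trained score network would replace the hypothetical `score` function. The predictor is a reverse-diffusion step of the reverse-time SDE, and the corrector is a single Langevin step whose size is set from a target signal-to-noise ratio `snr`, following the heuristic described in [Son21S]. All parameter values are illustrative, not tuned.

```python
import numpy as np

# Toy 1-D data distribution N(MU, S^2), chosen so the score is known exactly.
MU, S = 2.0, 0.5

def score(x, sigma):
    """Exact score of the VE-perturbed data N(MU, S^2 + sigma^2).

    Stand-in for a trained score network s_theta(x, sigma)."""
    return -(x - MU) / (S**2 + sigma**2)

def pc_sampler(n_samples=10_000, n_steps=500, sigma_min=0.01, sigma_max=10.0,
               snr=0.16, seed=0):
    rng = np.random.default_rng(seed)
    # Geometric noise schedule, ordered from the largest to the smallest noise level.
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps)
    # Start from the prior N(0, sigma_max^2).
    x = rng.normal(0.0, sigma_max, size=n_samples)
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Predictor: one reverse-diffusion step of the reverse-time VE SDE.
        d = sigma_cur**2 - sigma_next**2
        x = x + d * score(x, sigma_cur) + np.sqrt(d) * rng.normal(size=n_samples)
        # Corrector: one Langevin MCMC step at the new noise level, with the
        # step size chosen so that the noise-to-gradient ratio matches `snr`.
        g = score(x, sigma_next)
        z = rng.normal(size=n_samples)
        step = 2 * (snr * np.abs(z).mean() / np.abs(g).mean()) ** 2
        x = x + step * g + np.sqrt(2 * step) * z
    return x

samples = pc_sampler()
print(f"mean={samples.mean():.3f}, std={samples.std():.3f}")  # roughly 2.0 and 0.5
```

Dropping the corrector leaves a plain reverse-diffusion sampler, while dropping the predictor and running several corrector steps per noise level gives something close to the annealed Langevin dynamics of [Son19G].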

If you are interested in learning more about the history and development of denoising generative models, I recommend the following two blog posts. The first focuses on DDPMs and goes through all the essential math. The second is centered on score-based methods and is somewhat more high-level, but it was written by one of the key authors of the DGM revolution (Yang Song) and offers some unique insights.

In summary, diffusion models are an exciting new direction in generative modeling, built on rigorous mathematics and beautiful insights. The field has matured quickly in recent years and now looks ready to be deployed in practical applications.

References

[Ho20D] Denoising Diffusion Probabilistic Models. We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our …

[Son19G] Generative Modeling by Estimating Gradients of the Data Distribution. We introduce a new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching. Because gradients can be ill-defined and hard to estimate when the data resides on low-dimensional manifolds, we perturb the data with different levels of Gaussian noise, and jointly estimate the corresponding scores, i.e., the vector fields …

[Son21S] Score-Based Generative Modeling through Stochastic Differential Equations. Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the …
