Score-Based Generative Modeling through Stochastic Differential Equations

A seminal contribution to the field of diffusion models, this work establishes a connection between denoising, score matching and stochastic differential equations. It unifies previous approaches to diffusion models in an elegant way and sets a new state of the art.

Diffusion models have recently emerged as the state of the art for generative modelling. Among them, two of the most popular implementations are score matching with Langevin dynamics (SMLD) [Son19G] and denoising diffusion probabilistic models (DDPM) [Ho20D]. Both are based on the idea of generating data by first corrupting training samples with slowly increasing noise, and then training a model to reverse the process. They differ in how they do so. SMLD focuses on learning the score of the data, i.e. the gradient of the log probability density with respect to the data, at each noise level, and then uses Langevin dynamics to sample along a sequence of decreasing noise levels. DDPM instead trains a sequence of probabilistic models to reverse each noising step, and uses knowledge of the functional form of the reverse distributions to make training tractable.
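To make the sampling side of SMLD concrete, here is a minimal sketch of Langevin dynamics in Python. It is not the implementation from [Son19G]: it assumes a one-dimensional Gaussian target whose score is known in closed form (the names gaussian_score and langevin_sample, the target parameters and the step size are all illustrative choices), whereas in SMLD a neural network estimates the score and the procedure is repeated over several noise levels.

```python
import numpy as np

def gaussian_score(x, mean=2.0, std=0.5):
    """Score (gradient of the log density) of N(mean, std^2).

    Assumption: a toy target with a closed-form score; in SMLD this
    function would be replaced by a trained score network.
    """
    return -(x - mean) / std**2

def langevin_sample(score_fn, n_samples=5000, n_steps=1000, step_size=1e-2, seed=0):
    """Run unadjusted Langevin dynamics on n_samples independent chains."""
    rng = np.random.default_rng(seed)
    # Arbitrary initialisation, deliberately far from the target distribution.
    x = rng.uniform(-5.0, 5.0, size=n_samples)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_samples)
        # Move along the score, then add Gaussian noise scaled by the step size.
        x = x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * noise
    return x

samples = langevin_sample(gaussian_score)
print(f"sample mean = {samples.mean():.3f}, sample std = {samples.std():.3f}")
```

With a small step size and enough steps, the empirical mean and standard deviation of the chains should land close to those of the target, even though the sampler only ever queries the score and never the density itself.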

Despite their slightly different formulations, both methods rely on the invertibility of diffusion processes and can be unified under the formalism of stochastic differential equations (SDEs). [Son21S] proposes such a unified framework. Leveraging the formalism of SDEs and adapting it to modern deep generative models, the authors make the following theoretical and practical contributions:

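flexible sampling: general-purpose SDE solvers have been an active research field in mathematics for decades, and they can be used directly to sample from the reverse-time SDE. Among the most promising approaches for generative modelling are samplers that combine numerical SDE solvers with score-based MCMC methods, and deterministic samplers based on probability flow ODEs.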
controllable generation: the score in the reverse-time SDE can be modified at sampling time to condition on additional information, without retraining the score model. This allows better class-conditional generation (e.g. asking for pictures of dogs or of cats instead of drawing randomly from the two classes). For examples of how this can be implemented with DDPMs, see the recent paper [Nic22G] by OpenAI.
unified framework: the work unifies previous approaches under the same formalism and extends them to continuous time. This yields numerous mathematical benefits, e.g. the ability to use classical sampling algorithms with stability guarantees under broad assumptions, such as Euler-Maruyama and stochastic Runge-Kutta methods.
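As an illustration of the points above, the following sketch applies Euler-Maruyama to a reverse-time SDE. In the notation of [Son21S], the forward process is dx = f(x, t) dt + g(t) dw and the reverse-time SDE is dx = [f(x, t) - g(t)^2 ∇_x log p_t(x)] dt + g(t) dw̄, while the probability flow ODE drops the noise term and halves the score term. Everything else here is an assumption made to keep the example self-contained: the data distribution is a one-dimensional Gaussian, the noise rate BETA is constant (so f(x, t) = -0.5 * BETA * x and g(t) = sqrt(BETA)), and the exact score of p_t is used in place of a trained network. The function names are hypothetical.

```python
import numpy as np

BETA = 10.0                      # constant noise rate (assumption; the paper uses a schedule beta(t))
DATA_MEAN, DATA_STD = 2.0, 0.5   # toy "data distribution" N(2, 0.5^2) (assumption)

def marginal_moments(t):
    """Mean and variance of p_t under the forward SDE dx = -0.5*BETA*x dt + sqrt(BETA) dw."""
    decay = np.exp(-BETA * t)
    mean = DATA_MEAN * np.sqrt(decay)
    var = DATA_STD**2 * decay + (1.0 - decay)
    return mean, var

def score(x, t):
    """Exact score of p_t; a neural network would approximate this in practice."""
    mean, var = marginal_moments(t)
    return -(x - mean) / var

def reverse_sample(stochastic=True, n_samples=5000, n_steps=1000, t_end=1.0, seed=0):
    """Integrate from t = t_end down to t = 0 with fixed-size Euler(-Maruyama) steps."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    # Start from the prior N(0, 1), which p_t approaches for large BETA * t_end.
    x = rng.standard_normal(n_samples)
    for i in range(n_steps):
        t = t_end - i * dt
        f = -0.5 * BETA * x
        if stochastic:
            # Reverse-time SDE drift: f - g^2 * score, plus a noise term.
            drift = f - BETA * score(x, t)
            noise = np.sqrt(BETA * dt) * rng.standard_normal(n_samples)
        else:
            # Probability flow ODE drift: f - 0.5 * g^2 * score, no noise.
            drift = f - 0.5 * BETA * score(x, t)
            noise = 0.0
        # Time runs backwards, so the drift enters with a minus sign.
        x = x - drift * dt + noise
    return x

sde_samples = reverse_sample(stochastic=True)
ode_samples = reverse_sample(stochastic=False)
print(f"SDE: mean = {sde_samples.mean():.3f}, std = {sde_samples.std():.3f}")
print(f"ODE: mean = {ode_samples.mean():.3f}, std = {ode_samples.std():.3f}")
```

Both samplers start from the same Gaussian prior and should recover approximately the mean and standard deviation of the toy data distribution. The stochastic version corresponds to the reverse-time SDE, the deterministic one to the probability flow ODE; swapping the fixed-step loop for a higher-order solver would leave the score function untouched.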

Drawing on techniques from statistical mechanics, this work sets a new state of the art in image generation while introducing the formalism of SDEs to the world of generative models. The combination of these two disciplines represents an exciting and very promising new frontier for machine learning.

References

[Son19G]

Generative Modeling by Estimating Gradients of the Data Distribution, Yang Song, Stefano Ermon.

2019

We introduce a new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching. Because gradients can be ill-defined and hard to estimate when the data resides on low-dimensional manifolds, we perturb the data with different levels of Gaussian noise, and jointly estimate the corresponding scores, i.e., the vector fields …

[Ho20D]

Denoising Diffusion Probabilistic Models, Jonathan Ho, Ajay Jain, Pieter Abbeel.

Dec 2020

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our …

[Son21S]

Score-Based Generative Modeling through Stochastic Differential Equations, Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole.

Jan 2021

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the …

[Nic22G]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen.

Mar 2022

Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators …
