0x533 Diffusion Model


4.1. Score Matching Models

The general score matching description is here

Model (denoising score matching)

Model (sliced score matching)

Model (NCSN, Noise Conditional Score Networks) The main contributions are

  • perturbing the data using various levels of noise \(\sigma_1, ..., \sigma_L\)
  • simultaneously estimating scores corresponding to all noise levels by training a single conditional score network \(s_\theta\)
\[s_\theta(x, \sigma) \approx \nabla_x \log q_\sigma(x)\]

Sampling uses annealed Langevin dynamics, which runs Langevin dynamics at each noise scale \(\sigma_i\) in turn, from the largest to the smallest, starting each level from the final sample of the previous one.
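The sampling loop can be sketched as follows. This is a minimal illustration, not the NCSN implementation: `toy_score` is a hypothetical stand-in for the trained network \(s_\theta\) (it is the exact score when the data are \(N(0, I)\)), and the step size rule \(\alpha_i = \epsilon \sigma_i^2 / \sigma_L^2\), the value of `eps`, and the geometric noise levels are assumed choices.

```python
import numpy as np

def toy_score(x, sigma):
    # Hypothetical stand-in for s_theta(x, sigma): exact score of
    # N(0, (1 + sigma^2) I), i.e. data ~ N(0, I) perturbed with noise level sigma.
    return -x / (1.0 + sigma**2)

def annealed_langevin(score_fn, sigmas, shape, n_steps=100, eps=2e-5, rng=None):
    """Run Langevin dynamics at each noise level, largest sigma first."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = rng.uniform(-1.0, 1.0, size=shape)      # arbitrary initialization
    for sigma in sigmas:
        # Assumed step size rule: scale the step with the noise level.
        alpha = eps * sigma**2 / sigmas[-1]**2
        for _ in range(n_steps):
            z = rng.standard_normal(shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x

sigmas = np.geomspace(1.0, 0.01, num=10)        # sigma_1 > ... > sigma_L
samples = annealed_langevin(toy_score, sigmas, shape=(2000, 2))
```

With a real score network, `score_fn` would be the trained \(s_\theta\), and the early high-noise levels are what move the sample between modes.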

4.2. Denoising Diffusion

Model (DDPM, Denoising Diffusion Probabilistic Models) Diffusion models are latent variable models of the form

\[p_\theta(x_0) = \int p_\theta(x_{0:T}) dx_{1:T}\]

where \(x_{1:T}\) are latent variables

reverse process The joint distribution \(p_\theta(x_{0:T})\) is called the reverse process. It is defined as a Markov chain with learned Gaussian transitions, starting from \(p(x_T)\):

\[p_\theta(x_{0:T}) = p(x_T)\prod_{t=1}^T p _\theta(x_{t-1} | x_t)\]

where \(p(x_T) = N(x_T | 0, I)\) and

\[p_\theta(x_{t-1} | x_t) = N(x_{t-1} | \mu_\theta(x_t, t), \Sigma_\theta(x_t, t))\]
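Sampling from the reverse process means drawing \(x_T\) from the prior and then applying the Gaussian transitions one by one. The sketch below assumes a diagonal \(\Sigma_\theta\); `mean_fn` and `var_fn` are hypothetical callables standing in for \(\mu_\theta\) and the diagonal of \(\Sigma_\theta\), and skipping the noise at the last step is a common convention, assumed here.

```python
import numpy as np

def ancestral_sample(mean_fn, var_fn, shape, T, rng):
    """Draw x_0 by running the reverse chain x_T -> x_{T-1} -> ... -> x_0."""
    x = rng.standard_normal(shape)               # x_T ~ N(0, I)
    for t in range(T, 0, -1):
        mean = mean_fn(x, t)                     # stands in for mu_theta(x_t, t)
        var = var_fn(x, t)                       # diagonal of Sigma_theta(x_t, t)
        # Assumed convention: no noise is added when producing x_0.
        noise = rng.standard_normal(shape) if t > 1 else np.zeros(shape)
        x = mean + np.sqrt(var) * noise
    return x

# Toy transition, chosen only so that the chain is easy to check by hand.
rng = np.random.default_rng(0)
x0 = ancestral_sample(lambda x, t: 0.9 * x, lambda x, t: 0.19,
                      shape=(5000, 2), T=50, rng=rng)
```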

forward process, diffusion process The approximate posterior \(q(x_{1:T} | x_0)\) is a fixed Markov chain that gradually adds Gaussian noise to the data according to a variance schedule \(\beta_1, ..., \beta_T\)

\[q(x_{1:T} | x_0) = \prod_{t=1}^T q(x_t | x_{t-1})\]


\[q(x_t | x_{t-1}) = N(x_t | \sqrt{1-\beta_t} x_{t-1}, \beta_t I)\]
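Because every step is Gaussian, \(x_t\) can also be sampled directly from \(x_0\): with \(\alpha_t = 1 - \beta_t\) and \(\bar{\alpha}_t = \prod_{s=1}^t \alpha_s\), \(q(x_t | x_0) = N(x_t | \sqrt{\bar{\alpha}_t} x_0, (1 - \bar{\alpha}_t) I)\). The sketch below compares the step-by-step chain with this closed form. The linear schedule from \(10^{-4}\) to \(0.02\) with \(T = 1000\) is an assumed example, and the function names are illustrative.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # assumed linear variance schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # alpha_bar_t for t = 1..T

def q_step(x_prev, t, rng):
    """One forward step q(x_t | x_{t-1}); t is 1-indexed."""
    beta = betas[t - 1]
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta) * x_prev + np.sqrt(beta) * noise

def q_sample(x0, t, noise):
    """Closed-form sample from q(x_t | x_0); t is 1-indexed."""
    a_bar = alpha_bars[t - 1]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

rng = np.random.default_rng(0)
x0 = np.full(20000, 2.0)                 # many copies of the same data point
t = 300

x_chain = x0.copy()
for s in range(1, t + 1):                # iterate the Markov chain
    x_chain = q_step(x_chain, s, rng)

x_closed = q_sample(x0, t, rng.standard_normal(x0.shape))
```

The two sets of samples should agree in mean and standard deviation up to Monte Carlo error, and `alpha_bars[-1]` is close to zero, so \(x_T\) is nearly pure noise.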

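With \(\alpha_t = 1 - \beta_t\) and \(\bar{\alpha}_t = \prod_{s=1}^t \alpha_s\), the simplified objective trains a network \(\epsilon_\theta\) to predict the noise \(\epsilon \sim N(0, I)\) that was added to \(x_0\):

\[L_{simple}(\theta) = E_{t, x_0, \epsilon}\left[ \| \epsilon - \epsilon_\theta(\sqrt{\bar{\alpha}_t}x_0 + \sqrt{1- \bar{\alpha}_t}\epsilon, t) \|^2 \right]\]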

This objective drops the per-timestep weights of the variational bound, which is analogous to the loss weighting used by the NCSN denoising score matching model.
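A Monte Carlo estimate of the simplified objective can be sketched as follows. The linear schedule is an assumed example, `eps_model` is a hypothetical callable for the noise-prediction network, and `zero_model` is a placeholder used only to check the arithmetic (no training happens here).

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)               # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)

def l_simple(eps_model, x0, rng):
    """One-batch estimate of the simplified noise-prediction loss."""
    batch = x0.shape[0]
    t = rng.integers(1, T + 1, size=batch)       # t uniform on {1, ..., T}
    eps = rng.standard_normal(x0.shape)          # the noise to be predicted
    # Broadcast alpha_bar_t over the non-batch dimensions.
    a_bar = alpha_bars[t - 1].reshape((-1,) + (1,) * (x0.ndim - 1))
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    residual = eps - eps_model(x_t, t)
    # Squared L2 norm per example, averaged over the batch.
    return np.mean(np.sum(residual.reshape(batch, -1) ** 2, axis=1))

def zero_model(x_t, t):
    # Placeholder network that always predicts zero noise.
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((512, 8))
loss = l_simple(zero_model, x0, rng)
```

For the zero predictor the loss is just the expected squared norm of the noise, which is the data dimension (8 here) up to sampling error; a trained network should do better than this baseline.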

Model (improved diffusion) The improvements over DDPM are

The noise schedule is cosine instead of linear, which adds noise more slowly.
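The cosine schedule is usually written in terms of \(\bar{\alpha}_t = f(t)/f(0)\) with \(f(t) = \cos^2\left(\frac{t/T + s}{1 + s} \cdot \frac{\pi}{2}\right)\), from which \(\beta_t = 1 - \bar{\alpha}_t / \bar{\alpha}_{t-1}\). The sketch below compares it with a linear schedule. The offset \(s = 0.008\), the clipping at \(0.999\), and the linear range \(10^{-4}\) to \(0.02\) are assumed values, and the function names are illustrative.

```python
import numpy as np

def cosine_alpha_bars(T, s=0.008):
    """alpha_bar_t for t = 0..T under the cosine schedule (alpha_bar_0 = 1)."""
    t = np.arange(T + 1)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]

def cosine_betas(T, s=0.008, max_beta=0.999):
    """beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped near t = T."""
    a_bar = cosine_alpha_bars(T, s)
    betas = 1.0 - a_bar[1:] / a_bar[:-1]
    return np.clip(betas, 0.0, max_beta)

T = 1000
linear_alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))
cos_alpha_bars = np.cumprod(1.0 - cosine_betas(T))

# Signal level sqrt(alpha_bar_t) halfway through the forward process.
mid = T // 2 - 1
linear_mid, cosine_mid = linear_alpha_bars[mid], cos_alpha_bars[mid]
```

Halfway through the chain the cosine schedule keeps much more of the signal than the linear one, which is what "adds noise more slowly" refers to.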

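The variance \(\Sigma_\theta(x_t, t)\) is learned instead of being fixed to \(\sigma_t^2 I\). The network outputs a vector \(v\) that interpolates, in the log domain, between \(\beta_t\) and the posterior variance \(\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t\):

\[\Sigma_\theta(x_t, t) = \exp(v \log \beta_t + (1-v) \log \tilde{\beta}_t)\]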
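The interpolation can be sketched numerically as below, using \(\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t\) for the lower end. The linear schedule is an assumed example, and replacing \(\tilde{\beta}_1 = 0\) by \(\tilde{\beta}_2\) before taking the logarithm is an assumed workaround for \(\log 0\). In practice \(v\) has one entry per data dimension; a scalar is used here for brevity.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)                   # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)
alpha_bars_prev = np.append(1.0, alpha_bars[:-1])    # alpha_bar_{t-1}, with alpha_bar_0 = 1

# Posterior variance: the lower end of the interpolation range.
beta_tilde = betas * (1.0 - alpha_bars_prev) / (1.0 - alpha_bars)

# beta_tilde at t = 1 is exactly 0, so reuse the t = 2 value before the log
# (an assumed workaround to avoid log(0)).
log_beta_tilde = np.log(np.append(beta_tilde[1], beta_tilde[1:]))
log_beta = np.log(betas)

def learned_variance(v, t):
    """Interpolate between beta_tilde_t (v = 0) and beta_t (v = 1); t is 1-indexed."""
    i = t - 1
    return np.exp(v * log_beta[i] + (1.0 - v) * log_beta_tilde[i])

var_mid = learned_variance(0.5, 500)                 # geometric mean of the two ends
```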

4.3. Sampling

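Model (DDIM, Denoising Diffusion Implicit Models) Speeds up sampling by using a non-Markovian diffusion process that shares the DDPM training objective, so a trained model can be sampled in far fewer steps.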
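A deterministic DDIM update (the zero-noise case) first predicts \(x_0\) from the current sample and the predicted noise, then re-noises that prediction to the previous timestep of a shortened schedule. The sketch below is an illustration under assumptions: the linear schedule and the stride of 50 are example values, and the "oracle" noise predictor that already knows the true noise stands in for a trained network so the result can be checked.

```python
import numpy as np

T = 1000
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))   # assumed linear schedule

def ddim_step(x_t, eps_pred, a_bar_t, a_bar_prev):
    """Deterministic DDIM update from noise level a_bar_t to a_bar_prev."""
    x0_pred = (x_t - np.sqrt(1.0 - a_bar_t) * eps_pred) / np.sqrt(a_bar_t)
    return np.sqrt(a_bar_prev) * x0_pred + np.sqrt(1.0 - a_bar_prev) * eps_pred

def ddim_sample(eps_model, x_T, timesteps):
    """Follow a decreasing subsequence of 1-indexed timesteps down to x_0."""
    x = x_T
    for i, t in enumerate(timesteps):
        a_bar_t = alpha_bars[t - 1]
        # After the last listed timestep we land on x_0, where alpha_bar = 1.
        last = i + 1 == len(timesteps)
        a_bar_prev = 1.0 if last else alpha_bars[timesteps[i + 1] - 1]
        x = ddim_step(x, eps_model(x, t), a_bar_t, a_bar_prev)
    return x

rng = np.random.default_rng(0)
x0_true = rng.standard_normal((4, 3))
eps_true = rng.standard_normal((4, 3))
x_T = np.sqrt(alpha_bars[-1]) * x0_true + np.sqrt(1.0 - alpha_bars[-1]) * eps_true

timesteps = list(range(T, 0, -50))        # 20 steps instead of 1000
x0_rec = ddim_sample(lambda x, t: eps_true, x_T, timesteps)
```

With the oracle predictor the 20-step trajectory recovers `x0_true` up to floating-point error; with a learned network the quality depends on how many steps are kept.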

Model (PNDM, pseudo numerical methods for diffusion models)

4.4. Conditional Diffusion

Model (Guided diffusion, classifier-guided)

Model (classifier-free guidance)

Model (GLIDE, text-to-image)

Model (SDEdit)

Model (bit diffusion, discrete diffusion)

Model (DreamFusion, 3d diffusion, text to 3d)

4.5. Latent Diffusion

Model (latent diffusion, stable diffusion)

Run diffusion in the latent space of an autoencoder instead of in pixel space; the denoised latent vector is then decoded into an image.