0x542 Flow
Let \(Z \in \mathbb{R}^D\) be a tractable random variable with pdf \(p(Z)\), and let \(g\) be an invertible function (with inverse \(f = g^{-1}\)) defining \(X = g(Z)\).
Using the change-of-variables formula, we know

\[ p_X(x) = p_Z(f(x)) \left| \det \frac{\partial f(x)}{\partial x} \right| \]
\(g\) is the generator, which moves the simple distribution to a complicated one; its inverse \(f\) normalizes the complicated distribution back toward the simpler form (hence the name normalizing flow).
To train the model, we maximize the exact log-likelihood, which only requires \(f\):

\[ \log p_X(x) = \log p_Z(f(x)) + \log \left| \det \frac{\partial f(x)}{\partial x} \right| \]

To sample a new point, we simply draw \(z \sim p(Z)\) and transform it with \(g(z)\).
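To make the two directions concrete, here is a minimal NumPy sketch; the 1-D bijection \(g(z) = e^z\) (mapping a standard normal to a log-normal) is my own illustrative choice, not an example from these notes. The log-likelihood uses only \(f\) and its Jacobian, while sampling uses only \(g\).

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p_z(z):
    # log-density of the standard normal base distribution
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)

def g(z):                  # generator: simple -> complicated (here: log-normal)
    return np.exp(z)

def f(x):                  # normalizer: inverse of g
    return np.log(x)

def log_abs_det_jac_f(x):  # log |df/dx| = log(1/x) for f(x) = log(x)
    return -np.log(x)

# Training direction: exact log-likelihood of a data point, using only f.
x = 2.5
log_px = log_p_z(f(x)) + log_abs_det_jac_f(x)
print(log_px)              # matches the LogNormal(0, 1) log-density at 2.5

# Sampling direction: draw z from the base and push it through g.
samples = g(rng.standard_normal(1000))
```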
Normalizing flow vs VAE
Architecture:
- A VAE's encoder/decoder is usually not invertible
- An NF's encoder/decoder (\(f\)/\(g\)) is bijective by construction
Objective:
- A VAE maximizes a lower bound on the log-likelihood (the ELBO)
- An NF maximizes the exact log-likelihood
\(f, g\) control the expressiveness of the model. One way to build complicated bijective functions is to compose simple ones,

\[ g = g_N \circ g_{N-1} \circ \cdots \circ g_1, \]

which has the inverse \(f = f_1 \circ f_2 \circ \cdots \circ f_N\) (where \(f_i = g_i^{-1}\)) and a Jacobian determinant that factors into the per-layer determinants:

\[ \det \frac{\partial f}{\partial x} = \prod_{i=1}^{N} \det \frac{\partial f_i}{\partial x_i}, \]

where \(x_i\) denotes the intermediate input to \(f_i\).
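A minimal sketch of how composition accumulates the per-layer log-determinants; the `AffineLayer` class and its parameter values are hypothetical stand-ins for real flow layers.

```python
import numpy as np

def log_p_z(z):
    # standard normal base distribution in D dimensions
    return -0.5 * z @ z - 0.5 * z.size * np.log(2.0 * np.pi)

class AffineLayer:
    """Hypothetical elementwise affine layer g_i(z) = scale * z + shift."""
    def __init__(self, scale, shift):
        self.scale, self.shift = scale, shift

    def f(self, x):                   # inverse of g_i
        return (x - self.shift) / self.scale

    def log_abs_det_jac_f(self, x):   # log |det df_i/dx| for a diagonal Jacobian
        return -np.sum(np.log(np.abs(self.scale)))

def flow_log_prob(layers, x):
    """log p_X(x) = log p_Z(f(x)) + sum_i log |det df_i/dx_i|.

    `layers` is ordered as [g_1, ..., g_N] with g = g_N o ... o g_1,
    so the normalizing pass applies f_N first, then f_{N-1}, ...
    """
    log_det = 0.0
    for layer in reversed(layers):
        log_det += layer.log_abs_det_jac_f(x)
        x = layer.f(x)
    return log_p_z(x) + log_det

layers = [AffineLayer(np.array([2.0, 0.5]), np.array([1.0, -1.0])),
          AffineLayer(np.array([1.5, 3.0]), np.array([0.0, 0.5]))]
print(flow_log_prob(layers, np.array([0.3, 1.2])))
```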
3.1. Linear Flow
Model (linear flow) We first consider the simple linear flow \(g(z) = Az + b\) with invertible \(A \in \mathbb{R}^{D \times D}\).
Linear flows are limited in their expressiveness: when \(p(z) = \mathcal{N}(\mu, \Sigma)\), then \(p(y) = \mathcal{N}(A\mu + b, A\Sigma A^\top)\), so a linear flow can only map a Gaussian to another Gaussian.
Additionally, computing the determinant of the Jacobian (\(\det A\)) costs \(O(D^3)\), and computing the inverse \(A^{-1}\) costs the same \(O(D^3)\).
Constraining the matrix \(A\) to be triangular, orthogonal, etc. improves the computational cost: for example, the determinant of a triangular matrix is just the product of its diagonal entries (\(O(D)\)), and its inverse can be applied by back-substitution in \(O(D^2)\).
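A quick NumPy/SciPy sketch of why the triangular constraint helps (the matrix sizes and values are arbitrary): the log-determinant reduces to a sum over the diagonal, and the inverse pass becomes a triangular solve.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
D = 512
A = rng.standard_normal((D, D))

# General A: log |det A| requires an O(D^3) factorization.
sign, logdet_full = np.linalg.slogdet(A)

# Triangular A: the determinant is the product of the diagonal, O(D).
L = np.tril(A)
logdet_tri = np.sum(np.log(np.abs(np.diag(L))))

# The inverse pass f(x) = A^{-1}(x - b) becomes an O(D^2)
# back-substitution instead of an O(D^3) general solve.
b = rng.standard_normal(D)
x = rng.standard_normal(D)
z = solve_triangular(L, x - b, lower=True)
```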
3.2. Planar Flow
Model (planar flow) \( g(z) = z + u\, h(w^\top z + b) \)
where \(h\) is a smooth elementwise nonlinearity (e.g., \(\tanh\)) and \(u, w \in \mathbb{R}^D\), \(b \in \mathbb{R}\) are the parameters. By the matrix determinant lemma, \(\det \frac{\partial g}{\partial z} = 1 + u^\top \psi(z)\) with \(\psi(z) = h'(w^\top z + b)\, w\), so the determinant costs only \(O(D)\).
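A minimal sketch of the planar flow's forward pass with \(h = \tanh\) (the parameter values below are arbitrary). The log-determinant uses the matrix determinant lemma; note that invertibility is only guaranteed under a constraint on the parameters (\(w^\top u \ge -1\) for \(\tanh\)).

```python
import numpy as np

def planar_forward(z, u, w, b):
    """g(z) = z + u * tanh(w.z + b).

    Matrix determinant lemma: det(I + u psi^T) = 1 + u.psi,
    with psi = tanh'(w.z + b) * w, so the log-det costs O(D).
    """
    a = w @ z + b
    x = z + u * np.tanh(a)
    psi = (1.0 - np.tanh(a) ** 2) * w    # h'(a) * w
    log_abs_det = np.log(np.abs(1.0 + u @ psi))
    return x, log_abs_det

rng = np.random.default_rng(0)
D = 2
z = rng.standard_normal(D)
u, w, b = rng.standard_normal(D), rng.standard_normal(D), 0.1
x, log_abs_det = planar_forward(z, u, w, b)
```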