0x542 Flow
Let \(Z \in \mathbb{R}^D\) be a tractable random variable with pdf \(p(Z)\), and let \(g\) be an invertible function (with inverse \(f = g^{-1}\)) defining \(X = g(Z)\).
Using the change-of-variables formula, we know

\[ p_X(x) = p_Z(f(x)) \left| \det \frac{\partial f(x)}{\partial x} \right| \]
\(g\) is the generator, which moves the simple distribution to a complicated one; its inverse \(f\) normalizes the complicated distribution back toward the simpler form (hence the name normalizing flow).
To train the model, we maximize the exact log-likelihood, which only requires \(f\):

\[ \log p_X(x) = \log p_Z(f(x)) + \log \left| \det \frac{\partial f(x)}{\partial x} \right| \]

To sample a new point, we simply draw \(z \sim p(Z)\) and transform it with \(g(z)\).
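To make the two directions concrete, here is a minimal NumPy sketch; the 1-D bijection \(g(z) = e^z\) (mapping a standard normal to a log-normal) is my own illustrative choice, not an example from these notes. The log-likelihood uses only \(f\) and its Jacobian, while sampling uses only \(g\).

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p_z(z):
    # log-density of the standard normal base distribution
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)

def g(z):                  # generator: simple -> complicated (here: log-normal)
    return np.exp(z)

def f(x):                  # normalizer: inverse of g
    return np.log(x)

def log_abs_det_jac_f(x):  # log |df/dx| = log(1/x) for f(x) = log(x)
    return -np.log(x)

# Training direction: exact log-likelihood of a data point, using only f.
x = 2.5
log_px = log_p_z(f(x)) + log_abs_det_jac_f(x)
print(log_px)              # matches the LogNormal(0, 1) log-density at 2.5

# Sampling direction: draw z from the base and push it through g.
samples = g(rng.standard_normal(1000))
```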
Normalizing flow vs VAE
Architecture:
- A VAE's encoder/decoder is usually not invertible
- An NF's encoder/decoder (\(f\)/\(g\)) is bijective by construction
Objective:
- A VAE maximizes a lower bound on the log-likelihood (the ELBO)
- An NF maximizes the exact log-likelihood
\(f, g\) control the expressiveness of the model. One way to build complicated bijective functions is to compose simple ones,

\[ g = g_N \circ g_{N-1} \circ \cdots \circ g_1, \]

which has the inverse \(f = f_1 \circ f_2 \circ \cdots \circ f_N\) (where \(f_i = g_i^{-1}\)) and a Jacobian determinant that factors into the per-layer determinants:

\[ \det \frac{\partial f}{\partial x} = \prod_{i=1}^{N} \det \frac{\partial f_i}{\partial x_i}, \]

where \(x_i\) denotes the intermediate input to \(f_i\).
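A minimal sketch of how composition accumulates the per-layer log-determinants; the `AffineLayer` class and its parameter values are hypothetical stand-ins for real flow layers.

```python
import numpy as np

def log_p_z(z):
    # standard normal base distribution in D dimensions
    return -0.5 * z @ z - 0.5 * z.size * np.log(2.0 * np.pi)

class AffineLayer:
    """Hypothetical elementwise affine layer g_i(z) = scale * z + shift."""
    def __init__(self, scale, shift):
        self.scale, self.shift = scale, shift

    def f(self, x):                   # inverse of g_i
        return (x - self.shift) / self.scale

    def log_abs_det_jac_f(self, x):   # log |det df_i/dx| for a diagonal Jacobian
        return -np.sum(np.log(np.abs(self.scale)))

def flow_log_prob(layers, x):
    """log p_X(x) = log p_Z(f(x)) + sum_i log |det df_i/dx_i|.

    `layers` is ordered as [g_1, ..., g_N] with g = g_N o ... o g_1,
    so the normalizing pass applies f_N first, then f_{N-1}, ...
    """
    log_det = 0.0
    for layer in reversed(layers):
        log_det += layer.log_abs_det_jac_f(x)
        x = layer.f(x)
    return log_p_z(x) + log_det

layers = [AffineLayer(np.array([2.0, 0.5]), np.array([1.0, -1.0])),
          AffineLayer(np.array([1.5, 3.0]), np.array([0.0, 0.5]))]
print(flow_log_prob(layers, np.array([0.3, 1.2])))
```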
3.1. Linear Flow
Model (linear flow) We first consider the simple linear flow \(g(z) = Az + b\) with invertible \(A \in \mathbb{R}^{D \times D}\).
Linear flows are limited in their expressiveness: when \(p(z) = \mathcal{N}(\mu, \Sigma)\), then \(p(y) = \mathcal{N}(A\mu + b, A\Sigma A^\top)\), so a linear flow can only map a Gaussian to another Gaussian.
Additionally, computing the determinant of the Jacobian (\(\det A\)) costs \(O(D^3)\), and computing the inverse \(A^{-1}\) costs the same \(O(D^3)\).
Constraining the matrix \(A\) to be triangular, orthogonal, etc. improves the computational cost: for example, the determinant of a triangular matrix is just the product of its diagonal entries (\(O(D)\)), and its inverse can be applied by back-substitution in \(O(D^2)\).
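A quick NumPy/SciPy sketch of why the triangular constraint helps (the matrix sizes and values are arbitrary): the log-determinant reduces to a sum over the diagonal, and the inverse pass becomes a triangular solve.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
D = 512
A = rng.standard_normal((D, D))

# General A: log |det A| requires an O(D^3) factorization.
sign, logdet_full = np.linalg.slogdet(A)

# Triangular A: the determinant is the product of the diagonal, O(D).
L = np.tril(A)
logdet_tri = np.sum(np.log(np.abs(np.diag(L))))

# The inverse pass f(x) = A^{-1}(x - b) becomes an O(D^2)
# back-substitution instead of an O(D^3) general solve.
b = rng.standard_normal(D)
x = rng.standard_normal(D)
z = solve_triangular(L, x - b, lower=True)
```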
3.2. Planar Flow
Model (planar flow) \( g(z) = z + u\, h(w^\top z + b) \)
where \(h\) is a smooth elementwise nonlinearity (e.g., \(\tanh\)) and \(u, w \in \mathbb{R}^D\), \(b \in \mathbb{R}\) are the parameters. By the matrix determinant lemma, \(\det \frac{\partial g}{\partial z} = 1 + u^\top \psi(z)\) with \(\psi(z) = h'(w^\top z + b)\, w\), so the determinant costs only \(O(D)\).
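A minimal sketch of the planar flow's forward pass with \(h = \tanh\) (the parameter values below are arbitrary). The log-determinant uses the matrix determinant lemma; note that invertibility is only guaranteed under a constraint on the parameters (\(w^\top u \ge -1\) for \(\tanh\)).

```python
import numpy as np

def planar_forward(z, u, w, b):
    """g(z) = z + u * tanh(w.z + b).

    Matrix determinant lemma: det(I + u psi^T) = 1 + u.psi,
    with psi = tanh'(w.z + b) * w, so the log-det costs O(D).
    """
    a = w @ z + b
    x = z + u * np.tanh(a)
    psi = (1.0 - np.tanh(a) ** 2) * w    # h'(a) * w
    log_abs_det = np.log(np.abs(1.0 + u @ psi))
    return x, log_abs_det

rng = np.random.default_rng(0)
D = 2
z = rng.standard_normal(D)
u, w, b = rng.standard_normal(D), rng.standard_normal(D), 0.1
x, log_abs_det = planar_forward(z, u, w, b)
```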