# 0x532 Flow

Notes from a survey paper.

Let $$Z \in \mathbb{R}^D$$ be a tractable random variable with pdf $$p_Z$$, and let $$g$$ be an invertible function (with inverse $$f$$):

$Y = g(Z)$

Using the change-of-variables formula, we know

$p_Y(y) = p_Z(f(y)) |\det Df(y) |$
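
As a sanity check of this formula, here is a minimal numerical sketch, assuming a hypothetical 1-D affine generator $$g(z) = 2z + 1$$ with a standard normal base (so $$Y \sim N(1, 4)$$ in closed form):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical generator g(z) = 2z + 1, so f(y) = (y - 1) / 2 and Df(y) = 1/2.
def f(y):
    return (y - 1.0) / 2.0

def p_Y(y):
    # Change of variables: p_Y(y) = p_Z(f(y)) * |det Df(y)|
    return norm.pdf(f(y)) * 0.5

# Sanity check against the known closed form: Y ~ N(1, 4).
y = 0.3
assert np.isclose(p_Y(y), norm.pdf(y, loc=1.0, scale=2.0))
```

The density computed through $$f$$ matches the known Gaussian density exactly.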

$$g$$ is the generator that pushes the simple distribution forward to a complicated one; its inverse $$f$$ normalizes the complicated distribution back toward the simpler form.

To train a model, we maximize the log-likelihood, which only requires $$f$$:

$\log p(\mathcal{D} \mid \theta) = \sum_i \log p_Y(y_i \mid \theta) = \sum_i \left[ \log p_Z(f(y_i; \theta)) + \log \left| \det Df(y_i; \theta) \right| \right]$
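
A minimal sketch of evaluating this objective, assuming a hypothetical 1-D affine flow $$f(y; \theta) = (y - b)/a$$ with a standard normal base (the names `log_likelihood`, `a`, `b` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=500)  # observed samples y_i

def log_likelihood(theta, y):
    """Exact log-likelihood under f(y; theta) = (y - b) / a, base Z ~ N(0, 1)."""
    a, b = theta
    z = (y - b) / a                              # f(y_i; theta)
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi))   # log p_Z(f(y_i; theta))
    log_det = -np.log(np.abs(a))                 # log |det Df| = log(1/|a|)
    return np.sum(log_pz + log_det)

# Parameters matching the data distribution should score higher.
assert log_likelihood((2.0, 3.0), data) > log_likelihood((1.0, 0.0), data)
```

In practice the same expression is written in an autodiff framework and maximized with gradient ascent over $$\theta$$.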

To sample a new point, we draw $$z \sim p_Z$$ and transform it with $$g(z)$$.

Normalizing flows vs. VAEs

Architecture:

• The VAE's encoder/decoder is usually not invertible
• The NF's transformation is bijective by construction

Objective:

• The VAE maximizes a lower bound on the log-likelihood (the ELBO)
• The NF maximizes the exact log-likelihood

The choice of $$f, g$$ controls the expressiveness of the model; one way to build complicated bijective functions is to compose simple ones:

$g = g_N \circ g_{N-1} \circ \cdots \circ g_1$

which has the inverse $$f = f_1 \circ f_2 \circ \cdots \circ f_N$$ and Jacobian determinant

$\det Df(y) = \prod_{i=1}^{N} \det Df_i(x_i)$

where $$x_i$$ is the intermediate value fed into $$f_i$$ (with $$x_N = y$$).
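
The composition and its accumulated log-determinant can be sketched as follows, using two hypothetical layers (a scaling and a shift); each layer returns its output together with its log-absolute-determinant:

```python
import numpy as np

# Each layer exposes its normalizing direction f_i and log|det Df_i|.
def f1(x):
    return x / 3.0, np.log(1.0 / 3.0)   # scaling: |det Df1| = 1/3 (1-D)

def f2(x):
    return x - 5.0, 0.0                 # shift: Jacobian is the identity

def f(y):
    """f = f1 ∘ f2: apply f2 first, then f1; log-dets add up."""
    x, ld2 = f2(y)
    z, ld1 = f1(x)
    return z, ld1 + ld2

z, log_det = f(np.array(8.0))
assert np.isclose(z, 1.0)               # (8 - 5) / 3
assert np.isclose(log_det, np.log(1.0 / 3.0))
```

Working in log-space turns the product of determinants into a numerically stable sum.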

### 3.1. Linear Flow

Model (linear flow). We first consider a simple linear flow with invertible $$A$$:

$g(x) = Ax + b$

Linear flows are limited in their expressiveness: when $$p(z) = N(\mu, \Sigma)$$, then $$p(y) = N(A\mu + b, A \Sigma A^T)$$, so the pushforward of a Gaussian is still a Gaussian.
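
This can be checked numerically; the sketch below (with an arbitrary, illustrative invertible $$A$$) pushes Gaussian samples through $$g(z) = Az + b$$ and compares the empirical moments to the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0], [0.0, 3.0]])       # invertible (nonzero diagonal)
b = np.array([1.0, -1.0])
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])

# Push base samples through g(z) = Az + b and compare moments.
z = rng.multivariate_normal(mu, Sigma, size=200_000)
y = z @ A.T + b

assert np.allclose(y.mean(axis=0), A @ mu + b, atol=0.1)
assert np.allclose(np.cov(y.T), A @ Sigma @ A.T, atol=0.5)
```

The empirical mean and covariance match $$A\mu + b$$ and $$A \Sigma A^T$$ up to Monte Carlo error.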

Additionally, computing the determinant of the Jacobian ($$\det A$$) costs $$O(D^3)$$, and computing the inverse $$A^{-1}$$ costs the same $$O(D^3)$$.

Constraining the matrix $$A$$ to be triangular, orthogonal, etc. improves the computational cost: for a triangular matrix, the determinant is just the product of the diagonal entries, an $$O(D)$$ computation.
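
The triangular case is easy to verify (the matrix below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
# A lower-triangular matrix with a shifted diagonal, so it is invertible.
A = np.tril(rng.normal(size=(D, D))) + 2.0 * np.eye(D)

# For triangular A, det(A) is the product of the diagonal: O(D) work,
# versus O(D^3) for the general-purpose algorithm.
fast_det = np.prod(np.diag(A))
assert np.isclose(fast_det, np.linalg.det(A))
```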

### 3.2. Planar Flow

Model (planar flow)

$g(x) = x + u\, h(w^T x + b)$

where $$u, w \in \mathbb{R}^D$$, $$b \in \mathbb{R}$$, and $$h$$ is a smooth scalar nonlinearity (e.g., $$\tanh$$). By the matrix determinant lemma, $$\det Dg(x) = 1 + h'(w^T x + b)\, u^T w$$, which costs only $$O(D)$$.
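
A forward pass can be sketched as follows, assuming $$h = \tanh$$ and vectors $$u, w \in \mathbb{R}^D$$ with scalar $$b$$; the determinant uses the matrix determinant lemma $$\det(I + u\psi^T) = 1 + u^T\psi$$. The parameters here are random and purely illustrative (invertibility additionally requires a constraint such as $$w^T u \ge -1$$ for $$\tanh$$):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3
u, w = rng.normal(size=D), rng.normal(size=D)
b = 0.5

def planar_forward(x, u, w, b):
    """g(x) = x + u * tanh(w^T x + b) and |det Dg(x)|.

    Dg(x) = I + u psi(x)^T with psi(x) = h'(w^T x + b) w,
    so det Dg(x) = 1 + u^T psi(x)  (matrix determinant lemma).
    """
    a = w @ x + b
    y = x + u * np.tanh(a)
    psi = (1.0 - np.tanh(a) ** 2) * w   # h'(a) * w for h = tanh
    det = 1.0 + u @ psi
    return y, np.abs(det)

x = rng.normal(size=D)
y, abs_det = planar_forward(x, u, w, b)
assert y.shape == (D,) and abs_det > 0.0
```

Both the forward pass and the determinant are $$O(D)$$, which is what makes planar flows cheap to stack.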