0x041 Stochastics

1. Markov Process

1.1. Markov Chain

Definition (Markov chain) The process \(X\) is a Markov chain if it has the Markov property:

\[P(X_n = s | X_0 = x_0, X_1 = x_1, .., X_{n-1} = x_{n-1}) = P(X_n = s | X_{n-1} = x_{n-1})\]

Definition (homogeneous) A Markov chain \(X\) is called homogeneous if

\[P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i)\]

The transition matrix \(P=(p_{ij})\) is the \(|S| \times |S|\) matrix of transition probabilities \(p_{ij} = P(X_{n+1} = j | X_n = i)\)

Theorem (Chapman-Kolmogorov) The n-step transition matrix \(P(m, m+n) = (p_{ij}(m, m+n)) = P^n\) where \(p_{ij}(m, m+n) = P(X_{m+n} = j | X_m = i)\)

The evolution of the chain is determined by the transition matrix \(P\) and the initial mass function \(\mu\); many questions about the chain are thus largely reducible to the study of matrices.
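Since the n-step transition probabilities are just powers of \(P\), they can be computed directly; a minimal numpy sketch with a made-up two-state chain (all numbers illustrative):

```python
import numpy as np

# Hypothetical two-state chain (states 0 and 1); rows are "from", columns "to".
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Chapman-Kolmogorov: the 3-step transition matrix is P^3.
P3 = np.linalg.matrix_power(P, 3)

# Distribution after 3 steps from the initial mass function mu.
mu = np.array([1.0, 0.0])   # start in state 0
mu3 = mu @ P3
```

Each row of `P3` still sums to 1, and `mu @ P3` is the mass function of \(X_3\).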

1.1.1. Notations

State Notations

Let \(P_i, E_i\) denote the probability conditional on \(X(0) = i\):

\[P_i(A) = P(A | X_0 = i), E_i(X) = E(X | X_0 = i)\]

Definition (recurrent state, transient state) State \(i\) is called recurrent if

\[P_i(X_n = i \text{ for some } n \geq 1) = 1\]

That is, the probability of eventual return to \(i\), starting from \(i\), is 1. Otherwise the state is called transient

Definition (positive) For a recurrent state \(i\), \(i\) is called positive if the mean recurrence time \(\mu_i < \infty\), otherwise it is called null

\[\mu_i = E_i(T_i) = \sum_n n f_{ii}(n)\]

where \(f_{ij}(n) = P_i(X_1 \neq j, X_2 \neq j, ..., X_{n-1} \neq j, X_n = j)\) is the first-passage probability, i.e. the mass function of the first passage time \(T_j = \min \{ n \geq 1 : X_n = j \}\)

Definition (periodic) The period \(d(i)\) of a state \(i\) is defined by \(d(i) = \gcd \{ n : p_{ii}(n) > 0 \}\); \(i\) is periodic if \(d(i) > 1\) and aperiodic otherwise

Definition (ergodic) A state is called ergodic if it is recurrent, positive and aperiodic


Ergodicity is a more general concept: it states that the time average along a trajectory almost surely equals the ensemble average

\[\langle J \rangle = \lim_{T \to \infty} \frac{1}{T} \int_0^T J(u(t)) \, dt\]

Chain Notations

Definition (communication) \(i\) is said to communicate with \(j\), written \(i \to j\), if the chain visits state \(j\) with strictly positive probability starting from \(i\); \(i\) and \(j\) intercommunicate if \(i \to j\) and \(j \to i\)

Definition (closed, irreducible) A set \(C\) of states is called

  • closed: if \(p_{i,j} = 0\) for all \(i \in C, j \notin C\)
  • irreducible: if \(i,j\) intercommunicate for all \(i,j \in C\)

Theorem (decomposition) The state space \(S\) can be partitioned uniquely as

\[S = T \cup C_1 \cup C_2 ...\]

where \(T\) is the set of transient states and the \(C_i\) are irreducible closed sets of recurrent states
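The decomposition can be computed mechanically from the transition matrix by comparing reachability sets; a small sketch in Python, using a made-up 4-state chain in which states 0 and 1 form a closed irreducible class and states 2 and 3 are transient:

```python
# Illustrative 4-state chain (not from the notes): states 0 and 1 form a
# closed irreducible class; states 2 and 3 leak into it and are transient.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.3, 0.7, 0.0, 0.0],
    [0.2, 0.0, 0.5, 0.3],
    [0.0, 0.2, 0.4, 0.4],
]

def reachable(P, i):
    """States j with i -> j: visited with positive probability from i."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

reach = [reachable(P, i) for i in range(len(P))]

# i and j intercommunicate iff i -> j and j -> i
classes = {frozenset(j for j in range(len(P)) if j in reach[i] and i in reach[j])
           for i in range(len(P))}
```

Here `classes` comes out as \(\{0,1\}\) and \(\{2,3\}\); the closed classes are those from which no transition leaves.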

1.1.2. Random Walk

The simple random walk on the integers has state space \(S = \{ 0, 1, -1, 2, -2, ... \}\) and transition probabilities:

\[p_{ij} = \begin{cases} p & \text{ if } j = i+1 \\ q = 1-p & \text{ if } j = i-1 \\ 0 & \text{otherwise} \end{cases}\]

Gambler's ruin

A gambler starts with \(k\) units and plays until the fortune reaches either \(0\) or \(N\). Each turn the fortune increases by 1 unit with probability \(p\) and decreases by 1 unit with probability \(q = 1-p\). Let \(p_k\) be the probability of ultimate ruin starting from \(k\). Conditioning on the first step with the law of total probability (i.e. \(P(A) = \sum_i P(A|B_i)P(B_i)\)) gives

\[p_k = p \cdot p_{k+1} + q \cdot p_{k-1}\]

Solving this with boundary conditions \(p_0 = 1, p_N = 0\) gives

\[p_k = \frac{(q/p)^k - (q/p)^N}{1 - (q/p)^N}\]

when \(p=1/2\) (where the formula above degenerates), the solution is \(p_k = 1 - k/N\)

The mean time \(D_k\) to reach one of the absorbing barriers satisfies a similar recurrence

\[D_k = p (1 + D_{k+1}) + q(1+D_{k-1})\]

Solving this with boundary conditions \(D_0 = D_N = 0\) gives

\[D_k = \begin{cases} \frac{1}{q-p}[k- N (\frac{1 - (q/p)^k}{1 - (q/p)^N})] & \text{ if } p \neq 1/2 \\ k(N-k) & \text{ if } p = 1/2 \end{cases}\]
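Because the ruin probability solves a linear recurrence, the closed form is easy to sanity-check with exact rational arithmetic; a sketch with illustrative values \(p = 0.6\), \(N = 10\):

```python
from fractions import Fraction

# Illustrative parameters (not from the notes): p = 0.6, N = 10.
p, N = Fraction(6, 10), 10
q = 1 - p

def ruin(k):
    """Closed-form ruin probability p_k for p != 1/2."""
    r = q / p
    return (r**k - r**N) / (1 - r**N)

# Boundary conditions and the recurrence p_k = p*p_{k+1} + q*p_{k-1},
# checked exactly thanks to Fraction.
assert ruin(0) == 1 and ruin(N) == 0
for k in range(1, N):
    assert ruin(k) == p * ruin(k + 1) + q * ruin(k - 1)
```

Using `Fraction` makes the check exact rather than approximate.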

One useful technique for the simple random walk is counting sample paths.

Let \(S_n = a + \sum_{i=1}^n X_i\) be the walk started at \(S_0 = a\) after \(n\) steps. Since any path ending at \(b\) takes \(\frac{1}{2}(n+b-a)\) up-steps and \(\frac{1}{2}(n-b+a)\) down-steps, it is easy to see

\[P(S_n = b) = \binom{n}{\frac{1}{2} (n+b-a)} p^{\frac{1}{2}(n+b-a)}q^{\frac{1}{2}(n-b+a)}\]

where the first factor \(N_n(a,b)\) is the number of paths from \((0, a)\) to \((n,b)\) (nonzero only when \(n + b - a\) is even)

\[N_n(a,b) = \binom{n}{\frac{1}{2} (n+b-a)}\]

Theorem (reflection) For \(a, b > 0\), among all paths from \((0,a)\) to \((n,b)\), the number of paths which touch some point \((k,0)\) is

\[N_n^0(a,b) = N_n(-a,b)\]

This can be seen by reflecting the initial segment of the path, up to its first visit to \(0\), in the x-axis
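Both the counting formula and the reflection principle can be checked by brute-force enumeration over all \(2^n\) step sequences for a small case (illustrative values \(n = 8\), \(a = 2\), \(b = 4\)):

```python
from itertools import product
from math import comb

n, a, b = 8, 2, 4   # small illustrative case

paths = list(product([1, -1], repeat=n))

# N_n(a, b): number of paths from (0, a) to (n, b)
N_nab = sum(1 for steps in paths if a + sum(steps) == b)
assert N_nab == comb(n, (n + b - a) // 2)

# Reflection: paths from (0, a) to (n, b) touching 0 are counted by N_n(-a, b)
touching = sum(
    1 for steps in paths
    if a + sum(steps) == b
    and any(a + sum(steps[:k]) == 0 for k in range(1, n + 1))
)
assert touching == comb(n, (n + b + a) // 2)
```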

1.2. Continuous-time Markov Process

Now generalize from discrete to continuous time: \(X = \{ X(t) : t \geq 0 \}\), with the states still countable

Definition (Markov property) The process \(X\) satisfies Markov property if

\[P(X(t_n) = j | X(t_1) = i_1, ..., X(t_{n-1}) = i_{n-1}) = P(X(t_n) = j | X(t_{n-1}) = i_{n-1})\]

where \(t_1 < t_2 < ... < t_n\) is an increasing sequence of times

Definition (transition probability)

\[p_{ij}(s,t) = P(X(t) = j | X(s) = i)\]

It is homogeneous if \(p_{ij}(s, t) = p_{ij}(0, t-s)\) for all \(i,j,s,t\), in which case we write \(p_{ij}(t-s)\) for \(p_{ij}(s,t)\)

2. Stationary Process

Definition (strongly stationary) The process \(X = \{ X(t) | t \geq 0 \}\), taking values in \(R\), is called strongly stationary if the families

\[\{ X(t_1), X(t_2), ..., X(t_n) \}, \{ X(t_1+h), X(t_2+h), ..., X(t_n+h) \}\]

have the same joint distribution, for all given \(t_1, ..., t_n\) and \(h > 0\)

Instead of requiring invariance of the finite-dimensional joint distributions (fdds), one may require invariance only of the mean and covariance, which gives weak stationarity

Definition (weakly stationary) The process \(X = \{ X(t) | t \geq 0 \}\) is called weakly stationary if for all \(t_1, t_2\), \(h > 0\)

\[EX(t_1) = EX(t_2)\]
\[cov(X(t_1), X(t_2)) = cov(X(t_1+h), X(t_2+h))\]

3. Renewals

Definition (renewal process) A renewal process \(N= \{ N(t): t \geq 0 \}\) is a process for which

\[N(t) = \max \{ n | T_n \leq t \}\]

where \(T_0=0\) and

\[T_n = \sum_{i=1}^n X_i\]

and \(X_i\) are iid non-negative random variables

3.1. Poisson Process

If the \(X_i\) are exponentially distributed, then \(N\) is a Poisson process.

Definition (Poisson process) A Poisson process with intensity \(\lambda\) is a process \(N = \{ N(t) : t \geq 0 \}\) taking values in \(S = \{ 0, 1, 2, ... \}\) such that

\[P(N(t+h) = n+m | N(t) = n) = \begin{cases} \lambda h + o(h) & \text{if } m =1 \\ o(h) & \text{if } m > 1 \\ 1 - \lambda h + o(h) & \text{ if }m=0 \\ \end{cases}\]

Additionally \(N(0) = 0, N(s) \leq N(t)\) for \(s < t\) and \(N(t) - N(s)\) is independent of the times of emissions during \([0, s]\)

Bernoulli process and Poisson process

It can also be interpreted as a generalization of the Bernoulli process to continuous time

Definition (Bernoulli process) Let \(S = \{ 0, 1, 2 ... \}\) and define the Markov chain \(Y\) by

\[P(Y_{n+1} = s+1 | Y_n = s) = p, P(Y_{n+1} = s | Y_n = s ) = 1-p\]

Theorem (distribution of Poisson process) The random variable \(N(t)\) has the Poisson distribution with parameter \(\lambda t\)

\[P(N(t) = j) = \frac{e^{-\lambda t}(\lambda t)^j}{j!}\]

An alternative formulation uses the arrival times \(T_0, T_1, ...\), where \(T_n\) is the time of the \(n\)-th arrival

\[T_0 = 0, \quad T_n = \inf \{ t : N(t) = n \}\]

The interarrival times are random variables \(U_0, U_1, ...,\)

\[U_n = T_{n+1} - T_n\]

The random variables \(U_0, U_1, ...\) are independent, each having the exponential distribution with parameter \(\lambda\)
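The equivalence between the two descriptions (infinitesimal rates vs exponential interarrival times) can be illustrated by simulation; a sketch with illustrative parameters \(\lambda = 2\), \(t = 5\):

```python
import random

random.seed(0)
lam, t, runs = 2.0, 5.0, 20000   # illustrative intensity, horizon, sample size

def count_arrivals(lam, t):
    """N(t): number of arrivals in [0, t] when interarrivals are Exp(lam)."""
    arrivals, clock = 0, random.expovariate(lam)
    while clock <= t:
        arrivals += 1
        clock += random.expovariate(lam)
    return arrivals

mean_count = sum(count_arrivals(lam, t) for _ in range(runs)) / runs
# mean_count should be close to lam * t, consistent with N(t) ~ Poisson(lam t)
```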

3.2. Limit Theorems

4. Queues

5. Martingales

Definition (martingale) A sequence \(Y = \{ Y_n : n \geq 0 \}\) is a martingale with respect to the sequence \(X = \{ X_n : n \geq 0 \}\) if, for all \(n \geq 0\)

\[E|Y_n| < \infty\]
\[E(Y_{n+1} | X_0, X_1, ..., X_n) = Y_n\]

Note that a Markov chain need not be a martingale, and vice versa

simple martingale

Let \(X_1, X_2, ...\) be independent variables with zero means. The sequence of partial sums \(S_n = X_1 + X_2 + ... + X_n\) is a martingale wrt \(X\)

\[E(S_{n+1} | X_1, ..., X_n) = E(S_n + X_{n+1} | X_1, ..., X_n) = S_n + 0\]
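The martingale property implies in particular that \(E(S_n) = E(S_0) = 0\) for every \(n\), which is easy to check empirically; a sketch with \(\pm 1\) steps (parameters illustrative):

```python
import random

random.seed(1)
runs, n = 50000, 20   # illustrative sample size and number of steps

def partial_sum(n):
    """S_n for n independent +-1 steps (zero mean)."""
    return sum(random.choice((1, -1)) for _ in range(n))

mean_Sn = sum(partial_sum(n) for _ in range(runs)) / runs
# mean_Sn should be near E(S_0) = 0
```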

6. Diffusion Process

A diffusion process is indexed by continuous time and takes values in the real line.

6.1. Brownian Motion, Wiener Process

Brownian motion can be seen as a limit of random walks

Definition (Brownian Motion, Wiener process) A Wiener process \(W = \{ W(t) : t \geq 0 \}\) has to satisfy

  • \(W\) has independent increments
  • \(W(s+t) - W(s)\) is distributed as \(N(0, \sigma^2 t)\) for all \(s,t \geq 0\)
  • the sample paths of \(W\) are almost surely continuous

It can be shown to have the following properties:

  • the sample paths are continuous functions of \(t\)
  • almost all sample paths are nowhere differentiable functions of \(t\)
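The random-walk limit view is easy to simulate: rescale a walk of \(n\) steps by \(1/\sqrt{n}\) and the endpoint is approximately \(N(0, \sigma^2 t)\) at \(t = 1\). A sketch with \(\sigma = 1\) and illustrative sample sizes:

```python
import math
import random

random.seed(2)
n, runs = 1000, 5000   # steps per walk, number of sample paths

def walk_endpoint(n):
    """W(1) approximated by a +-1 walk of n steps scaled by 1/sqrt(n)."""
    return sum(random.choice((1, -1)) for _ in range(n)) / math.sqrt(n)

samples = [walk_endpoint(n) for _ in range(runs)]
mean = sum(samples) / runs
var = sum((x - mean) ** 2 for x in samples) / runs
# mean should be near 0 and var near sigma^2 * t = 1
```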

7. Reference

  • CMU 36-410 Introduction to Probability Modeling notes
  • Grimmett & Stirzaker, Probability and Random Processes