0x420 Foundation

Machine learning is concerned with finding regularities in data and taking actions based on those regularities.


Probability Theory

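A widely used frequentist estimator is maximum likelihood, in which the parameter $w$ is set to the value that maximizes the likelihood function $p(\mathcal{D}|w)$, i.e. the probability of the observed data set $\mathcal{D}$ viewed as a function of $w$.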
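A minimal sketch, assuming the simplest case of i.i.d. coin flips: a Bernoulli model where $w$ is the probability of a 1. The toy data and function names are hypothetical. Setting the derivative of the log-likelihood to zero gives the closed form $w_{\text{ML}} = m/N$ ($m$ ones out of $N$ observations), and a brute-force grid search over the log-likelihood lands on the same value.

```python
import math

def bernoulli_log_likelihood(w, data):
    """log p(D|w) for i.i.d. binary outcomes (data is a list of 0/1 values)."""
    return sum(math.log(w) if x == 1 else math.log(1.0 - w) for x in data)

def mle_closed_form(data):
    """Setting d/dw log p(D|w) = 0 gives w_ML = (number of ones) / N."""
    return sum(data) / len(data)

def mle_grid_search(data, steps=999):
    """Numerically maximize the log-likelihood over a grid of w values in (0, 1)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda w: bernoulli_log_likelihood(w, data))

# Hypothetical toy data: 7 ones out of 10 flips.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
print(mle_closed_form(data))  # → 0.7
print(mle_grid_search(data))  # → 0.7
```

Working with the log-likelihood rather than $p(\mathcal{D}|w)$ itself avoids underflow from multiplying many small probabilities, and it does not move the maximizer because the logarithm is monotonic.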

Decision Theory

Given an input $\mathbf{x}$ and a corresponding target $\mathbf{y}$, the joint probability distribution $p(\mathbf{x}, \mathbf{y})$ provides a complete summary of the uncertainty associated with these variables. Determining $p(\mathbf{x}, \mathbf{y})$ from a set of training data is an example of inference and is typically a very difficult problem.
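In the easiest possible case, where both $\mathbf{x}$ and $\mathbf{y}$ take only a handful of discrete values, $p(\mathbf{x}, \mathbf{y})$ can be estimated by counting relative frequencies. The sketch below assumes a hypothetical toy data set of (weather, activity) pairs. Counting breaks down once $\mathbf{x}$ is continuous or high-dimensional, because almost every cell of the table is empty, which is one reason the inference problem is hard in general.

```python
from collections import Counter

def empirical_joint(pairs):
    """Estimate p(x, y) for discrete x and y as relative frequencies of (x, y) pairs."""
    counts = Counter(pairs)
    n = len(pairs)
    return {xy: c / n for xy, c in counts.items()}

# Hypothetical training set of (x, y) = (weather, activity) pairs.
train = [("sunny", "walk"), ("sunny", "walk"), ("rainy", "stay"),
         ("rainy", "walk"), ("sunny", "stay"), ("rainy", "stay"),
         ("sunny", "walk"), ("rainy", "stay")]
joint = empirical_joint(train)
print(joint[("sunny", "walk")])  # → 0.375 (3 of 8 pairs)
```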

Generative model

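Generative models are approaches that model, explicitly or implicitly, the distribution of the inputs as well as of the outputs. They are called generative because sampling from them produces synthetic data points in the input space.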

Such a model represents either of the following:

  • the joint distribution $p(\mathbf{x}, \mathbf{y})$, or
  • both the class-conditional density $p(\mathbf{x}|\mathbf{y})$ and the prior $p(\mathbf{y})$

The two forms are equivalent because the joint distribution factorizes as $p(\mathbf{x}, \mathbf{y}) = p(\mathbf{y})\,p(\mathbf{x}|\mathbf{y})$.
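The factorization $p(\mathbf{x}, \mathbf{y}) = p(\mathbf{y})\,p(\mathbf{x}|\mathbf{y})$ also tells us how to generate data: first draw $\mathbf{y}$ from the prior, then draw $\mathbf{x}$ from the class-conditional density (ancestral sampling). This is a sketch under stated assumptions: two classes with 1-D Gaussian class-conditionals, and the prior and Gaussian parameters (`PRIOR`, `CLASS_PARAMS`) are made-up values for illustration.

```python
import random

# Hypothetical generative model: two classes with 1-D Gaussian class-conditionals.
PRIOR = {0: 0.3, 1: 0.7}                         # p(y)
CLASS_PARAMS = {0: (-2.0, 1.0), 1: (3.0, 0.5)}   # y -> (mean, std) of p(x|y)

def sample(rng):
    """Draw one (x, y) from p(x, y) = p(y) p(x|y): y ~ p(y), then x ~ p(x|y)."""
    y = rng.choices(list(PRIOR), weights=list(PRIOR.values()))[0]
    mean, std = CLASS_PARAMS[y]
    x = rng.gauss(mean, std)
    return x, y

rng = random.Random(0)
samples = [sample(rng) for _ in range(10000)]
frac_class1 = sum(y for _, y in samples) / len(samples)
print(frac_class1)  # should be close to p(y=1) = 0.7
```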

The marginal distribution $p(\mathbf{x})$ can be computed as $p(\mathbf{x}) = \sum_{\mathbf{y}} p(\mathbf{x}, \mathbf{y})$. This sum is usually cheap because the target $\mathbf{y}$ takes only a few values (e.g. the number of classes in a classification task).

The conditional distribution $p(\mathbf{y}|\mathbf{x})$ can then be computed as $p(\mathbf{y}|\mathbf{x}) = \frac{p(\mathbf{x}, \mathbf{y})}{p(\mathbf{x})}$, which is Bayes' theorem.
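The two formulas above can be sketched for a hypothetical two-class model with equal priors and unit-variance 1-D Gaussian class-conditionals (all parameter values are assumptions for illustration). The marginal is a sum over the two classes, and the posterior divides each joint term by that sum.

```python
import math

# Hypothetical two-class model: prior p(y) and 1-D Gaussian class-conditionals p(x|y).
PRIOR = {0: 0.5, 1: 0.5}
CLASS_PARAMS = {0: (0.0, 1.0), 1: (2.0, 1.0)}  # y -> (mean, std)

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def joint(x, y):
    """p(x, y) = p(y) p(x|y)."""
    mean, std = CLASS_PARAMS[y]
    return PRIOR[y] * gaussian_pdf(x, mean, std)

def marginal(x):
    """p(x) = sum over y of p(x, y); cheap because y takes only two values."""
    return sum(joint(x, y) for y in PRIOR)

def posterior(x):
    """p(y|x) = p(x, y) / p(x) for every class y."""
    px = marginal(x)
    return {y: joint(x, y) / px for y in PRIOR}

# x = 1.0 is equidistant from both means and the priors are equal.
print(posterior(1.0))  # → {0: 0.5, 1: 0.5}
```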

Discriminative model

A discriminative model represents the conditional probability $p(\mathbf{y}|\mathbf{x})$ directly. Unlike a generative model, it does not include a model of $p(\mathbf{x})$, which is an advantage because $\mathbf{x}$ often contains many highly dependent features that are difficult to model [4]. Since $p(\mathbf{x})$ is constant with respect to $\mathbf{y}$, a model of $p(\mathbf{y}|\mathbf{x})$ can ignore any structure in the inputs (e.g. dependencies among the components of $\mathbf{x}$).
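Logistic regression is a standard example of a discriminative model. The sketch below is one possible implementation, not the method of [4]: it assumes 1-D inputs, fits $p(y=1|x) = \sigma(wx + b)$ by plain batch gradient ascent on the conditional log-likelihood $\sum_n \log p(y_n|x_n)$, and uses hypothetical toy data and hyperparameters (`lr`, `epochs`). At no point is a density over $x$ estimated.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def fit_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient ascent on sum_n log p(y_n|x_n).

    Only the conditional distribution is modeled; nothing about p(x) is fitted.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(w * x + b)  # derivative of log p(y|x) w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w += lr * grad_w / len(xs)
        b += lr * grad_b / len(xs)
    return w, b

# Hypothetical 1-D toy data: label 1 tends to occur for larger x (classes overlap).
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic_regression(xs, ys)
print(sigmoid(w * 1.5 + b))  # estimated p(y=1 | x=1.5)
```

The overlapping labels at $x = \pm 0.5$ keep the data from being linearly separable, so the maximum-likelihood weight stays finite.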

The reject option

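We can avoid making decisions on difficult cases where the largest posterior probability $p(\mathcal{C}_k | \mathbf{x})$ over the classes $\mathcal{C}_k$ is significantly less than 1 (e.g. refuse to classify difficult X-ray images and leave those cases to human experts). This is done by introducing a threshold $\theta$ and rejecting every input $\mathbf{x}$ for which $\max_k p(\mathcal{C}_k|\mathbf{x}) \le \theta$.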
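A minimal sketch of the reject rule, assuming the posteriors $p(\mathcal{C}_k|\mathbf{x})$ for one input are already available as a list (the function name `classify_with_reject` is hypothetical). The input is rejected when the largest posterior is at most a threshold $\theta$. With this rule, $\theta = 1$ rejects every input, while with $K$ classes any $\theta < 1/K$ rejects none, since the largest of $K$ probabilities summing to 1 is at least $1/K$.

```python
def classify_with_reject(posteriors, theta):
    """Return the index k of the most probable class C_k, or None (reject)
    when max_k p(C_k|x) <= theta."""
    k_best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    if posteriors[k_best] <= theta:
        return None  # defer this case, e.g. to a human expert
    return k_best

print(classify_with_reject([0.05, 0.90, 0.05], theta=0.8))  # → 1 (confident)
print(classify_with_reject([0.40, 0.35, 0.25], theta=0.8))  # → None (rejected)
```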


[1] Hastie, Trevor, et al. “The elements of statistical learning: data mining, inference and prediction.” The Mathematical Intelligencer 27.2 (2005): 83-85.

[2] Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.