# 0x461 Speech Recognition

Speech recognition is a task modeling relation between speech input $X$ and linguistic (text output) $Y$

Contents Index

## GMM-HMM

GMM-HMM is one of the traditional generative models.

An alternative generative model with HMM is the DBN-HMM

## DNN-HMM

DNN is a discriminative model, so it can not be directly connected with the HMM model, In the DNN-HMM hybrid model, DNN first estimates the posterior $P(p|X)$ where $p$ is usually a CD state, then converting this output to likelihood with prior by using

$$P(X|p) \propto \frac{P(p|X)}{P(p)}$$

Then the $P(X|p)$ can be plugged into the generative HMM framework.

## CTC

CTC is a discriminative model

## RNN Transducer

RNN-T extends the CTC model by removing the conditional independence assumption