The sections are arranged based on the Chomsky hierarchy.

Contents Index

## Foundation

**Definition (language)** A language is a set of string

**Definition (grammar)** A grammar is a set of rules defining the syntax of a specific language

**Definition (model of computation)** *a model of computation is a model which describes how a set of outputs are computed given a set of inputs*

In my personal understanding, it is just the f part of a general $y=f(x)$

Personally, I think model of computation (or formal language) is IDL of abstract machines.

There are mainly three **models of computation**: regular languages, CFG and Turing Machine.

The difference of three models can be characterized by their memory usage

- regular language: limited memory
- CFG: unlimited stack-like memory (restricted)
- Turing Machine: unlimited random access memory (unrestricted)

## Regular Languages

Regular languages can be expressed by two different formulation: finite automaton, and regular expression. They are all equivalent.

### Finite Automaton

The automaton computes a function that maps string into the set ${0,1}$ (i.e. acceptance or not)

#### Deterministic Finite Automaton

**Definition (finite automaton)** A *finite automaton* is a 5-tuple $(Q, \Sigma, \delta, q_0, F) $

- $Q$ is a finite set called the
**states** - $\Sigma$ is a finite set called
alphabet - $\delta: Q \times \Sigma \rightarrow Q$ is the
**transition function** - $q_0 \in Q$ is the
**start state** - $F \subseteq Q $ is the
**set of accept states**

Interface:

- input: a string of
**alphabet** - output:
**Accept or Reject**(I prefer to understanding the output as a 1/0 bit)

If $A$ is the set of all strings that FA $M$ accepts, then $A$ is the language of machine $M$, which means $L(M)=A$

**Definition (regular language)** A language is called a *regular language* is some FA recognizes it

#### Nondeterministic Finite Automaton

#### Finite State Transducer

FST accept pairs of strings

**Definition (finite state transducer) **A finite state transducer is a tuple of

- $Q$ is a finite set of states
- $\Sigma$ is a finite set of input alphabet
- $\Gamma$ is a finite set of output alphabet
- $I \subset Q$ is the set of initial states
- $F \subset Q$ is the final states
- $\epsilon \subseteq Q \times (\Sigma \cup \{ \lambda \}) \times (\Gamma \cup \{ \lambda \} ) \times Q$ is the transition relation

**Definition (regular relations) **A generalization (tuple) of regular languages.

- the empty set and all $(x, y)$ where $x \in \Sigma, y \in \Gamma$
- $R_1 \circ R_2$ where $R_1, R_2$ are regular relation
- $R_1 \cup R_2$ where $R_1, R_2$ are regular relation
- $R^* = \cup_{i=0}^{\infty} R_i$

### Regular Expression

**Definition (regular expression) **$R$ is said to be a regular expression if $R$ is

- $a$ for some $a$ in the alphabet $\Sigma$
- $\epsilon$
- $\emptyset$
- $(R_1 \cap R_2)$
- $(R_1 \circ R_2)$
- $(R_1^*)$

Context Free Languages

Context Free languages is a more expressive language collection than regular languages. It can be used for natural language parsing and compiler.

Note that the name of context-free is because subtree node is not affected by its surrounding context (On the other hand, context-sensitive languages are affected by contexts $\alpha, \beta$ as in the following section)

The models of CFL is a computation model with un-limited memory but with limited manner of using memory.

### Context Free Grammar

**Definition (context-free grammar)** A context free grammar is a 4-tuple $(V, \Sigma, R, S)$ where

- $V$ is a finite set of variables
- $\Sigma$ is a finite set, disjoint from $V$, called the terminals
- $R$ is a finite set of rules, with each rule being a variable and a string of variables and terminals
- $S \in V$ is the start variable

#### Chomsky Normal Form (CNF)

Any context free grammar can be converted to an equivalent CNF

**Unary rule**$C \to x$**Binary rule**$ C \to C_1 C_2$

How to transform: binarization

### Pushdown Automaton

## Context-Sensitive Languages

### Grammar

$$\alpha A \beta \to \alpha \gamma \beta$$

## Recursive Languages

The recursive languages have three different formulations: Turing machine, recursive function and lambda calculus, which is proved by the Church-Turing thesis.

### Turing Machine

**Definition (recognizable)** A language is called *recognizable* (or Recursive Enumerable) if some Turing machine recognizes it.

- Any string in the language will be accepted by the TM
- Any other string will not be accepted (
**halting OR rejected**)

**Definition (decidable)** A language is called **decidable** (or **Recursive**) if some Turing machine decides it

- Any string in the language will be accepted by the TM
- Any other string will not be
**rejected**

**Definition (Universal Turing Machine)** Universal turing machine is a machine that can simulate any other Turing Machine.

### Recursive Function

### Lambda Calculus

## Computability

The cardinality of all possible languages is continuum, and the cardinality of language recognizable by Turing Machine is countable. Therefore, there are languages that are not recognizable by TM.

**Decision problem**: whether TM can decide a language by answering yes or no without looping. The following are some famous undecidable problems.

**Entscheidungsproblem**: the decision problem in first-order logic. Proven undecidable by Church-Turing Thesis.**Hilbert’s 10th problem**: the decision problem of Diophantine equations which have integer roots. Proven undecidable.**Halting problem**: the decision problem of deciding whether a TM will halt. Proven undecidable by Turing. (proof: by easy contradiction)

## Reference

[1] Sipser, Michael. “Introduction to the Theory of Computation.” *ACM Sigact News* 27.1 (1996): 27-29.