Skip to content

0x360 Interface

1. History

  • 1960: Charles Bachman designed the Integrated Database System, the first DBMS. (Turing 1973)
  • 1964: IBM designed IMS (hierarchical model) for Apollo program (BOM for Saturn V)
  • 1969: CODASYL (network model) published
  • 1970: Codd from IBM research lab published the paper of the relational model. (Turing 1981)
  • 1973 Ingres project (led by Stonebraker) demostrate that it was possible to build practical and efficient implementation relation model. (Turing 2014)
  • 1974: SQL designed by Donald Chamberlin and Raymond Boyce from IBM (SQL become ANSI standard in 1986)
  • 1979: Oracle database
  • 1995: MySQL (MySQL AB -> Sun Microsystems -> 2010 Oracle), it was forked into MariaDB by its cofounder.
  • 1996: PostgreSQL (originally called POSTGRES from the Ingres database)

2. Data Model

A data model (or datamodel) is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities

2.1. Relational Model

Relational Model is a data model representing data with unordered set of tuples.

A relational database contains a set of named relations (or tables) Each relation has a set of named attributes (or columns) Each tuple (or row) has a value for each attribute, tuples in the relation is not ordered. Each attribute has a type (or domain)

The schema is the formal description of relations in database, it is something like typing in programming languages.

Instance is the actual contents at given point in time There is a special value NULL for unknown and undefined, which is important in relational database.

Key is an attribute whose value is unique in each tuple or set of attributes whose combined values ar unique. Keys are often denoted by underlining the set of key attributes. Relational database tends to build special index for keys to achieve efficiency.

There are two types of languages in relational model

  • DDL (data definition language): create, drop
  • DML (data manipulation language): select, insert

query relation model

The steps in creating and using a relational database

  • Design schema; create using DDL
  • Bulk load initial data
  • Repeat; execute queries and modificiations

Queries return relations (or table)

2.2. Hierarchical Model

Hierarchical Model or Document Model represent all data as a tree structure. It worked well for 1-to-Many relationships, but it made many-to-many relationships difficult.

To handle n-n relationships, we can

  • denormalize the data (duplicating)
  • resolve the join inside the application.

2.2.1. XML

XML (Extensible Markup Language) is a standard for data representation and exchange. XML has more flexibility when compared with the relational model.

The basic constructs of XML

  • tagged elements, which can be nested
  • attributes
  • text (can be thought as the content for the leaf node)
2.2.1.1. DTD/XSD

A well-formed XML adheres to basic structural requirements

  • single root
  • matched tags, no open tags
  • unique attributes within elements

A valid XML adheres to the basic structral requirements and also content-specific specification Examples to describe the specification are

  • Document Type Descriptor (DTD)
  • XML Schema (XSD)

The specification can be provided to the XML parser to decide whether the XML is valid or not. For example, xmllint can be used to verify.

Document Type Descriptor (DTD) DTD is a grammar-like language for specifying elements, attributes, nesting, ordering, occurrences It also has special attribute types such as ID and IDREF(S), which works like the pointers

XML Schema (XSD) In addition to the DTD feature, XSD also sepcifies data types, keys, pointers. It is also written in XML.

2.2.2. JSON

JSON is a stanrdard to serialize data objects into human readable format, it is useful for data interchange, and representing & storing semistructured data.

2.3. Network Model

The network model represents data using a graph structure. Link between records in the network model are like the pointers in the programming languages. It was standarized by CODASYL.

Querying network need to follow a path from the root designed manually.

3. Query Language

3.1. Relational Algebra

Relational algebra defines the interface how to retrieve and manipulate tuples in a relation. Input is relation, output is relation, it is proposed by Codds when working in IBM.

Query (or expression) on set of relations produce relation as a result.

Note that semantics of relational algebra removes duplicates from relation when querying (because rows are set).

operator (selector) selector is to pick certain rows based on condition.

\[\sigma_{cond}(expr)\]

operator (project) project is to pick certain columns.

\[\Pi_{attrs}(expr)\]

operator (cross product) combine two relations.

\[A \times B\]

Suppose \(A\) has \(s\) tuples and \(B\) has \(t\) tuples, then cross product has \(st\) tuples. relation name is attached to each attribute to disambiguate.

\[|A \times B| = |A||B|\]

operator (natural join) Enforce equality on all attributes with same name, and eliminate one copy of duplicate attributes.

\[A \bowtie B\]

Natural join does not add any expression power to algebra, it can be expressed using the join

\[ A \bowtie B = \Pi_{schema(E_1) \cup schema(E_2)} (\sigma_{E_1.A_1 = E_2.A_1, E_1.A_2 = E_2.A_2} (A \times B))\]

operator (theta join) Apply the condition to join, this is the basic operation used in DBMS

\[A \bowtie_{\theta} B = \sigma_{\theta} (A \times B)\]

The following are set operators operator (union) combines rows vertically, the schema should match

\[A \cup B\]

operator (difference)

\[ A - B\]

operator (intersection) intersection can be expressed with union and difference

\[ A \cap B\]

operator (rename) rename the schema for an relation

\[\rho_{R(A_1, A_2, ..., A_n)(E)}\]

Rename operator is to unify schemas for set operators because schema must match.

3.2. SQL

SQL is the declarative language partially implementing the relational algebra, derived from SEQUAL from IBM. Row in SQL is an unordered multiset.

statement (select)

\[\text{Select }A_1, A_2, ..., A_n\text{ From }R_1, R_2, ..., R_m\text{ where condition}\]

This is equivalent to the following sigma algebra

\[\Pi_{A_1, A_2, ..., A_n}(\sigma_{condition}(R_1 \times R_2 \times ... \times R_m))\]

duplicates: select will return duplicate because SQL is a multiset model, to reduce the duplicates, add distinct after select

order: select will return results without any order, to force order, add order by attr at the end.