Skip to content

0x363 Storage

1. Storage Model

A DBMS's storage model specifies how it phyiscally organizes tuples on disk and in memory.

It has different performance characteristics based on the target workload (OLTP vs OLAP)

1.1. Row Model (N-ary Storage Model)

Row-oriented model:

  • stores (almost) all attributes for a single tuple contiguously in a single page (e.g. 4KB page size)
  • ideal for OLTP worklads
  • fast read

Reference: CMU 15445 Lecture 4

1.1.1. Layout

Each page contains a header (page size, checksum, version...).

There are two types of layouts:

  • page oriented layout
  • log-structured model

Tuple-Oriented model, usually associated with B-Tree


  • fast read

Slotted Tuples

Reference: CMU 15445 Lecture 3

Log oriented layout appends the new log to pages (hbase, cassandra, leveldb ...)


  • fast write

The most popular modell is the LSM-Tree + SSTable

SSTable (Sorted String Table): requires the key-value pair be sorted by key in each segment.

LSM-Tree LSM-Tree's storage stores multiple SSTables

  • When the memtable gets bigger than some threshold, we write it out to a disk as an SSTable
  • from time to time, we merge and compact segments to discard overwritten values

1.2. Column Model (Decomposition Storage Model)

Column Model:

  • stores the values of a single attribute for all tuples contiguously in a page
  • Ideal for OLAP workloads where read-only queries perform large scans over a subset of the table‚Äôs attributes

Reference: CMU 15445 Lecture 4

Large value:

  • make them into overflow page and store pointer
  • make them into external file and store path

Capacitor in Dremel

Capacitor is the column-based storage format used in Dremel


1.2.1. Compression

Column Compression

Sequences of values for each column is usually repetitive, so we can compress them using several compression techniques, compression can reduce memory bandwidth and help vectorize the processing

Compression techniques are for example:

  • bitmap encoding: for each unique label, we store a bitmap
  • run-length encoding: if the number of distinct label is very large, then the bitmap is sparse, which we can further use run-length encoding to compress

Compression Order

We can impose order to all rows using a column to improve processing efficiency and compression. Different column sorting can be done in different replications

1.3. Mixed Model

a mix of row-oriented model and column-oriented model:

Wide-column model:

  • column-family are stored separately (as the col model does)
  • but within column family, multiple columns are stored together (as the row model does)


BigTable is a sparse multi-dimension map

\[key(row, column, timestamp) \rightarrow value \]

Apache Parquet

See this twitter blog

good intro video

2. Storage Management

One solution to manage storage is using file mapping (mmap), it works well for reading, but not good when there are multiple writers. Instead, database tends to manage storage by themselves using database pages (different page size for different db)

Heap File A heap file is an unordered collection of pages. There are two styles of heap file.

  • Linked List: there is a header managing pointers to data pages and free pages
  • Page Directory: track the location and free slots for each page