
0x322 Memory

1. Virtual Memory

The kernel deals with three kinds of memory addresses.

  • logical address: used in machine instructions; consists of a segment and an offset. This is the address visible in user space.
  • linear address (virtual address): the address translated from the logical address by the segmentation unit.
  • physical address: the actual address of a memory cell, translated from the linear address by the paging unit.

The kernel is responsible for setting up segmentation and paging so that virtual memory translation works. (The translation itself is performed by the MMU, with the TLB caching results.) On current desktop operating systems segmentation looks disabled, because the base addresses of the segments selected by cs, ds, ss (stack segment), ... are set to 0.

The two translations are handled differently in real mode and in protected mode. On x86, protected mode is enabled by the PE bit of the CR0 register.

2. Segmentation translation

real mode: the translation rule is segment * 16 + offset, which yields a 20-bit address. The entire address space is 1MB (20 bits), and each segment spans 64KB.
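The real-mode rule can be sketched in a few lines of C. The function name real_mode_address is made up for this note, and masking the result to 20 bits assumes the original 8086 wrap-around behaviour (address line A20 disabled).

```c
#include <stdint.h>

/* Real-mode translation: address = segment * 16 + offset.
 * The result is masked to 20 bits to model the 1MB address space
 * (assumption: 8086-style wrap-around, A20 disabled). */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    uint32_t address = ((uint32_t)segment << 4) + offset;
    return address & 0xFFFFFu;
}
```

Different segment:offset pairs can name the same byte: 0x07C0:0x0000 and 0x0000:0x7C00 both translate to 0x07C00.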

protected mode: the segment register stores a segment selector instead of an address. The selector picks a segment descriptor in the GDT (global descriptor table); the base linear address and the size (limit) of the segment are read from that descriptor.
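As an illustration of what a selector contains, the sketch below splits a 16-bit x86 selector into its fields. The struct and function names are hypothetical; the bit layout (index in bits 15..3, table indicator in bit 2, requested privilege level in bits 1..0) is the x86 one.

```c
#include <stdint.h>

/* Fields of an x86 segment selector (names chosen for this note). */
struct selector_fields {
    uint16_t index; /* bits 15..3: entry number in the descriptor table */
    uint8_t  ti;    /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;   /* bits 1..0: requested privilege level */
};

struct selector_fields decode_selector(uint16_t selector)
{
    struct selector_fields f;
    f.index = (uint16_t)(selector >> 3);
    f.ti    = (uint8_t)((selector >> 2) & 0x1);
    f.rpl   = (uint8_t)(selector & 0x3);
    return f;
}
```

For example, the selector 0x23 decodes to entry 4 of the GDT with privilege level 3.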

2.1. Paging translation

On a 32-bit architecture, the address space is 0x00000000 - 0xffffffff. With the default 3G/1G split on Linux:

  • user space is 0x00000000 - 0xbfffffff
  • kernel space is 0xc0000000 - 0xffffffff
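A small sketch of the ranges above, plus how a 32-bit linear address is split by the paging unit. This assumes the 3G/1G split and classic two-level x86 paging with 4KB pages (no PAE); all names are invented for this note.

```c
#include <stdbool.h>
#include <stdint.h>

#define KERNEL_SPACE_START 0xC0000000u /* assumption: 3G/1G split */

bool is_kernel_address(uint32_t addr)
{
    return addr >= KERNEL_SPACE_START;
}

/* Parts of a linear address under two-level paging with 4KB pages. */
struct linear_parts {
    uint32_t dir_index;   /* bits 31..22: page directory entry */
    uint32_t table_index; /* bits 21..12: page table entry */
    uint32_t offset;      /* bits 11..0: offset inside the page */
};

struct linear_parts split_linear(uint32_t addr)
{
    struct linear_parts p;
    p.dir_index   = addr >> 22;
    p.table_index = (addr >> 12) & 0x3FFu;
    p.offset      = addr & 0xFFFu;
    return p;
}
```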

3. Pages

3.1. Zones

3.2. Allocation

  • kmalloc/kfree
  • vmalloc

4. Kernel API

4.1. memory management

copy_from_user, copy_to_user: copy data between user space and kernel space; either call may block due to a page fault.

5. POSIX

This section describes the memory-related APIs available in user space.

5.1. Shared Memory

API (mmap, munmap (2)): mmap creates a mapping that is either private (MAP_PRIVATE) or shared (MAP_SHARED).

  • mmap can map a file or map anonymous memory (which makes it an option for memory allocation).
  • On Linux, offset must be a multiple of the page size, and addr should be page aligned (it is only a hint unless MAP_FIXED is given). The mapping effectively covers length rounded up to a multiple of the page size. (The page size can be retrieved with getpagesize(2) or sysconf(_SC_PAGESIZE).)
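A minimal sketch of anonymous mapping as a memory allocator, together with the page rounding mentioned above. The helper names (round_up_to_page, anon_alloc, anon_free) are made up for this note; only mmap, munmap and sysconf are real APIs.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a multiple of the page size. */
size_t round_up_to_page(size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (len + page - 1) / page * page;
}

/* Allocate zero-filled, private, anonymous memory. Returns NULL on failure. */
void *anon_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Release a region obtained from anon_alloc. Returns 0 on success. */
int anon_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

Anonymous mappings start out zero-filled, so anon_alloc behaves roughly like a page-granular calloc.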

API (mprotect, madvise, mlock, msync (2))

  • mprotect: change the protection (PROT_READ, PROT_WRITE, PROT_EXEC) of a region.
  • madvise: tell the OS the expected access pattern (e.g. random or sequential) so that it can make better decisions.
  • mlock: lock the region in memory to prevent it from being swapped out.
  • msync: force the mapped memory to be written back to the file (synchronously or asynchronously).
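The four calls can be seen together in one sketch that writes a string to a file through a shared mapping. The function name write_through_mapping is hypothetical, and madvise and mlock are treated as optional here because they can fail without affecting correctness (for example, mlock under a low RLIMIT_MEMLOCK).

```c
#define _DEFAULT_SOURCE 1
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write msg into the file at path via a MAP_SHARED mapping.
 * Returns 0 on success, -1 on failure. */
int write_through_mapping(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    if (len == 0)
        return -1;                          /* mmap rejects zero-length mappings */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {   /* the file must cover the mapping */
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                              /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;

    (void)madvise(p, len, MADV_SEQUENTIAL); /* hint only; failure is harmless */
    int locked = (mlock(p, len) == 0);      /* keep the pages resident if allowed */

    memcpy(p, msg, len);
    int rc = msync(p, len, MS_SYNC);        /* block until the file is updated */
    if (rc == 0)
        rc = mprotect(p, len, PROT_READ);   /* the region is now read-only */

    if (locked)
        munlock(p, len);
    munmap(p, len);
    return rc;
}
```

MS_ASYNC could be passed instead of MS_SYNC to schedule the write-back without waiting for it.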

5.2. Memory Allocation

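API (brk, sbrk (2)): heap allocation system calls; they change the current program break.

  • brk sets the program break to the given address and returns 0 on success. To read the current program break, call sbrk(0).
  • sbrk moves the program break by the given increment and returns the previous program break.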
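A short sketch of moving the program break with sbrk. The wrappers current_break and grow_heap are invented for this note; with glibc, sbrk(0) returns the current break and sbrk(n) returns the previous break, or (void *)-1 on error.

```c
#define _DEFAULT_SOURCE 1
#include <stdint.h>
#include <unistd.h>

/* Current program break. */
void *current_break(void)
{
    return sbrk(0);
}

/* Grow the heap by n bytes. Returns the start of the new region
 * (the previous program break), or NULL on failure. */
void *grow_heap(intptr_t n)
{
    void *old = sbrk(n);
    return old == (void *)-1 ? NULL : old;
}
```

Programs normally leave the break to malloc; moving it by hand alongside malloc is only reasonable in small experiments like this one.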

API (malloc, free (3)): memory allocation, which can be implemented on top of brk, sbrk or mmap.

Free memory blocks are managed in user space (as a linked list) to reduce the number of system calls.

malloc first searches the free list for a suitable block. If one is found, it is marked as used and returned. Only if none is found does malloc call brk or sbrk to obtain new memory. free returns the block to the user-space list without calling brk or sbrk.

  • first fit strategy: used by the K&R malloc and by malloc implementations in embedded systems. It picks the first block that is large enough. The drawback is memory fragmentation.
  • best fit strategy: used by glibc malloc, which is derived from the dlmalloc implementation.
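The first-fit scheme can be sketched as a toy allocator. This is a deliberately simplified illustration, not the K&R or glibc code: it never splits or merges blocks, is not thread safe, and assumes a 16-byte payload alignment. The names ff_malloc and ff_free are made up.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define ALIGNMENT 16u /* assumed payload alignment for this sketch */

/* Header stored directly in front of each payload (32 bytes on x86-64). */
struct block {
    _Alignas(16) size_t size; /* payload size in bytes */
    int free;                 /* nonzero if the block can be reused */
    struct block *next;       /* next block, in allocation order */
};

static struct block *head, *tail;

static size_t align_up(size_t n)
{
    return (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
}

/* Extend the heap by n bytes; returns an aligned start address or NULL. */
static void *grow_heap_aligned(size_t n)
{
    uintptr_t cur = (uintptr_t)sbrk(0);
    size_t pad = (ALIGNMENT - cur % ALIGNMENT) % ALIGNMENT;
    void *mem = sbrk((intptr_t)(pad + n));
    if (mem == (void *)-1)
        return NULL;
    return (char *)mem + pad;
}

void *ff_malloc(size_t size)
{
    if (size == 0)
        return NULL;
    size = align_up(size);

    /* First fit: reuse the first free block that is large enough. */
    for (struct block *b = head; b != NULL; b = b->next) {
        if (b->free && b->size >= size) {
            b->free = 0;
            return b + 1; /* payload starts right after the header */
        }
    }

    /* Nothing reusable: this is the only path that issues a system call. */
    struct block *b = grow_heap_aligned(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->size = size;
    b->free = 0;
    b->next = NULL;
    if (tail != NULL)
        tail->next = b;
    else
        head = b;
    tail = b;
    return b + 1;
}

void ff_free(void *ptr)
{
    if (ptr == NULL)
        return;
    struct block *b = (struct block *)ptr - 1;
    b->free = 1; /* no system call: the block stays on the list for reuse */
}
```

Because a freed 100-byte block is handed out again for a 50-byte request without being split, the unused remainder is wasted, which is one way the fragmentation problem shows up.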

API (calloc, realloc (3))

  • calloc: malloc with zero initialization.
  • realloc: resize an existing allocation; useful for implementing growable containers such as vectors and maps.
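As a sketch of the vector use case, the code below grows an int array with realloc and creates a zeroed array with calloc. The int_vec type and the doubling growth policy are assumptions made for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* A minimal growable array of ints. Initialise with {0}. */
struct int_vec {
    int *data;
    size_t len;
    size_t cap;
};

/* Append value. Returns 0 on success, -1 on allocation failure
 * (in which case the vector is left unchanged). */
int int_vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1; /* the old buffer is still valid */
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* calloc returns memory that is already set to zero. */
int *zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));
}
```

Assigning the result of realloc to a temporary first avoids losing the original buffer when the call fails.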

API (memalign, posix_memalign (3)): allocate memory with a specific alignment; useful for SIMD instruction sets such as SSE and AVX.
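A minimal sketch of an aligned allocation with posix_memalign. The 32-byte alignment is chosen here because it matches the width of an AVX register; the helper names are hypothetical.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n floats aligned to 32 bytes. Returns NULL on failure.
 * The result is released with free(). */
float *alloc_avx_floats(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 32, n * sizeof(float)) != 0)
        return NULL;
    return p;
}

/* Nonzero if p is a multiple of alignment. */
int is_aligned(const void *p, size_t alignment)
{
    return (uintptr_t)p % alignment == 0;
}
```

posix_memalign requires the alignment to be a power of two and a multiple of sizeof(void *), and it reports errors through its return value rather than errno.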

API (alloca (3)): allocate memory on the stack; it is released automatically when the calling function returns.
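A small sketch of alloca used for a short-lived scratch buffer. The function is_palindrome is only an illustration; in real code the size passed to alloca should be small and trusted, since a failed stack allocation cannot be detected.

```c
#include <alloca.h>
#include <string.h>

/* Returns 1 if s reads the same forwards and backwards, 0 otherwise.
 * The reversed copy lives on the stack and vanishes on return. */
int is_palindrome(const char *s)
{
    size_t n = strlen(s);
    char *rev = alloca(n + 1); /* no free() needed */

    for (size_t i = 0; i < n; i++)
        rev[i] = s[n - 1 - i];
    rev[n] = '\0';

    return strcmp(s, rev) == 0;
}
```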

6. Reference

[1] Kerrisk, Michael. The Linux programming interface: a Linux and UNIX system programming handbook. No Starch Press, 2010.