0x331 Linker

This section covers linkers and object file formats.

1. File Format

The first edition of UNIX used the a.out format on PDP machines.

1.1. COFF

COFF (Common Object File Format) is an older format that replaced a.out. It has in turn been largely replaced by ELF on Unix-like systems and by PE on Windows.

1.2. ELF

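  • _start is the symbol that the linker ld (on Linux) uses as the default entry point (another symbol can be chosen with ld's -e option). It is not called like a function: control is transferred to it directly, so there is no return address and it must end the process itself.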
// This example shows how _start works.
// Save it as program.cpp (extern "C" is C++ syntax and keeps the symbol name unmangled),
// then build it with: gcc -c program.cpp && ld program.o
// A plain `gcc program.cpp` fails with a duplicate definition of _start, because gcc
// links its own startup files, which already define _start, when it creates an executable.
// Running the resulting a.out and then `echo $?` prints 23.

extern "C" void _start(){
    asm("mov $60, %eax\n"      // syscall number 60 on x86-64 is sys_exit
        "mov $23, %edi\n"      // exit status
        "syscall\n");
}
  • main (or _main on OS X, or main_ with OpenWatcom) is the entry point known to the C language. It is called by the startup code (provided by glibc on Linux), which is usually linked in automatically.
  • The following is that startup code, as disassembled by objdump -d:
// int main() { return 23; }

00000000004004f0 <_start>:
  4004f0:   31 ed                   xor    %ebp,%ebp
  4004f2:   49 89 d1                mov    %rdx,%r9
  4004f5:   5e                      pop    %rsi
  4004f6:   48 89 e2                mov    %rsp,%rdx
  4004f9:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  4004fd:   50                      push   %rax
  4004fe:   54                      push   %rsp
  4004ff:   49 c7 c0 80 06 40 00    mov    $0x400680,%r8
  400506:   48 c7 c1 10 06 40 00    mov    $0x400610,%rcx
  40050d:   48 c7 c7 f8 05 40 00    mov    $0x4005f8,%rdi // first argument to __libc_start_main: the address of main
  400514:   e8 c7 ff ff ff          callq  4004e0 <__libc_start_main@plt>
  400519:   f4                      hlt
  40051a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

00000000004005f8 <main>:
  4005f8:   55                      push   %rbp
  4005f9:   48 89 e5                mov    %rsp,%rbp
  4005fc:   b8 17 00 00 00          mov    $0x17,%eax
  400601:   5d                      pop    %rbp
  400602:   c3                      retq
  400603:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40060a:   00 00 00
  40060d:   0f 1f 00                nopl   (%rax)

1.2.1. Sections

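Sections are the units the linker works with at link time. The letters in parentheses are the symbol type codes printed by nm: uppercase for global (external) symbols, lowercase for local ones.

  • data (D/d): definitions of initialized global variables
  • code (text) (T/t): definitions of functions
  • bss (B/b): definitions of uninitialized (zero-initialized) global variables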

1.2.2. Segments

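Segments are used at run time: the loader maps each segment into memory, and a segment typically groups one or more sections that share the same access permissions.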

1.3. PE

1.3.1. COFF header

2. Linker

Concept (definition vs declaration)

  • definition: associates a name with its implementation (the body of a function, or the storage for a variable)
  • declaration: tells the compiler that a definition of the name exists somewhere in the program (e.g. the extern keyword in C)

Concept (global vs local vs static)

  • global: global existence (exists for the whole lifetime of the program) + global visibility (accessible everywhere)
  • local: local existence + local visibility
  • static: global existence + local visibility

2.1. Name Mangling

C: no name mangling

C++: names get mangled with information such as the namespace or class name and the argument types (some ABIs, such as MSVC's, also encode the return type). Use extern "C" to link with objects compiled as C.

3. Loader

4. Linux

4.1. SO

A shared object (.so) is the Linux form of a shared library.

Commands:

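  • nm: print the symbols of an object file, executable, or library
  • lsof -p <pid>: list the files opened by a running process, including the shared libraries mapped into it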

5. Windows

5.1. DLL

For DLLs on Windows, watch this YouTube tutorial.

link.exe from MSVC combines object files (.obj) into an executable (.exe). See the reference here.

Command:

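  • dumpbin: print information about object files, executables, and DLLs, including symbols (e.g. dumpbin /SYMBOLS, dumpbin /EXPORTS)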

APIs:

  • LoadLibraryExW: loads a DLL into the calling process at run time (the wide-character variant of LoadLibraryEx)

6. OSX

6.1. dylib

A dylib (.dylib) is the dynamic library format on OS X.

7. Reference

[1] Introduction to linker