0x221 Computer Architecture

Computer Architecture (or ISA) is an abstract interface between the hardware and the lowest level software.

History

This section is my summary of processors’ history from 1950.

1950 – 1960

The "traitorous eight" left Shockley's lab in 1957 and founded Fairchild Semiconductor.

1957: development of the planar process by Jean Hoerni. 1958: invention of the IC (a hybrid IC on germanium) by Jack Kilby (Texas Instruments), later improved (a monolithic IC on silicon) by Robert Noyce (Fairchild).

1960 – 1970 (3rd Generation: IC)

In 1968, Robert Noyce and Gordon Moore left Fairchild and founded Intel.

1970 – 1980 (4th Gen: microprocessor, 10 µm – 1 µm)

2000 – 2010 (50nm – 100 nm)

Intel released the Pentium 4 series, whose strategy at the time was to raise the clock frequency of a single core, which caused much higher power consumption and heat due to leakage. AMD, on the other hand, introduced the Athlon 64 to shift to 64 bit and successfully focused on doing more work in each clock cycle. To compete with AMD, Intel switched to the Core 2 series, whose sales were better than those of the AMD Phenom series. (Interestingly, AMD's stock price dropped significantly after the Core 2 release.)

2010 – Now (10 nm – 50 nm)

The Intel Core series followed a tick-tock strategy, improving the manufacturing process and the microarchitecture alternately each year, until Skylake. (Nehalem/Westmere, Sandy Bridge/Ivy Bridge, Haswell/Broadwell, Skylake)

AMD acquired ATI in 2006 and improved its graphics performance. On the GPU side, the AMD Radeon HD 4000 series competed successfully with the NVIDIA GeForce 200 series. On the CPU side, AMD announced Bulldozer (FX series) to compete with the Intel Core series, but its CPUs were not successful until the Zen series.

Basic

Instruction Processing Style

  • 0-address: stack machine (op, push A, pop A)
  • 1-address: accumulator machine (op ACC, ld A, st A)
  • 2-address: 2-operand machine (op S,D; one is both source and dest)
  • 3-address: 3-operand machine (op S1, S2, D; source and dest separate)

Stack machine

The advantages are small instruction size and efficient procedure calls (all parameters are already on the stack, so no additional cycles are needed for parameter passing).

The downside is that computations that are not easily expressed in postfix notation are difficult to map to a stack machine.

Note that a machine whose only storage is a single stack is a kind of pushdown automaton and is therefore not Turing complete. If it has two stacks, it is equivalent to a Turing machine.
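
To make the 0-address style concrete, here is a minimal sketch of a stack machine interpreter in Python. The opcode names (push, pop, add, mul) and the dictionary used as memory are assumptions made for illustration, not any real ISA. It evaluates D = (A + B) * C, which in postfix is A B + C *.

```python
def run_stack_machine(program, memory):
    """Execute (opcode, operand) pairs using a single operand stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "push":        # push M[operand] onto the stack
            stack.append(memory[operand])
        elif opcode == "pop":       # pop the top of the stack into M[operand]
            memory[operand] = stack.pop()
        elif opcode == "add":       # 0-address op: both inputs are implicit
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif opcode == "mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory


# D = (A + B) * C, written in postfix order: A B + C *
program = [
    ("push", "A"),
    ("push", "B"),
    ("add", None),
    ("push", "C"),
    ("mul", None),
    ("pop", "D"),
]
memory = run_stack_machine(program, {"A": 2, "B": 3, "C": 4})
print(memory["D"])
```

Only push and pop name a memory location; the arithmetic instructions carry no operand at all, which is where the small instruction size comes from.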

Data Types

A data type is a representation of information for which there are instructions that operate on the representation.

Examples: integer (in either endianness), floating point, character, binary. Some rare examples are queues, doubly linked lists, and even object-oriented data (Intel 432)!
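
As a small illustration of the two endiannesses of an integer, the sketch below packs the same 32-bit value both ways with Python's standard struct module. The value 0x12345678 is just a sample chosen so that each byte is easy to spot.

```python
import struct

value = 0x12345678

# "<I" = little-endian unsigned 32-bit, ">I" = big-endian unsigned 32-bit
little = struct.pack("<I", value)   # least significant byte stored first
big = struct.pack(">I", value)      # most significant byte stored first

print(little.hex())   # byte order as laid out in memory, lowest address first
print(big.hex())

# Reading the bytes back with the wrong byte order gives a different integer.
misread = int.from_bytes(little, byteorder="big")
print(hex(misread))
```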

Memory Organization

Address space: How many uniquely identifiable locations in memory

Addressability: how much data does each uniquely identifiable location store

  • bit addressable: Burroughs B1700 (purpose of this was to virtualize ISA)
  • byte-addressable: most ISAs
  • 64-bit addressable: some supercomputers
  • 32-bit addressable: first Alpha
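
Address space and addressability together determine how much memory can be named: capacity = number of locations × bits per location. A minimal sketch, where the function name and the 32-bit address width are assumptions for illustration:

```python
def memory_capacity_bits(address_bits, bits_per_location):
    """Total addressable capacity in bits.

    address_bits      -- width of an address (address space = 2**address_bits)
    bits_per_location -- addressability (1 = bit, 8 = byte, 32, 64, ...)
    """
    return (2 ** address_bits) * bits_per_location


GIB_IN_BITS = 8 * 2 ** 30

# The same 32-bit address reaches different amounts of memory
# depending on the addressability.
for name, bits in [("bit", 1), ("byte", 8), ("32-bit", 32), ("64-bit", 64)]:
    capacity = memory_capacity_bits(32, bits)
    print(f"{name}-addressable: {capacity / GIB_IN_BITS} GiB")
```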

Support for virtual memory

Registers More vs Less

benefit of more registers: enable better register allocation by compiler

benefit of fewer registers: save number of bits for encoding register address, small register file.
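
The encoding cost can be made concrete: naming one of N registers takes ceil(log2(N)) bits, and a 3-operand instruction pays that three times. A rough sketch (the function name is hypothetical):

```python
def register_field_bits(num_registers):
    """Bits needed to name one register out of num_registers (ceil(log2(n)))."""
    return (num_registers - 1).bit_length()


for n in (8, 16, 32, 64):
    per_operand = register_field_bits(n)
    # A 3-address instruction needs three register fields.
    print(f"{n} registers: {per_operand} bits per operand, "
          f"{3 * per_operand} bits for three operands")
```

With 32 registers, for example, 15 of the bits in a fixed 32-bit instruction go to register fields alone.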

Addressing Mode

Addressing modes specify how to obtain the operands

  • Absolute: use the immediate value as address (e.g: LW rt, 10000)
  • Register Indirect: use GPR[r_base] as address (e.g: LW rt, (r_base))
  • Displaced or based: use offset + GPR[r_base] as address (LW rt, offset(r_base))
  • Indexed: use GPR[r_base] + GPR[r_index] as address (LW rt, (r_base, r_index))
  • Memory Indirect: use value at M[GPR[r_base]] as address (LW rt, ((r_base)))
  • Auto inc/decrement: use GPR[r_base] as address, but increment or decrement GPR[r_base] on each access (LW rt, (r_base))
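
The modes above differ only in how the effective address is computed. The following sketch simulates a load under each mode; the mode names, the dictionaries standing in for the register file and memory, and the post-increment step of 4 are all assumptions for illustration.

```python
def load(mode, gpr, mem, *, imm=0, base=None, index=None, step=4):
    """Return the value loaded under the given addressing mode."""
    if mode == "absolute":
        addr = imm                        # immediate is the address
    elif mode == "register_indirect":
        addr = gpr[base]                  # GPR[r_base]
    elif mode == "displaced":
        addr = imm + gpr[base]            # offset + GPR[r_base]
    elif mode == "indexed":
        addr = gpr[base] + gpr[index]     # GPR[r_base] + GPR[r_index]
    elif mode == "memory_indirect":
        addr = mem[gpr[base]]             # M[GPR[r_base]] holds the address
    elif mode == "auto_increment":
        addr = gpr[base]                  # use GPR[r_base], then bump it
        gpr[base] += step                 # post-increment is assumed here
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return mem[addr]


gpr = {"r1": 100, "r2": 8}
mem = {100: 108, 104: 7, 108: 42, 10000: 5}

print(load("absolute", gpr, mem, imm=10000))
print(load("register_indirect", gpr, mem, base="r1"))
print(load("displaced", gpr, mem, imm=4, base="r1"))
print(load("indexed", gpr, mem, base="r1", index="r2"))
print(load("memory_indirect", gpr, mem, base="r1"))
print(load("auto_increment", gpr, mem, base="r1"), gpr["r1"])
```

Note that memory indirect needs two memory accesses for one load, which is one reason this mode is expensive to implement.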

More addressing modes

  • The good point is that they enable a better mapping of high-level constructs, such as array and pointer-based accesses, to the machine. (e.g: array access can be implemented with auto increment).
  • The downside is that the hardware is harder to design and the compiler has too many choices.

RISC vs CISC

  • RISC: simple instructions, fixed length, uniform decode, few addressing modes
  • CISC: complex instructions, variable length, non-uniform decode, many addressing modes

DEC family

PDP

VAX

Alpha

Intel family

Reference: Intel Manual

History

  • 4004(4bit)
  • 8080(8bit)
  • 8086(16bit)
  • 386(32bit)
  • 486
  • Pentium
  • Intel 64(64bit)

General Purpose Register

  • rax, rbx, rcx, rdx … r8 … r15

Segment Register

I found that they are rarely used in recent desktop OSes because segmentation has been replaced by paging. WinDbg and LLDB show these registers as constant zeros on Windows 10 and Linux 4.1. The main exception is fs and gs, which are still commonly used to point at per-thread data.

  • cs, ds …
  • cs: code
  • ds: data
  • es: extra
  • fs: general purpose
  • gs: general purpose
  • ss: stack segment

Control Register

CR0

  • PE (bit 0): protected mode enabled
  • PG (bit 31): paging enabled (page table located via CR3)
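
Checking these flags is plain bit masking. The sketch below decodes the two bits from a CR0 value; the constant and function names are hypothetical, and 0x80050033 is only a sample value used for illustration.

```python
CR0_PE = 1 << 0     # bit 0: protected mode enable
CR0_PG = 1 << 31    # bit 31: paging enable


def decode_cr0(cr0):
    """Report the PE and PG flags of a CR0 value as booleans."""
    return {
        "protected_mode": bool(cr0 & CR0_PE),
        "paging": bool(cr0 & CR0_PG),
    }


sample_cr0 = 0x80050033   # sample value, not read from real hardware
print(decode_cr0(sample_cr0))
```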

CR3

  • page table base register

SIMD Register

  • xmm0 … xmm7 (128 bits each, for SSE; xmm8 … xmm15 are also available in 64-bit mode)

Arithmetic Instruction

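  • expensive arithmetic instructions such as multiply and divide use rax implicitly: the one-operand mul and div forms take an input from rax (div from rdx:rax) and write their results to rax and rdx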
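
To show how rax and rdx work together in the one-operand forms, here is a rough Python model of unsigned 64-bit mul and div. The function names are made up for this sketch and flags are ignored; on real hardware a zero divisor or an oversized quotient raises a divide error, which is modelled here with Python exceptions.

```python
MASK64 = (1 << 64) - 1


def mul_r64(rax, src):
    """Model of `mul src`: rdx:rax = rax * src (unsigned)."""
    product = (rax & MASK64) * (src & MASK64)
    new_rax = product & MASK64     # low 64 bits
    new_rdx = product >> 64        # high 64 bits
    return new_rax, new_rdx


def div_r64(rdx, rax, src):
    """Model of `div src`: rax = rdx:rax // src, rdx = rdx:rax % src."""
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)
    quotient, remainder = divmod(dividend, src)   # src == 0 raises here
    if quotient > MASK64:
        raise OverflowError("quotient does not fit in rax")
    return quotient, remainder     # (new rax, new rdx)


rax, rdx = mul_r64(2 ** 63, 4)     # the product needs more than 64 bits
print(hex(rax), hex(rdx))
print(div_r64(rdx, rax, 4))        # dividing back recovers 2**63
```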

SIMD Instruction

  • MMX, SSE, SSE2, SSE3, AVX, AVX-512

System call

  • sysenter: fast transition routine from privilege level 3 to level 0. Some registers need to be saved on the stack before entry

MIPS

Instruction Format
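
All three MIPS32 formats are 32 bits wide and start with a 6-bit opcode. As a sketch of how the fields are packed, the helper functions below (names are hypothetical) assemble one instruction of each type: R-type is opcode, rs, rt, rd, shamt, funct; I-type is opcode, rs, rt, 16-bit immediate; J-type is opcode plus a 26-bit word address.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)"""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i_type(opcode, rs, rt, imm):
    """opcode(6) | rs(5) | rt(5) | immediate(16)"""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j_type(opcode, byte_address):
    """opcode(6) | target(26), where target is the word address."""
    return (opcode << 26) | ((byte_address >> 2) & 0x03FFFFFF)


T0, T1, T2 = 8, 9, 10   # register numbers of $t0, $t1, $t2

print(hex(encode_r_type(rs=T1, rt=T2, rd=T0, shamt=0, funct=0x20)))  # add $t0, $t1, $t2
print(hex(encode_i_type(opcode=0x08, rs=T1, rt=T0, imm=5)))          # addi $t0, $t1, 5
print(hex(encode_j_type(opcode=0x02, byte_address=0x00400000)))      # j 0x00400000
```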

R-type

I-type

J-type

ARM family

License

Arm’s licensing looks interesting. Basically, it offers two types of license:

  • Architecture license: licenses only the ISA
  • Cortex license: licenses the microarchitecture and, of course, the ISA

A32 Architecture

ARMv3 -> …. -> ARMv7 -> ARMv8 (support for 64bit)

Current processors are named in the format Cortex-(A|R|M)[0-9]+, where A is for application processors, R is for real-time systems, and M is for microcontrollers.
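
The naming pattern can be checked directly with a regular expression. This is a small sketch using Python's re module; the classify function and the profile labels (A as application, R as real-time, M as microcontroller) are names chosen for this example, and it deliberately accepts only names that fit the simple pattern above.

```python
import re

CORTEX_NAME = re.compile(r"Cortex-(A|R|M)([0-9]+)")
PROFILES = {"A": "application", "R": "real-time", "M": "microcontroller"}


def classify(name):
    """Return (profile, number) for a matching name, else None."""
    match = CORTEX_NAME.fullmatch(name)
    if match is None:
        return None
    return PROFILES[match.group(1)], int(match.group(2))


for name in ("Cortex-A72", "Cortex-R5", "Cortex-M4", "Cortex-X1"):
    print(name, classify(name))
```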

A32 Registers

  • R0 – R12: general purpose
  • SP (R13): stack pointer
  • LR (R14): link register
  • PC (R15): program counter

A32 Assembly

Thumb Assembly

A64 Architecture

A64 Registers

  • X0-X30: 64bit general purpose
  • W0-W30: 32bit general purpose
  • XZR, WZR: zero register
  • LR(X30): link register
  • SP: stack pointer
  • PC: program counter
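
Wn is not a separate register: it is the low 32 bits of Xn, and writing Wn clears the upper 32 bits of Xn. The toy register file below sketches that behaviour. The class and method names are made up, and register number 31 is simply treated as the zero register here, whereas in real A64 encoding 31 means either the zero register or SP depending on the instruction.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


class A64RegisterFile:
    """Toy model of X0-X30 with W views and a zero register at index 31."""

    def __init__(self):
        self.x = [0] * 31

    def write_x(self, n, value):
        if n == 31:                   # writes to XZR/WZR are discarded
            return
        self.x[n] = value & MASK64

    def write_w(self, n, value):
        # A 32-bit write zero-extends into the full 64-bit register.
        self.write_x(n, value & MASK32)

    def read_x(self, n):
        return 0 if n == 31 else self.x[n]

    def read_w(self, n):
        return self.read_x(n) & MASK32


regs = A64RegisterFile()
regs.write_x(0, 0xFFFFFFFFFFFFFFFF)
print(hex(regs.read_w(0)))   # W0 sees only the low half of X0
regs.write_w(0, 1)
print(hex(regs.read_x(0)))   # the upper half of X0 has been cleared
```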

A64 Assembly

svc for system call

RISC-V

Nvidia family

PTX

Reference

[1] CMU 15418 http://www.cs.cmu.edu/~418/

[2] Hennessy, John L., and David A. Patterson. Computer architecture: a quantitative approach. Elsevier, 2011.

[3] Patterson, David A., and John L. Hennessy. Computer Organization and Design ARM Edition: The Hardware Software Interface. Morgan Kaufmann, 2016.

[4] arm developer document https://developer.arm.com/docs