Computer microarchitecture is the implementation of an ISA under specific design constraints and goals; it is the abstraction layer between the logic level and the architecture (ISA) level.
- 4 bit: Intel 4004 (Intel's first microprocessor, 1971, ~2.3k transistors)
- 8 bit: Intel 8008 (1972, ~3.5k transistors)
- 16 bit: Intel 8086 (1978), PDP-11 (DEC minicomputer, 1970)
- 32 bit: Intel 80386 (1985), VAX-11 (DEC, 1977)
- 64 bit: e.g. DEC Alpha 21064 (1992), AMD Opteron (first x86-64 processor, 2003)
superscalar processor: a superscalar core exploits instruction-level parallelism by issuing several independent instructions from the same instruction stream in the same clock cycle. This was one of the main performance strategies before the multi-core era, but it costs many transistors for wide issue logic, branch prediction, out-of-order execution, and large caches. The P5 Pentium (1993) was the first superscalar x86 processor.
For example, a two-wide superscalar core can fetch, decode, and execute two independent instructions per cycle from a single instruction stream.
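A minimal sketch of dual issue, assuming a hypothetical in-order core where each instruction is a `(dest, srcs)` register tuple, every result is ready one cycle later, and the only hazard checked is a read-after-write dependence inside the same issue group (no renaming, no structural hazards). The function `cycles_to_issue` and the register names are illustrative, not taken from any real design.

```python
def cycles_to_issue(program, width=2):
    """Count cycles for a hypothetical in-order superscalar core.

    Each cycle issues up to `width` consecutive instructions, stopping at the
    first one that reads a register written earlier in the same cycle.
    Results are assumed to be ready in the next cycle.
    """
    cycles, i = 0, 0
    while i < len(program):
        written = set()  # destination registers produced by this cycle's group
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if written & set(srcs):
                break  # read-after-write hazard inside the group: wait a cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


independent = [("r1", ["r2", "r3"]), ("r4", ["r5", "r6"]),
               ("r7", ["r8", "r9"]), ("r10", ["r11", "r12"])]
chain = [("r1", ["r2", "r3"]), ("r4", ["r1", "r5"]),
         ("r6", ["r4", "r7"]), ("r8", ["r6", "r9"])]

print(cycles_to_issue(independent, width=1))  # 4: scalar core
print(cycles_to_issue(independent, width=2))  # 2: two instructions per cycle
print(cycles_to_issue(chain, width=2))        # 4: dependences leave one slot idle
```

The dependent chain gains nothing from the second issue slot, which is why superscalar designs spend transistors on finding independent work.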
instruction stream coherence: the same instruction sequence applies to all elements processed simultaneously. It is necessary for efficient SIMD execution, but not for efficient parallelization across cores.
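A rough sketch of why divergence hurts SIMD, assuming a hypothetical 4-lane unit that executes a branch with masks: if any lane in a group takes a path, the whole group spends a step on it while the other lanes idle. The function `simd_if_else` and the branch body `2*x if x > 0 else -x` are made up for illustration.

```python
def simd_if_else(values, width=4):
    """Hypothetical SIMD execution of `y = 2*x if x > 0 else -x`.

    Lanes are processed in groups of `width`. If at least one lane in a group
    takes a path, the whole group spends one step on that path with the other
    lanes masked off. Returns (results, steps).
    """
    results, steps = [], 0
    for g in range(0, len(values), width):
        lanes = values[g:g + width]
        mask = [x > 0 for x in lanes]
        if any(mask):
            steps += 1  # "then" path: lanes with mask False sit idle
        if not all(mask):
            steps += 1  # "else" path: lanes with mask True sit idle
        results.extend(2 * x if m else -x for x, m in zip(lanes, mask))
    return results, steps


coherent = [1, 2, 3, 4, 5, 6, 7, 8]
divergent = [1, -2, 3, -4, 5, -6, 7, -8]
print(simd_if_else(coherent))   # ([2, 4, 6, 8, 10, 12, 14, 16], 2)
print(simd_if_else(divergent))  # ([2, 2, 6, 4, 10, 6, 14, 8], 4)
```

The divergent input doubles the step count even though the amount of useful work is the same; separate cores would not pay this cost, since each follows its own branch.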
SSE instructions: 128-bit registers (4-wide single-precision float)
AVX instructions: 256-bit registers (8-wide single-precision float)
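The lane count follows from register width divided by element width (128/32 = 4, 256/32 = 8). The sketch below models the usual strip-mined loop: full vectors first, then a scalar tail for the leftover elements. It is pure Python and emits no SIMD instructions; in C the vector step would be an intrinsic such as `_mm_add_ps` (SSE) or `_mm256_add_ps` (AVX). The function `vector_add` is hypothetical.

```python
def vector_add(a, b, register_bits=128, element_bits=32):
    """Model of a strip-mined SIMD loop (pure Python, no real SIMD).

    Each slice of `lanes` elements stands in for one vector add; leftover
    elements are handled by a scalar tail loop.
    Returns (result, vector_ops, scalar_ops).
    """
    lanes = register_bits // element_bits  # SSE: 128/32 = 4, AVX: 256/32 = 8
    n = len(a)
    main = n - n % lanes                   # elements covered by full vectors
    out = [0.0] * n
    vector_ops = scalar_ops = 0
    for i in range(0, main, lanes):
        out[i:i + lanes] = [x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes])]
        vector_ops += 1
    for i in range(main, n):               # scalar tail
        out[i] = a[i] + b[i]
        scalar_ops += 1
    return out, vector_ops, scalar_ops


a = [float(i) for i in range(10)]
b = [1.0] * 10
print(vector_add(a, b)[1:])                     # SSE-like: (2, 2)
print(vector_add(a, b, register_bits=256)[1:])  # AVX-like: (1, 2)
```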
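hyper-threading: Intel's implementation of simultaneous multithreading (SMT), where a superscalar core holds multiple hardware execution contexts and can issue instructions from several threads in the same cycle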
multi-core: thread-level parallelism; each core simultaneously executes a completely different instruction stream
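security: Meltdown and Spectre (disclosed in 2018) are side-channel attacks that exploit out-of-order and speculative execution to leak data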
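Single-cycle implementation: every instruction executes in one clock cycle, so the slowest instruction determines the cycle time.
Multi-cycle implementation: instruction processing is broken into multiple cycles/stages; the slowest stage determines the cycle time, and each instruction takes only as many cycles as the stages it needs.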
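A small worked comparison, assuming hypothetical stage latencies and a hypothetical RISC-style mapping of instruction classes to the stages they use (none of these numbers come from a specific processor). The single-cycle clock must fit the slowest instruction; the multi-cycle clock must fit the slowest stage.

```python
# Hypothetical per-stage latencies in picoseconds.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Hypothetical stages used by each instruction class.
STAGES_USED = {
    "load":   ["IF", "ID", "EX", "MEM", "WB"],
    "store":  ["IF", "ID", "EX", "MEM"],
    "alu":    ["IF", "ID", "EX", "WB"],
    "branch": ["IF", "ID", "EX"],
}


def single_cycle_time(program):
    # One cycle per instruction, but the cycle must fit the slowest instruction.
    cycle = max(sum(STAGE_PS[s] for s in stages) for stages in STAGES_USED.values())
    return len(program) * cycle


def multi_cycle_time(program):
    # One stage per cycle; the cycle must fit the slowest stage, and each
    # instruction spends cycles only on the stages it uses.
    cycle = max(STAGE_PS.values())
    return sum(len(STAGES_USED[op]) for op in program) * cycle


program = ["load", "alu", "alu", "alu", "branch", "branch", "branch"]
print(single_cycle_time(program))  # 5600 ps = 7 instructions * 800 ps
print(multi_cycle_time(program))   # 5200 ps = 26 cycles * 200 ps
```

With these numbers a lone load is actually slower in the multi-cycle design (1000 ps versus 800 ps) because the 100 ps stages are padded to a 200 ps cycle; the benefit depends on the instruction mix and on how balanced the stages are.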
Cache is usually implemented with SRAM
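- L1: ~1 ns reference latency, usually private to each core
- L2: ~4 ns reference latency; historically off-chip, now usually on-die and often private per core
- L3: usually shared by multiple cores
- fully associative cache: a memory block can be placed in any cache line
- direct-mapped cache: each memory block can be placed in exactly one cache line
- LRU (Least Recently Used): replacement policy that evicts the block unused for the longest time
- Write-through: write data to the cache and RAM at the same time
- Write-back: write only to the cache and delay writing to RAM until the modified (dirty) block is evicted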
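A toy model of the placement, replacement, and write policies above, assuming a hypothetical cache of `num_lines` lines of `block_size` bytes that is either direct-mapped or fully associative with LRU replacement, and write-allocate on every miss (an assumption; write-through caches are often paired with no-write-allocate instead). The class `ToyCache` is illustrative and counts only hits, misses, and RAM writes.

```python
from collections import OrderedDict


class ToyCache:
    """Hypothetical cache model: direct-mapped or fully associative (LRU),
    write-through or write-back. Counts hits, misses and RAM writes."""

    def __init__(self, num_lines, block_size, fully_associative=False, write_back=False):
        self.num_lines = num_lines
        self.block_size = block_size
        self.fully_associative = fully_associative
        self.write_back = write_back
        # Fully associative: one set holding all lines.
        # Direct-mapped: one set per line, each holding a single block.
        num_sets = 1 if fully_associative else num_lines
        self.ways = num_lines if fully_associative else 1
        # Each set maps block number -> dirty flag, least recently used first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = self.misses = self.ram_writes = 0

    def access(self, addr, write=False):
        block = addr // self.block_size
        # Direct-mapped: the block can only live in line (block mod num_lines).
        s = self.sets[0] if self.fully_associative else self.sets[block % self.num_lines]
        if block in s:
            self.hits += 1
            s.move_to_end(block)  # now the most recently used
        else:
            self.misses += 1
            if len(s) == self.ways:
                _, dirty = s.popitem(last=False)  # evict the least recently used block
                if dirty:
                    self.ram_writes += 1  # write-back: flush dirty data on eviction
            s[block] = False  # assumed write-allocate: load the block on any miss
        if write:
            if self.write_back:
                s[block] = True  # defer the RAM update; just mark the block dirty
            else:
                self.ram_writes += 1  # write-through: update RAM on every store


# Blocks 0 and 4 both map to line 0 of a 4-line direct-mapped cache.
for cache in (ToyCache(4, 16), ToyCache(4, 16, fully_associative=True)):
    for addr in [0, 64, 0, 64]:
        cache.access(addr)
    print(cache.misses)  # direct-mapped: 4, fully associative: 2
```

The direct-mapped cache thrashes on two blocks that share a line even though three lines sit empty; full associativity avoids that conflict at the cost of searching every line.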
MMU (memory management unit): the unit that translates virtual addresses into physical addresses
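- TLB (translation lookaside buffer): a cache that stores recent address translations
- it is effectively a cache of page-table entries
- it stores only the final translation (virtual page to physical frame), even when the page table has multiple levels
- on x86, writing CR3 (switching page tables) automatically flushes the non-global TLB entries (when PCID is not in use)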
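A minimal sketch of translation with a TLB, assuming a hypothetical two-level page table with 4 KiB pages and a 10/10/12-bit address split (as in classic 32-bit x86 paging), represented as nested dicts. The TLB caches only the final page-to-frame result, and `write_cr3` clears it, loosely mimicking an x86 CR3 load without global pages or PCIDs. `ToyMMU` and its methods are illustrative names; a missing entry raises `KeyError`, standing in for a page fault.

```python
PAGE_SHIFT = 12  # 4 KiB pages
LEVEL_BITS = 10  # 10-bit index per level, as in classic 32-bit x86 paging


class ToyMMU:
    """Hypothetical MMU: two-level page table plus a TLB that caches only the
    final virtual-page -> physical-frame translation."""

    def __init__(self, page_directory):
        self.cr3 = page_directory  # root of the page table (stand-in for CR3)
        self.tlb = {}              # virtual page number -> physical frame number
        self.walks = 0             # page-table walks performed (TLB misses)

    def write_cr3(self, page_directory):
        self.cr3 = page_directory
        self.tlb.clear()  # new address space: cached translations are stale

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.tlb:
            self.walks += 1
            dir_idx = vpn >> LEVEL_BITS
            tbl_idx = vpn & ((1 << LEVEL_BITS) - 1)
            table = self.cr3[dir_idx]         # level 1: directory -> page table
            self.tlb[vpn] = table[tbl_idx]    # level 2: table -> frame; cache only this
        return (self.tlb[vpn] << PAGE_SHIFT) | offset


mmu = ToyMMU({0: {1: 0x42}})          # virtual page 1 -> physical frame 0x42
print(hex(mmu.translate(0x1234)))     # 0x42234 (page walk, TLB miss)
print(hex(mmu.translate(0x1ABC)))     # 0x42abc (same page, TLB hit)
print(mmu.walks)                      # 1
```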