Efficiency

| Title | Year | Author | Link | Memo |
|---|---|---|---|---|
| Learning both Weights and Connections for Efficient Neural Networks | 2015 | Song Han et al. | pdf | one of the original pruning papers |
| The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | 2019 (ICLR) | Jonathan Frankle et al. | pdf | after pruning, reset the unpruned parameters to their initialization and retrain (see the sketch below) |
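The lottery ticket memo above is the whole recipe: train, prune by weight magnitude (as in Han et al., 2015), rewind the surviving weights to their original initialization, and retrain. Below is a minimal illustrative sketch, not the papers' reference code; `magnitude_mask`, `lottery_ticket_round`, the `train_fn` callback, and the sparsity handling are assumptions made for the example.

```python
# Illustrative sketch: magnitude pruning (Han et al., 2015) plus
# lottery-ticket rewinding (Frankle & Carbin, 2019). Names are hypothetical.
import torch
import torch.nn as nn


def magnitude_mask(model: nn.Module, sparsity: float) -> dict:
    """Return 0/1 masks that drop the smallest-magnitude weights per tensor."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:  # skip biases and norm parameters
            continue
        k = int(p.numel() * sparsity)
        if k == 0:
            masks[name] = torch.ones_like(p)
            continue
        threshold = p.abs().flatten().kthvalue(k).values
        masks[name] = (p.abs() > threshold).float()
    return masks


def lottery_ticket_round(model, init_state, train_fn, sparsity=0.8):
    """One prune-and-rewind round: train, prune, rewind, retrain."""
    train_fn(model)                          # 1. train the dense network
    masks = magnitude_mask(model, sparsity)  # 2. prune by magnitude
    model.load_state_dict(init_state)        # 3. rewind survivors to init
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])          # 4. zero out pruned weights
    train_fn(model)                          # 5. retrain the sparse "ticket"
    # A full implementation re-applies the masks after every optimizer step
    # so pruned weights stay at zero during retraining.
    return model, masks
```

Here `init_state` is assumed to be a copy of `model.state_dict()` taken right after initialization, which is what the rewinding step restores.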

Todo

Pruning

  • Block Pruning For Faster Transformers
  • Structured Pruning of Large Language Models

Quantization

  • GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  • AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  • 8-bit Optimizers via Block-wise Quantization