# Efficiency
Title | Year | Author | Link | Memo |
---|---|---|---|---|
Learning both Weights and Connections for Efficient Neural Networks | 2015 | Song Han et al. | | One of the original pruning papers: prunes low-magnitude weights, then retrains the remaining connections. |
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | 2019 (ICLR) | Jonathan Frankle et al. | | Resets the unpruned parameters to their initial values before retraining the sparse subnetwork. |
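For intuition, the sketch below combines the two ideas from the table: magnitude pruning (zero the smallest-magnitude weights) and a lottery-ticket-style rewind (restore the surviving weights to their initial values before retraining). The layer shape, sparsity level, and function names are illustrative assumptions, not taken from either paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: initial and trained weights of one layer.
w_init = rng.normal(size=(256, 256)).astype(np.float32)
w_trained = w_init + 0.1 * rng.normal(size=w_init.shape).astype(np.float32)

def magnitude_prune_mask(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Keep the largest-magnitude weights; zero out the `sparsity` fraction."""
    threshold = np.quantile(np.abs(w), sparsity)
    return (np.abs(w) > threshold).astype(w.dtype)

# Han et al. style: prune small weights after training (retraining would follow).
mask = magnitude_prune_mask(w_trained, sparsity=0.9)
w_pruned = w_trained * mask

# Lottery-ticket style: keep the mask, but rewind surviving weights to init.
w_ticket = w_init * mask

print(f"sparsity: {1.0 - mask.mean():.2%}")
```

In both papers the pruned network is then retrained; the sketch only produces the pruned tensors.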
## Todo
### Pruning
- Block Pruning For Faster Transformers
- Structured Pruning of Large Language Models
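Both papers above prune structured groups of weights rather than individual entries. Below is a minimal NumPy sketch of one such idea, block-level magnitude pruning, which zeroes whole tiles of a weight matrix by their L2 norm; the block size, scoring rule, and names are assumptions for illustration, not the papers' exact methods.

```python
import numpy as np

def block_prune(w: np.ndarray, block: int, sparsity: float) -> np.ndarray:
    """Zero whole (block x block) tiles with the smallest L2 norms."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0
    # View the matrix as a grid of tiles and score each tile by its norm.
    tiles = w.reshape(rows // block, block, cols // block, block)
    norms = np.sqrt((tiles ** 2).sum(axis=(1, 3)))   # (rows/block, cols/block)
    threshold = np.quantile(norms, sparsity)
    keep = (norms > threshold)[:, None, :, None]     # broadcast over tiles
    return (tiles * keep).reshape(rows, cols)

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 128)).astype(np.float32)
w_sparse = block_prune(w, block=32, sparsity=0.5)
print(f"fraction zeroed: {(w_sparse == 0).mean():.2%}")
```

Removing whole blocks (or attention heads, layers, etc.) is what makes structured pruning fast on real hardware: the zeroed tiles can be skipped entirely instead of relying on sparse kernels.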
### Quantization
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- 8-bit Optimizers via Block-wise Quantization
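A recurring ingredient in all three is low-bit quantization with per-group scales. The sketch below shows block-wise absmax int8 quantization in the spirit of the 8-bit optimizers paper; the block size, rounding scheme, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_blockwise(x: np.ndarray, block: int = 64):
    """Quantize a flat tensor to int8 with one absmax scale per block."""
    x = x.reshape(-1, block)                      # assumes size % block == 0
    scale = np.abs(x).max(axis=1, keepdims=True)  # per-block absmax
    scale = np.where(scale == 0, 1.0, scale)      # avoid divide-by-zero
    q = np.round(x / scale * 127).astype(np.int8)
    return q, scale

def dequantize_blockwise(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximation of the original values."""
    return (q.astype(np.float32) / 127) * scale

rng = np.random.default_rng(0)
state = rng.normal(size=4096).astype(np.float32)  # e.g. an Adam moment vector
q, scale = quantize_blockwise(state, block=64)
recon = dequantize_blockwise(q, scale).reshape(-1)
print(f"max abs error: {np.abs(state - recon).max():.4f}")
```

Per-block scales confine the effect of an outlier to its own block, which is the main motivation for block-wise over per-tensor quantization.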