0x504 Multitask Learning
Multitask Learning
Sampling Strategy
When training over a set of imbalanced datasets, there are a few sampling strategies (sketched in code after this list):
- equal mixing: the baseline; typically overfits low-resource tasks and underfits high-resource tasks
- examples-proportional sampling: sample in proportion to dataset size, capping each dataset's effective size at \(K\), i.e. \(r_m = \min(e_m, K) / \sum_n \min(e_n, K)\) where \(e_m\) is the number of examples in dataset \(m\); the T5 paper finds a sweet spot in \(K\) at which each task achieves its best performance
- temperature-based sampling: as adopted by Arivazhagan et al. (2019), sample dataset \(m\) with probability proportional to \(p_m^{1/T}\); an appropriate temperature (e.g. \(T=5\)) strikes a good balance between high-resource and low-resource tasks (the transfer vs. interference trade-off)
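A minimal sketch of the three strategies as sampling distributions over tasks, assuming NumPy; the task sizes, the cap \(K\), and the temperature here are made-up illustrative values:

```python
import numpy as np

# Hypothetical example counts for four imbalanced tasks.
sizes = np.array([1_000_000, 200_000, 10_000, 1_000], dtype=np.float64)

def equal_mixing(sizes):
    """Baseline: every task is sampled with the same probability."""
    return np.full(len(sizes), 1.0 / len(sizes))

def examples_proportional(sizes, K):
    """Proportional to dataset size, with each size capped at K (T5)."""
    capped = np.minimum(sizes, K)
    return capped / capped.sum()

def temperature_sampling(sizes, T):
    """Rates raised to 1/T: T=1 is proportional, T -> inf is uniform."""
    p = sizes / sizes.sum()
    p = p ** (1.0 / T)
    return p / p.sum()

print(equal_mixing(sizes))
print(examples_proportional(sizes, K=2**15))  # K is a tunable cap
print(temperature_sampling(sizes, T=5))       # T=5 as in Arivazhagan et al. 2019
```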
Transfer Learning
Continual Learning
From Hung-yi Lee's video
Regularization
- Elastic Weight Consolidation (EWC): https://arxiv.org/abs/1612.00796
- Synaptic Intelligence (SI): https://arxiv.org/abs/1703.04200
- Memory Aware Synapses (MAS): https://arxiv.org/abs/1711.09601
- RWalk: https://arxiv.org/abs/1801.10112
- Sliced Cramer Preservation (SCP): https://openreview.net/forum?id=BJge3TNKwH
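As a concrete instance of this regularization family, here is a minimal PyTorch sketch of the EWC penalty, assuming a diagonal Fisher estimated from squared gradients on the old task; `model`, `data_loader`, `loss_fn`, and the weight `lam` are placeholder names, and the scale of `lam` is illustrative:

```python
import torch

def estimate_fisher(model, data_loader, loss_fn):
    """Diagonal Fisher approximation: average squared gradients on the old task."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(data_loader) for n, f in fisher.items()}

def ewc_penalty(model, fisher, old_params, lam):
    """Quadratic penalty pulling each parameter toward its old-task value,
    weighted by how important it was (its Fisher value)."""
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return lam / 2 * penalty

# During new-task training (sketch):
#   loss = task_loss + ewc_penalty(model, fisher, old_params, lam=100.0)
# where old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
# is snapshotted right after training on the old task.
```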
Gradient Episodic Memory
Memory-Replay
Neural Resource Allocation
Curriculum Learning
- reference gemax: 4.78 WER
- precompute token + wiz server: 5.05 WER
- export tokenizer + wiz server: 5.24 WER