
0x504 Multitask Learning

Multitask Learning

Sampling Strategy

When training over a set of imbalanced datasets, there are a few strategies:

  • equal mixing: the baseline; typically overfits low-resource tasks and underfits high-resource tasks
  • examples-proportional sampling: sample in proportion to dataset size, but cap each dataset's effective size at an upper bound \(K\); the T5 paper reports there is a sweet spot for \(K\) that achieves the best overall performance
  • temperature-based sampling: as adopted by Arivazhagan, Naveen, et al. 2019, where an appropriate temperature (e.g. \(T=5\)) strikes a good balance between high-resource and low-resource tasks (the transfer vs. interference trade-off); see the sketch after this list
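
A minimal sketch of the last two strategies, assuming NumPy; the function name and the example dataset sizes are hypothetical. It caps each size at \(K\), then raises the proportional rates to the power \(1/T\), i.e. \(p_i \propto \min(n_i, K)^{1/T}\):

```python
import numpy as np

def sampling_probs(sizes, K=None, T=1.0):
    """Per-task sampling probabilities for a multitask mixture.

    sizes: raw dataset sizes n_i.
    K:     optional cap on effective dataset size (examples-proportional
           mixing with limit K).
    T:     temperature; T=1 is proportional sampling, larger T moves the
           mixture toward equal mixing.
    """
    sizes = np.asarray(sizes, dtype=np.float64)
    if K is not None:
        sizes = np.minimum(sizes, K)  # cap high-resource tasks
    p = sizes / sizes.sum()           # examples-proportional rates
    p = p ** (1.0 / T)                # temperature smoothing
    return p / p.sum()

# Hypothetical sizes: one high-resource task, two low-resource tasks.
print(sampling_probs([1_000_000, 10_000, 1_000], T=5.0))
```

With \(T=1\) this recovers proportional sampling; as \(T\) grows the mixture approaches equal mixing, which is why a moderate temperature mediates between transfer and interference.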

Transfer Learning

Continual Learning

From Hung-yi Lee's video

Regularization

  • Elastic Weight Consolidation (EWC): https://arxiv.org/abs/1612.00796
  • Synaptic Intelligence (SI): https://arxiv.org/abs/1703.04200
  • Memory Aware Synapses (MAS): https://arxiv.org/abs/1711.09601
  • RWalk: https://arxiv.org/abs/1801.10112
  • Sliced Cramer Preservation (SCP): https://openreview.net/forum?id=BJge3TNKwH
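
These methods all add a weighted quadratic penalty that anchors the parameters important to old tasks; they differ mainly in how the per-parameter importance weights are estimated. As a concrete instance, here is a minimal PyTorch sketch of the EWC penalty from Kirkpatrick et al. 2017 (the function name and dict layout are assumptions, not the paper's code):

```python
import torch

def ewc_penalty(model, fisher, star_params, lam=1.0):
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    where F is a diagonal Fisher estimate and theta* are the parameters
    learned on the previous task (both precomputed, keyed by name)."""
    loss = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - star_params[name]) ** 2).sum()
    return 0.5 * lam * loss

# Usage: total = task_loss + ewc_penalty(model, fisher, star_params, lam=100.0)
```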

Gradient Episodic Memory
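
GEM (Lopez-Paz & Ranzato 2017, https://arxiv.org/abs/1706.08840) constrains each update so the loss on an episodic memory of past tasks does not increase; the full method solves a QP over one constraint per past task. A minimal sketch of the single-constraint simplification from A-GEM (Chaudhry et al. 2019, https://arxiv.org/abs/1812.00420), operating on flattened gradient vectors:

```python
import torch

def agem_project(grad, ref_grad):
    """If the flattened current-task gradient conflicts with the gradient
    on the episodic memory (negative dot product), remove the conflicting
    component so the memory loss cannot increase to first order."""
    dot = torch.dot(grad, ref_grad)
    if dot < 0:
        grad = grad - (dot / torch.dot(ref_grad, ref_grad)) * ref_grad
    return grad
```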

Memory-Replay
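
Replay methods mix stored examples from earlier tasks into current-task batches. A minimal sketch of a reservoir-sampling buffer, a common choice because it keeps a uniform sample of the whole stream in fixed memory (class name is hypothetical):

```python
import random

class ReplayBuffer:
    """Reservoir-sampling buffer: maintains a uniform random sample of
    everything seen so far, in fixed memory (Algorithm R)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)  # replace w.p. capacity/seen
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))
```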

Neural Resource Allocation

Curriculum Learning

Reference WERs:

  • gemax: 4.78
  • precompute token + wiz server: 5.05
  • export tokenizer + wiz server: 5.24