
0x504 Multitask Learning

Multitask Learning

Sampling Strategy

When training over a set of imbalanced datasets, there are a few sampling strategies:

  • equal mixing: the baseline; it typically overfits low-resource tasks and underfits high-resource tasks
  • examples-proportional sampling: proportional to dataset size, but with an upper bound \(K\) on each dataset's effective size; the T5 paper reports a sweet spot for \(K\) that achieves the best overall performance
  • temperature-based sampling: as adopted by Arivazhagan et al. (2019), where an appropriate temperature (e.g. \(T=5\)) strikes a good balance between high-resource and low-resource tasks (the transfer vs. interference trade-off)
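The strategies above can be sketched as one weighting function. A minimal sketch, assuming hypothetical per-task example counts (the task names and sizes below are illustrative, not from the source):

```python
import numpy as np

# Hypothetical per-task example counts (illustrative numbers only).
task_sizes = {"en-fr": 1_000_000, "en-sw": 10_000, "en-ta": 1_000}

def sampling_probs(sizes, temperature=5.0, cap=None):
    """Temperature-based sampling: p_i proportional to n_i ** (1/T).

    T=1 recovers examples-proportional mixing; T -> infinity approaches
    equal mixing. An optional cap K bounds each task's effective size,
    giving examples-proportional sampling with an upper bound.
    """
    n = np.array(
        [min(s, cap) if cap is not None else s for s in sizes.values()],
        dtype=float,
    )
    p = n ** (1.0 / temperature)
    return p / p.sum()

probs = sampling_probs(task_sizes, temperature=5.0)
```

With \(T=5\) the high-resource task still gets the largest share, but the low-resource tasks are upweighted far beyond their raw proportions, which is the balance the temperature is tuning.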

Transfer Learning

Continual Learning

From Hung-yi Lee's video


  • Elastic Weight Consolidation (EWC)
  • Synaptic Intelligence (SI)
  • Memory Aware Synapses (MAS)
  • RWalk
  • Sliced Cramer Preservation (SCP)
  • Gradient Episodic Memory (GEM)
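Most of the regularization-based methods in this list (EWC, SI, MAS, SCP) share one idea: penalize drift of weights that were important for earlier tasks. A minimal sketch of the EWC quadratic penalty, assuming a precomputed diagonal Fisher information estimate (the `params` / `old_params` / `fisher` dict layout is an illustrative assumption, not a specific library's API):

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)**2.

    `old_params` holds the weights after training the previous task, and
    `fisher` holds a diagonal Fisher information estimate for them: a large
    F_i marks a weight as important to the old task, so drifting it away
    from theta*_i is penalized heavily. This term is added to the new
    task's loss during training.
    """
    total = 0.0
    for name in params:
        total += np.sum(fisher[name] * (params[name] - old_params[name]) ** 2)
    return 0.5 * lam * total
```

SI, MAS, and RWalk keep the same quadratic form and differ mainly in how the per-weight importance (here `fisher`) is estimated.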


Neural Resource Allocation

Curriculum Learning

  • reference gemax: 4.78 WER
  • precompute token + wiz server: 5.05 WER
  • export tokenizer + wiz server: 5.24 WER