Skip to content

0x553 Task

1. Information Retrieval

Classical lexical/statistical IR methods are summarized in the search engine note

1.1. Dense Retrieval

Model (ME-BERT) Multi-Vector Encoding

Model (ColBERT) late interaction

colbert

1.2. Reranking

1.3. Generation

Model (promptagator) leverages LLM as a few-shot query generator, and creates task-specific retrievers based on the generated data

Model (WebGPT) allows GPT to search/navigate the web

  • Behavior cloning
  • Reward Modeling
  • Reinforcement learning
  • Rejection sampling

example

2. Translation

Model (Google translation 2017)

Model (knowledge distillation) Distill knowledge from multiple teacher (trained with each lang-pair data separately) to a single multilingual student. The loss is both NLL loss and distillation loss (cross entropy of student/teacher distribution)

Model (massively multilingual model)

Transfer vs Interference:

the goal is to achieve

  • high transfer (positive transfer) to low-resource languages
  • low interference (negative transfer) for high-resource languages.

Sampling strategy:

  • original language distribution has low-transfer/low-interference
  • equal sampling (by upsampling low-resource lang) has high-transfer/high-interference
  • this work suggests using a temperature-based sampling has a good balance over transfer/inteference. Sampling prob is \(p_l^{1/T}\) where \(p_l\) is the original distribution and \(T=5\)