ASR

Robustness / Generalization

| Title | Year | Author | Link | Memo |
| --- | --- | --- | --- | --- |
| TOWARD DOMAIN-INVARIANT SPEECH RECOGNITION VIA LARGE SCALE TRAINING | 2018 | Arun Narayanan et al. | pdf | a single domain-invariant model for varied use cases from multiple domains |
| RETHINKING EVALUATION IN ASR: ARE OUR MODELS ROBUST ENOUGH? | 2021 | Tatiana Likhomanenko et al. | pdf | evaluation across multiple test sets is necessary (at least TED-LIUM) |

Efficiency

| Title | Year | Author | Link | Memo |
| --- | --- | --- | --- | --- |
| EFFICIENT KNOWLEDGE DISTILLATION FOR RNN-TRANSDUCER MODELS | 2020 | Sankaran Panchapagesan et al. | pdf | distills RNN-T using only the target label and the blank label |

Rescoring

| Title | Year | Author | Link | Memo |
| --- | --- | --- | --- | --- |
| CROSS-UTTERANCE ASR RESCORING WITH GRAPH-BASED LABEL PROPAGATION | 2023 ICASSP | Srinath Tankasala et al. | pdf | rescores a set of utterances together using label propagation over an acoustic-similarity graph |

Multitask

| Title | Year | Author | Link | Memo |
| --- | --- | --- | --- | --- |
| SPEECH REPRESENTATION LEARNING THROUGH SELF-SUPERVISED PRETRAINING AND MULTI-TASK FINETUNING | 2021 | Yi-Chen Chen et al. | pdf | multi-task finetuning on top of self-supervised (SSL) pretraining |

Multilingual

| Title | Year | Author | Link | Memo |
| --- | --- | --- | --- | --- |
| HIERARCHICAL SOFTMAX FOR END-TO-END LOW-RESOURCE MULTILINGUAL SPEECH RECOGNITION | 2023 ICASSP | Qianying Liu et al. | pdf | applies hierarchical softmax to cross-lingual tokens |