ASR
Robustness / Generalization
Title | Year | Author | Link | Memo |
---|---|---|---|---|
TOWARD DOMAIN-INVARIANT SPEECH RECOGNITION VIA LARGE SCALE TRAINING | 2018 | Arun Narayanan et al. | | a single domain-invariant model for varied use cases across multiple domains |
RETHINKING EVALUATION IN ASR: ARE OUR MODELS ROBUST ENOUGH? | 2021 | Tatiana Likhomanenko et al. | | evaluation across multiple test sets is necessary (include at least TED-LIUM) |
Efficiency
Title | Year | Author | Link | Memo |
---|---|---|---|---|
EFFICIENT KNOWLEDGE DISTILLATION FOR RNN-TRANSDUCER MODELS | 2020 | Sankaran Panchapagesan et al. | | distill RNN-T using only the target-label and blank-label probabilities |
Rescoring
Title | Year | Author | Link | Memo |
---|---|---|---|---|
CROSS-UTTERANCE ASR RESCORING WITH GRAPH-BASED LABEL PROPAGATION | 2023 (ICASSP) | Srinath Tankasala et al. | | rescore a set of utterances jointly using label propagation over an acoustic-similarity graph |
Multitask
Title | Year | Author | Link | Memo |
---|---|---|---|---|
SPEECH REPRESENTATION LEARNING THROUGH SELF-SUPERVISED PRETRAINING AND MULTI-TASK FINETUNING | 2021 | Yi-Chen Chen et al. | | multi-task fine-tuning on top of self-supervised (SSL) pretraining |
Multilingual
Title | Year | Author | Link | Memo |
---|---|---|---|---|
HIERARCHICAL SOFTMAX FOR END-TO-END LOW-RESOURCE MULTILINGUAL SPEECH RECOGNITION | 2023 (ICASSP) | Qianying Liu et al. | | applies hierarchical softmax to cross-lingual tokens |