Software
I created and maintain the following online and offline software, most of which is related to my research.
Models
- Allosaurus: Allosaurus is a pretrained universal phone recognizer for more than 2000 languages. It contains several acoustic models we published:
  - Universal Model described in our ICASSP 2020 paper
  - Compositional Phonetics Model described in our Interspeech 2021 paper
Online Applications
I maintain a website with several applications for low-resource speech processing. The following tools are available:
Corpus Collection
You can create a Kaldi-like corpus dataset from a single text file or a single audio file. These applications were used when we participated in the LoReHLT evaluation. Some of their features are summarized in our Interspeech 2020 demo paper.
- Recording Application: Upload a text file from which you want to create a corpus. The application generates an interface for recording audio for each sentence.
- Transcription Application: Upload one or more audio files from which you want to create a corpus. The application generates an interface for listening to each audio clip and transcribing it.
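The exact layout produced by these applications is not described here. As a rough illustration only, the sketch below assumes the standard Kaldi data-directory files (`text`, `wav.scp`, `utt2spk`) and shows how a list of sentences from a text file could be mapped to them. The function name `build_kaldi_style_dir`, the utterance-ID scheme, and the `audio/` path are hypothetical and are not taken from the applications themselves.

```python
from pathlib import Path


def build_kaldi_style_dir(sentences, out_dir, speaker="spk1", audio_dir="audio"):
    """Write Kaldi-style text, wav.scp and utt2spk files for a list of sentences.

    Hypothetical helper for illustration; the recordings themselves are assumed
    to be saved later as <audio_dir>/<utt_id>.wav.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Drop blank lines so that every utterance has a transcript.
    cleaned = [s.strip() for s in sentences if s.strip()]

    # Prefix utterance IDs with the speaker ID, a common Kaldi convention.
    utt_ids = [f"{speaker}-utt{i:04d}" for i in range(1, len(cleaned) + 1)]

    files = {
        "text": [f"{u} {s}" for u, s in zip(utt_ids, cleaned)],
        "wav.scp": [f"{u} {audio_dir}/{u}.wav" for u in utt_ids],
        "utt2spk": [f"{u} {speaker}" for u in utt_ids],
    }

    # Kaldi expects these files to be sorted by utterance ID.
    for name, lines in files.items():
        content = "".join(line + "\n" for line in sorted(lines))
        (out / name).write_text(content, encoding="utf-8")

    return utt_ids
```

Calling `build_kaldi_style_dir(Path("sentences.txt").read_text(encoding="utf-8").splitlines(), "data/train")` would write one entry per non-empty line of a hypothetical `sentences.txt`.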
Speech Recognition
- Online Allosaurus: This is an older version of the Allosaurus model. You can upload an audio file to test its recognition online. A command-line interface (CUI) is also available to query the online model.
- Inventory Customization: This is an online tool for creating a phone inventory to customize the Allosaurus model.
Speech Synthesis
- Low-resource TTS: This demo lets you test parametric HMM-based models for many low-resource languages. The models were trained on the Wilderness corpus by Alan W. Black.
Datasets
- UCLA Phonetics Corpus: A phone-annotated dataset covering 97 low-resource languages. It is described in our ICASSP 2021 paper, "Multilingual Phonetic Dataset for Low Resource Speech Recognition".
Others
- kaldi-cmake: Automatically creates CMakeLists.txt for a Kaldi project.
- pytensor: A toy NumPy-based deep learning framework.