Training ASR models in multiple languages #91
This is a great idea 😆
Hi @usimarit! Thanks for such a great project and for your support in boosting the visibility of this issue 😻 I will start by writing a helper script that automatically downloads the MLS dataset in a given language and prepares the transcription and alphabet files, and I will open a PR to add it to the repo. As a first step, I can then use this script to train a Conformer model in German 🚀
Hi @monatis, just for your information, we should train using subwords.
Hi @usimarit, yes, I know that training with subwords yields better performance, but in #92 I'm automatically generating an alphabet file for those who want to use characters anyway.
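The helper script from #92 is not shown in this thread. As a rough sketch of what its alphabet step could look like, assuming MLS's transcripts.txt holds one "utterance_id<TAB>text" line per utterance and audio is stored under a hypothetical audio/<speaker>/<book>/<utterance_id>.flac layout, the character set can be collected from the transcripts like this. The function names are hypothetical, and the exact alphabet-file format TensorFlowASR expects should be checked against the repo.

```python
import os


def parse_mls_transcripts(lines, audio_root):
    """Turn MLS-style 'utterance_id<TAB>text' lines into (audio_path, text) pairs.

    Assumes (hypothetically) that utterance_id looks like '<speaker>_<book>_<index>'
    and that audio lives at audio_root/<speaker>/<book>/<utterance_id>.flac.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        utt_id, text = line.split("\t", 1)
        speaker, book, _ = utt_id.split("_", 2)
        path = os.path.join(audio_root, speaker, book, utt_id + ".flac")
        # Lowercase so that 'A' and 'a' map to the same alphabet entry.
        pairs.append((path, text.strip().lower()))
    return pairs


def build_alphabet(texts):
    """Return the sorted list of distinct characters found in the transcripts."""
    chars = set()
    for text in texts:
        chars.update(text)
    return sorted(chars)


def write_alphabet(alphabet, path):
    """Write one character per line (an assumed, not confirmed, file format)."""
    with open(path, "w", encoding="utf-8") as f:
        for ch in alphabet:
            f.write(ch + "\n")
```

For German, the sorted result would naturally pick up characters such as "ß" and "ü" without any hard-coded list, which is the main point of generating the alphabet from the data rather than writing it by hand.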
@usimarit Great. I've been quite busy for some time, but I'll be more active in this repo in the following days and will contribute pretrained models in other languages. Thanks!
@monatis, have you already trained a Conformer model in German?
Hi @JStumpp, I started training it and hope to release it next week.
Do you know which subset of LibriSpeech the English pretrained model was trained on?
TensorFlowASR makes it quite easy to train and deploy near-SOTA ASR models, but it provides a pretrained model only in English. On the other hand, FAIR has recently published an open and free dataset in 8 languages, Multilingual LibriSpeech (MLS; see the paper). It is in the public domain, is large, and has the same quality as LibriSpeech. So my suggestion is to form a volunteer working group to collaborate on training ASR models in multiple languages and share them publicly. If this idea makes sense, the maintainers of the repo can pin the issue and label it with help-wanted for visibility.