Add support for Multilingual LibriSpeech dataset #92

monatis · 2020-12-31T03:46:09Z

I added a script to download and prepare transcripts for a given language in the MLS dataset.
Example usage:

python ./scripts/create_mls_dataset.py --help
usage: create_mls_trans.py [-h] [--dataset-home DATASET_HOME] --language
                           {dutch,english,german,french,italian,portuguese,polish,spanish}
                           [--opus]

Download and prepare MLS dataset in a given language

optional arguments:
  -h, --help            show this help message and exit
  --dataset-home DATASET_HOME, -d DATASET_HOME
                        Path to home directory to download and prepare
                        dataset. Default to ~/.keras
  --language {dutch,english,german,french,italian,portuguese,polish,spanish}, -l {dutch,english,german,french,italian,portuguese,polish,spanish}
                        Any name of language included in MLS
  --opus                Whether to use dataset in opus format or not
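
For readers who want to see how a help message like the one above could be produced, here is a minimal argparse sketch reconstructed only from that output. It is not the script's actual source: the function name `build_parser`, the `LANGUAGES` constant, and the choice of `store_true` for `--opus` are assumptions made for illustration.

```python
import argparse
import os

# Language names copied from the help output above.
LANGUAGES = [
    "dutch", "english", "german", "french",
    "italian", "portuguese", "polish", "spanish",
]


def build_parser():
    """Build a parser whose options mirror the help text (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Download and prepare MLS dataset in a given language"
    )
    parser.add_argument(
        "--dataset-home", "-d",
        # Assumption: the default resolves to ~/.keras, as the help text states.
        default=os.path.join(os.path.expanduser("~"), ".keras"),
        help="Path to home directory to download and prepare dataset. Default to ~/.keras",
    )
    parser.add_argument(
        "--language", "-l",
        choices=LANGUAGES,
        required=True,  # the usage line shows --language without brackets
        help="Any name of language included in MLS",
    )
    parser.add_argument(
        "--opus",
        action="store_true",  # assumption: a plain on/off flag
        help="Whether to use dataset in opus format or not",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.language, args.dataset_home, args.opus)
```

With this sketch, omitting `--language` or passing a name outside the listed choices makes argparse exit with a usage error.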

nglehuy · 2020-12-31T09:40:23Z

Thanks, @monatis
The blank index in CharacterFeaturizer is either 0 or num_classes - 1. It isn't read from the file, so you don't need to add an extra \n when creating the character file.
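
To illustrate the point about the blank index, here is a small standalone sketch of how a character vocabulary can reserve the blank at either end without storing it in the alphabet file. This is not the CharacterFeaturizer implementation; the function `build_char_maps` and its `blank_at_zero` parameter are hypothetical names used only for this example.

```python
def build_char_maps(characters, blank_at_zero=True):
    """Map characters to class indices, reserving one index for the blank.

    `characters` holds only the real symbols read from the alphabet file;
    the blank is added here, so the file needs no placeholder line for it.
    """
    num_classes = len(characters) + 1  # real characters plus the blank
    blank = 0 if blank_at_zero else num_classes - 1
    # Shift real characters up by one only when the blank occupies index 0.
    offset = 1 if blank_at_zero else 0
    char_to_index = {c: i + offset for i, c in enumerate(characters)}
    return char_to_index, blank, num_classes


if __name__ == "__main__":
    chars = ["a", "b", "c"]
    print(build_char_maps(chars, blank_at_zero=True))
    print(build_char_maps(chars, blank_at_zero=False))
```

In both layouts the alphabet file lists only the real characters, which is why an extra empty line for the blank is unnecessary.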

monatis · 2020-12-31T10:03:15Z

Thanks for the info, fixed it. I was just confused by other implementations.

Add support for Multilingual LibriSpeech dataset

43edfda

Remove blank char from auto-generated alphabet

d2c890a

monatis mentioned this pull request Dec 31, 2020

Training ASR models in multiple languages #91

Open

nglehuy merged commit b131a7c into TensorSpeech:main Dec 31, 2020
