synthetic audio for nts data created by gTTS library #30

emilhovad · 2023-08-23T09:10:29Z

This PR creates synthetic audio with gTTS, an open-source, MIT-licensed library that wraps the Google Translate text-to-speech API.
It generates audio files named after the file names in the nts dataset and saves them in a train folder.
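
As a rough illustration of the step described above, here is a minimal sketch of generating one clip with gTTS and naming it after a dataset file. The helper names `synthetic_audio_path` and `generate_speech_gtts`, the `.mp3` suffix, and the default language are assumptions made for this example and are not taken from the PR's script. The `gTTS(text=..., lang=...).save(...)` call needs network access.

```python
from pathlib import Path


def synthetic_audio_path(source_name: str, output_dir: Path) -> Path:
    """Map a dataset file name to the path of its synthetic MP3 (hypothetical helper)."""
    # Keep only the base name and swap the extension, since gTTS writes MP3 data.
    return output_dir / Path(source_name).with_suffix(".mp3").name


def generate_speech_gtts(text: str, path: Path, lang: str = "en") -> None:
    """Synthesise `text` with gTTS and save it as an MP3 file at `path`."""
    # Imported lazily so the path helper can be used without gTTS installed.
    from gtts import gTTS

    path.parent.mkdir(parents=True, exist_ok=True)
    gTTS(text=text, lang=lang).save(str(path))
```

A call such as `generate_speech_gtts("some sentence", synthetic_audio_path("0001.wav", Path("train")))` would then write `train/0001.mp3`.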

Several other voices can be used to create data with this script, based on functions in the code. The licenses of these packages should be investigated first, e.g. eSpeak (a terminal program that needs to be installed) or the built-in macOS voices, including the Siri voices (a terminal program that only runs on a Mac).

Issues: the folder structure is probably wrong, and some packages, such as click, are not used in this code.
Should this code support the code that has already been written to create the nts dataset, which is not included here? The Hugging Face format is not implemented here.

saattrupdan

Overall looks great! My comments are mostly some stylistic changes, as well as potentially converting some of the hardcoded values into arguments.

When this PR has been merged, we also need a PR which adds the following:

Saving the dataset as a Hugging Face Dataset.
Including a dataset config file which uses the dataset stored on Hugging Face Hub in the training script.
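
A follow-up along these lines could look roughly like the sketch below. It assumes the Hugging Face `datasets` package and uses hypothetical helper names (`build_records`, `save_as_hf_dataset`) and a hypothetical mapping from file stems to transcripts; none of this is part of the current PR.

```python
from pathlib import Path


def build_records(audio_paths: list[Path], transcripts: dict[str, str]) -> dict[str, list[str]]:
    """Pair each audio file with its transcript, keyed by file stem (assumed layout)."""
    records: dict[str, list[str]] = {"audio": [], "text": []}
    for path in audio_paths:
        # Skip audio files that have no matching transcript.
        if path.stem in transcripts:
            records["audio"].append(str(path))
            records["text"].append(transcripts[path.stem])
    return records


def save_as_hf_dataset(records: dict[str, list[str]], output_dir: Path) -> None:
    """Store the records as a Hugging Face Dataset with a decoded audio column."""
    # Imported lazily so build_records works without `datasets` installed.
    from datasets import Audio, Dataset

    dataset = Dataset.from_dict(records).cast_column("audio", Audio())
    dataset.save_to_disk(str(output_dir))
```

Uploading to the Hugging Face Hub would replace `save_to_disk` with `push_to_hub` and a repository name, which is left out here because the target repository is not specified.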

src/scripts/build_synthetic_nts.py

saattrupdan · 2023-08-24T08:27:51Z

+    """
+    subprocess.run(["say", text, "-o", filename])
+
+def generate_speech_eSpeak(text, filename, variant="+m1"):


Missing type hints
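
For illustration, a type-hinted version of the function might look like the sketch below. The function body is not visible in the quoted diff, so the eSpeak invocation (`-v` for the voice plus variant, `-w` for the output WAV file), the `build_espeak_command` helper, and the `voice` parameter with its default are assumptions for this example.

```python
import subprocess


def build_espeak_command(
    text: str, filename: str, variant: str = "+m1", voice: str = "en"
) -> list[str]:
    """Build the eSpeak argument list (hypothetical helper, assumed flags)."""
    # eSpeak selects a voice variant by appending it to the voice name, e.g. "en+m1".
    return ["espeak", "-v", f"{voice}{variant}", "-w", filename, text]


def generate_speech_eSpeak(text: str, filename: str, variant: str = "+m1") -> None:
    """Synthesise `text` with eSpeak and write the audio to `filename`."""
    # check=True raises CalledProcessError if eSpeak exits with an error.
    subprocess.run(build_espeak_command(text, filename, variant), check=True)
```

Splitting out the command construction keeps the subprocess call trivial and lets the arguments be unit tested without eSpeak installed.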

saattrupdan · 2023-08-24T08:37:49Z

I see that there are also failures in the linting and unit tests that need to be fixed here. Are you using the pre-commit hooks?

synthetic audio, nts data created by gTTS library (other libraries can be used as functions as well).

2c1f0ef

emilhovad requested review from AJDERS and saattrupdan August 23, 2023 09:10

saattrupdan requested changes Aug 24, 2023

emilhovad and others added 5 commits August 24, 2023 13:26

Update src/scripts/build_synthetic_nts.py

dc1b859

fine. Co-authored-by: Dan Saattrup Nielsen <[email protected]>

Update src/scripts/build_synthetic_nts.py

5cf692a

Co-authored-by: Dan Saattrup Nielsen <[email protected]>

Update src/scripts/build_synthetic_nts.py

e1a003f

Co-authored-by: Dan Saattrup Nielsen <[email protected]>

all Dan's comments should be fixed.

bf358bf

all Dan's comments should be fixed.

b93029c

saattrupdan assigned emilhovad Aug 29, 2023

saattrupdan closed this Oct 4, 2023

saattrupdan deleted the synthetic_data branch July 1, 2024 13:57
