Synthetic audio for NTS data created with the gTTS library #30
…be used as functions as well).
Overall looks great! My comments are mostly some stylistic changes, as well as potentially converting some of the hardcoded values into arguments.
When this PR has been merged, we also need a PR which adds the following:
- Saving the dataset as a Hugging Face Dataset.
- Including a dataset config file which uses the dataset stored on Hugging Face Hub in the training script.
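A minimal sketch of what the first follow-up item could look like, assuming the generated files are MP3s in a single folder and the transcripts are available as a mapping from file stem to text. The names `collect_records` and `save_as_hf_dataset`, and the `audio`/`text` column names, are hypothetical and not taken from the repository.

```python
from __future__ import annotations

from pathlib import Path


def collect_records(audio_dir: Path, transcripts: dict[str, str]) -> dict[str, list[str]]:
    """Pair each MP3 file in `audio_dir` with its transcript, keyed by file stem."""
    records: dict[str, list[str]] = {"audio": [], "text": []}
    for audio_path in sorted(audio_dir.glob("*.mp3")):
        text = transcripts.get(audio_path.stem)
        if text is None:
            continue  # skip audio files without a known transcript
        records["audio"].append(str(audio_path))
        records["text"].append(text)
    return records


def save_as_hf_dataset(records: dict[str, list[str]], output_dir: Path) -> None:
    """Store the records as a Hugging Face Dataset on disk."""
    # Imported lazily so `collect_records` works without `datasets` installed.
    from datasets import Audio, Dataset

    # Casting to `Audio` makes the column decode the files as audio on access.
    dataset = Dataset.from_dict(records).cast_column("audio", Audio())
    dataset.save_to_disk(str(output_dir))
```

Uploading to the Hugging Face Hub would presumably replace `save_to_disk` with `push_to_hub` and a repository ID, which is left out here because the target repository is not stated in this PR.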
src/scripts/build_synthetic_nts.py
    """
    subprocess.run(["say", text, "-o", filename])


def generate_speech_eSpeak(text, filename, variant="+m1"):
Missing type hints
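As an illustration of the requested change, here is a hedged sketch of the eSpeak function with type hints. The function body is not visible in this excerpt, so the command line below is an assumption based on eSpeak's `-v` (voice) and `-w` (write WAV file) options. The `voice` parameter and the `build_espeak_command` helper are hypothetical additions.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_espeak_command(
    text: str, filename: str | Path, voice: str = "en", variant: str = "+m1"
) -> list[str]:
    """Build the eSpeak command line that writes `text` to a WAV file."""
    # `-v` selects the voice (language plus variant), `-w` sets the output file.
    return ["espeak", "-v", f"{voice}{variant}", "-w", str(filename), text]


def generate_speech_eSpeak(
    text: str, filename: str | Path, voice: str = "en", variant: str = "+m1"
) -> None:
    """Synthesise `text` with eSpeak and save it to `filename`."""
    # `check=True` raises CalledProcessError if eSpeak exits with an error.
    subprocess.run(build_espeak_command(text, filename, voice, variant), check=True)
```

Separating the command construction from the `subprocess.run` call keeps the function testable on machines where eSpeak is not installed.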
I see that there are also failures in the linting and unit tests that need to be fixed here. Are you using the pre-commit hooks?
fine. Co-authored-by: Dan Saattrup Nielsen <[email protected]>
This creates synthetic data using gTTS, an open-source, MIT-licensed Python library that interfaces with Google Translate's text-to-speech API.
It creates audio files named after the file names in the NTS dataset and saves them in a train folder.
The script also contains functions for generating data with several other voices, for example eSpeak (a terminal program that needs to be installed) or the macOS built-in voices, including the Siri voices (a terminal program that requires a Mac). The licenses of these packages should be investigated.
Issues: the folder structure is probably wrong, and some packages, such as click, are not used in this code.
Should this code integrate with the code that has already been written to create the NTS dataset, which is not included in this PR? The Hugging Face format is not implemented here.
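For readers who have not opened the script, the generation step described above can be sketched roughly as follows. This is not the code in this PR: the function names, the `(file_name, text)` row format, and the default `lang="en"` are assumptions, and the synthesiser is passed in as an argument so that the loop can be exercised without network access.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable


def synthesise_with_gtts(text: str, output_path: Path, lang: str = "en") -> None:
    """Synthesise `text` with gTTS and save it as an MP3 file."""
    # Imported lazily; gTTS needs network access to reach Google Translate.
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(str(output_path))


def build_train_folder(
    rows: Iterable[tuple[str, str]],
    output_dir: Path,
    synthesise: Callable[[str, Path], None] = synthesise_with_gtts,
) -> list[Path]:
    """Write one audio file per (file_name, text) row into `output_dir`."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for file_name, text in rows:
        # Reuse the dataset's file name, but with an .mp3 extension (gTTS output).
        output_path = output_dir / Path(file_name).with_suffix(".mp3").name
        synthesise(text, output_path)
        written.append(output_path)
    return written
```

A call such as `build_train_folder(rows, Path("train"))` would then use gTTS, while the eSpeak or macOS `say` functions could be swapped in through the `synthesise` argument.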