Question about GTA of mel-spectograms #10

Closed
dunky11 opened this issue Jul 5, 2021 · 2 comments

dunky11 commented Jul 5, 2021

I'm trying to understand the GTA part of your paper, which seems to have a huge influence, and I'm unsure whether I understood it correctly. This is what I understood so far: you have two networks. The first maps a source-speaker and a target-speaker mel spectrogram, together with the transcription, to a transformed spectrogram. The second is the vocoder, which maps the transformed spectrogram to a waveform.

You first train the first network. Then, to train the vocoder, instead of converting the waveform to a spectrogram and using that as the vocoder input, you pass the audio through your proposed network and use its output as the vocoder's training input. Is that correct?

@wookladin
Contributor

Hi.
Yes, your understanding is correct. We feed the same mel spectrogram to the first network as both the source and the target, so the network reconstructs the original mel, since this is the same situation as in training.
We then use the reconstructed mel spectrogram as the input to train the vocoder.
Thank you :)
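
The procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's actual code: `Utterance`, `build_gta_pairs`, and `toy_model` are hypothetical names, the mel spectrogram is represented as a plain list of frames, and `toy_model` is a placeholder standing in for the already trained first network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a mel spectrogram as frames x mel bins.
Mel = List[List[float]]


@dataclass
class Utterance:
    mel: Mel               # mel spectrogram computed from the recorded waveform
    text: str              # transcription of the utterance
    waveform: List[float]  # the recorded audio samples


# Assumed call signature of the trained first network:
# (source_mel, target_mel, text) -> predicted mel
ConversionModel = Callable[[Mel, Mel, str], Mel]


def build_gta_pairs(
    model: ConversionModel, dataset: Sequence[Utterance]
) -> List[Tuple[Mel, List[float]]]:
    """Build (reconstructed mel, waveform) pairs for vocoder training."""
    pairs = []
    for utt in dataset:
        # The same mel is passed as both source and target, so the
        # network reconstructs the original utterance.
        reconstructed_mel = model(utt.mel, utt.mel, utt.text)
        # The vocoder is then trained on the reconstructed mel,
        # paired with the original waveform.
        pairs.append((reconstructed_mel, utt.waveform))
    return pairs


def toy_model(source_mel: Mel, target_mel: Mel, text: str) -> Mel:
    # Placeholder for the trained network: returns an imperfect copy of
    # the source mel to mimic a reconstruction that is close but not exact.
    return [[value * 0.9 for value in frame] for frame in source_mel]


dataset = [
    Utterance(
        mel=[[1.0, 2.0], [3.0, 4.0]],
        text="hello",
        waveform=[0.0, 0.1, -0.1, 0.0],
    )
]
gta_pairs = build_gta_pairs(toy_model, dataset)
```

Each entry of `gta_pairs` keeps the original waveform as the training target, while the vocoder input is the network's output rather than the mel computed directly from the audio.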

Author

dunky11 commented Jul 5, 2021

Thanks :) It sounds complicated, but it is actually quite simple.
