Question about GTA of mel-spectrograms #10

dunky11 · 2021-07-05T11:26:38Z

I'm trying to understand the GTA part of your paper, which seems to have a huge influence, and I'm unsure whether I understood it correctly. This is what I understood so far: you have two networks. The first maps a source and a target speaker mel spectrogram plus the transcription to a transformed spectrogram, and the second, the vocoder, maps the transformed spectrogram to a waveform.

You first train the first network. Then, to train the vocoder, instead of converting the waveform to a spectrogram and using that as its input, you pass the audio through your proposed network and use that network's output as the vocoder's input. Is that correct?

wookladin · 2021-07-05T11:45:03Z

Hi.
Yes, as you understood, we feed the same mel spectrogram to the first network as both the source and the target. The network then reconstructs the original mel, since this is the same situation as in training.
We then use the reconstructed mel spectrogram as the input to train the vocoder.
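A minimal sketch of that data-preparation step, in case it helps to see it spelled out. This is not the repository's code: `build_gta_pairs`, `toy_conversion_model`, and the `(mel, text, wav)` dataset layout are hypothetical names chosen for illustration, and the toy model is only a stand-in that returns an imperfect reconstruction so the flow can be run end to end.

```python
from typing import Callable, List, Sequence, Tuple

Mel = List[List[float]]   # [frames][mel_bins]
Wave = List[float]        # raw audio samples


def toy_conversion_model(source_mel: Mel, target_mel: Mel, text: str) -> Mel:
    """Hypothetical stand-in for the trained first network.

    It ignores target_mel and text and blends each frame with the previous
    one, which mimics a slightly imperfect reconstruction of the input.
    """
    out: Mel = []
    prev = source_mel[0]
    for frame in source_mel:
        out.append([0.9 * x + 0.1 * p for x, p in zip(frame, prev)])
        prev = frame
    return out


def build_gta_pairs(
    dataset: Sequence[Tuple[Mel, str, Wave]],
    conversion_model: Callable[[Mel, Mel, str], Mel],
) -> List[Tuple[Mel, Wave]]:
    """Build (reconstructed mel, original waveform) pairs for vocoder training."""
    pairs: List[Tuple[Mel, Wave]] = []
    for mel, text, wav in dataset:
        # Same mel as source and target, as described in the reply above.
        gta_mel = conversion_model(mel, mel, text)
        if len(gta_mel) != len(mel):
            raise ValueError("reconstructed mel must keep the original frame count")
        # The vocoder input is the reconstruction; its target is the real audio.
        pairs.append((gta_mel, wav))
    return pairs


# Tiny made-up example: 3 frames, 2 mel bins, 6 audio samples.
dataset = [([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]], "hello", [0.0] * 6)]
pairs = build_gta_pairs(dataset, toy_conversion_model)
gta_mel, wav = pairs[0]
print(len(gta_mel), len(wav))
```

The vocoder would then be trained on `pairs` instead of on mels computed directly from the waveform, so it sees the same kind of input at training time that it will receive from the first network at inference time.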
Thank you :)

dunky11 · 2021-07-05T11:51:10Z

Thanks :) It sounds complicated but is actually quite simple.

dunky11 closed this as completed Jul 5, 2021

Vadim2S mentioned this issue Jul 26, 2021

Possible bottleneck? #13

Open
