Reason to use speaker encoder over speaker embeddings? #20

dunky11 · 2021-07-29T08:43:05Z

What was the reason you switched from speaker embeddings (Cotatron) to a speaker encoder (this)? Was it because it worked better, or was it to support any-to-any voice conversion? I'm curious because I am currently trying to deploy my own architecture and can't really decide between the two.

wookladin · 2021-08-02T09:20:10Z

Hi, we used a speaker encoder instead of a speaker embedding because a speaker embedding can't capture the variation in speech within the same speaker. These variations may include the recording environment, the speaker's prosody, etc.

However, we have not performed extensive ablation studies on the benefits of a speaker encoder over a speaker embedding, so we're not sure of the exact performance improvement we gain from using one.
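
To make the distinction concrete, here is a minimal toy sketch of the difference described above; it is not the code from this repository. A speaker embedding is modeled as a lookup table indexed by speaker ID, and the speaker encoder is replaced by simple mean pooling over frames, standing in for a learned network. All names (`make_embedding_table`, `toy_speaker_encoder`, `fake_utterance`) and the 4-dimensional "mel" features are hypothetical and chosen only for illustration.

```python
import random

FEATURE_DIM = 4  # toy feature size, chosen only to keep the example small


def make_embedding_table(num_speakers, dim=FEATURE_DIM, seed=0):
    """Speaker embedding: one fixed vector per speaker ID (a lookup table)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_speakers)]


def lookup_speaker_embedding(table, speaker_id):
    """Returns the same vector for every utterance of a given speaker."""
    return table[speaker_id]


def toy_speaker_encoder(frames):
    """Stand-in for a learned speaker encoder (assumption: mean pooling only).

    Maps the frames of ONE utterance to one vector, so the result depends
    on the audio itself, not just on who is speaking.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]


def fake_utterance(speaker_base, noise_std, num_frames, rng):
    """Synthetic frames: a per-speaker base vector plus per-frame noise."""
    return [[b + rng.gauss(0, noise_std) for b in speaker_base]
            for _ in range(num_frames)]


rng = random.Random(1)
table = make_embedding_table(num_speakers=2)

# Two utterances by the SAME speaker under different "recording conditions".
speaker0_base = [0.5, -0.2, 0.1, 0.9]
clean_utt = fake_utterance(speaker0_base, noise_std=0.05, num_frames=50, rng=rng)
noisy_utt = fake_utterance(speaker0_base, noise_std=0.50, num_frames=50, rng=rng)

# Lookup table: identical vector for both utterances by construction.
emb_clean = lookup_speaker_embedding(table, 0)
emb_noisy = lookup_speaker_embedding(table, 0)

# Encoder: one vector per utterance, so within-speaker variation shows up.
enc_clean = toy_speaker_encoder(clean_utt)
enc_noisy = toy_speaker_encoder(noisy_utt)

print("embedding identical across utterances:", emb_clean == emb_noisy)
print("encoder output identical across utterances:", enc_clean == enc_noisy)
```

The lookup table needs a speaker ID that was seen in training, while the encoder only needs audio, which is also why an encoder can in principle be applied to unseen speakers.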

dunky11 · 2021-08-08T09:32:30Z

Thank you very much, that cleared it up. In my experiments, too, using an encoder worked better than using embeddings.

dunky11 changed the title ~~Reason to use Speaker Encoder over Speaker Embeddings?~~ Reason to use speaker encoder over speaker embeddings? Jul 29, 2021

dunky11 closed this as completed Aug 8, 2021
