Reason to use speaker encoder over speaker embeddings? #20

Closed
dunky11 opened this issue Jul 29, 2021 · 2 comments
Comments


dunky11 commented Jul 29, 2021

What was the reason you switched from speaker embeddings (Cotatron) to a speaker encoder (this repo)? Was it because it worked better, or was it to support any-to-any voice conversion? I'm curious because I am currently trying to deploy my own architecture and can't decide between the two.

@dunky11 dunky11 changed the title Reason to use Speaker Encoder over Speaker Embeddings? Reason to use speaker encoder over speaker embeddings? Jul 29, 2021
@wookladin
Contributor

Hi, we used a speaker encoder instead of a speaker embedding because a speaker embedding can't capture the variation in speech within the same speaker. This variation may include the recording environment, the speaker's prosody, etc.

However, we have not performed extensive ablation studies comparing the speaker encoder with the speaker embedding, so we're not sure exactly how much performance we gain from using a speaker encoder.
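
For readers weighing the same choice, the distinction can be illustrated with a toy sketch. This is not the code from this repository or from Cotatron: the table values, the projection weights, the frame features, and the names `speaker_embedding` and `speaker_encoder` are all hypothetical, and the "encoder" is just a fixed linear projection followed by mean pooling. It only shows that a lookup embedding depends on the speaker ID alone, while an encoder's output depends on the utterance it is given.

```python
EMB_DIM = 4  # size of the speaker vector (illustrative value)

# Speaker embedding: one vector per speaker ID, looked up from a table.
# In a real model these values would be learned; here they are made up.
embedding_table = {
    "spk_a": [0.1, -0.3, 0.5, 0.2],
    "spk_b": [-0.4, 0.6, 0.0, 0.9],
}

def speaker_embedding(speaker_id):
    """Return the stored vector for a speaker, regardless of the utterance."""
    return embedding_table[speaker_id]

# Speaker encoder: computes the vector from the audio features themselves.
# Hypothetical fixed projection (EMB_DIM rows, 3 features per frame).
projection = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

def speaker_encoder(frames):
    """Project every frame, then average over time to get one vector."""
    projected = [
        [sum(w * x for w, x in zip(row, frame)) for row in projection]
        for frame in frames
    ]
    return [sum(column) / len(projected) for column in zip(*projected)]

# Two utterances from the same speaker with different (made-up) features,
# standing in for different prosody or recording conditions.
utt_quiet = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4]]
utt_noisy = [[0.9, 0.8, 0.1], [0.7, 0.9, 0.2], [0.8, 0.7, 0.3]]

# The lookup returns the same vector for both utterances of spk_a ...
same_lookup = speaker_embedding("spk_a") == speaker_embedding("spk_a")
# ... while the encoder output changes with the utterance.
same_encoded = speaker_encoder(utt_quiet) == speaker_encoder(utt_noisy)

print(same_lookup, same_encoded)
```

Because the encoder needs only audio as input, it can also be run on speakers that have no entry in the table, which is the property the any-to-any question above is about.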

Author

dunky11 commented Aug 8, 2021

Thank you very much, that cleared it up. In my experiments, using an encoder also worked better than using embeddings.

@dunky11 dunky11 closed this as completed Aug 8, 2021