
add multiple experts per moe layer #4291

Conversation

jonhilgart22

Add the ability to use multiple types of networks per MoE layer instead of only one network.

awan-10 self-assigned this Sep 8, 2023

awan-10 (Contributor) commented Sep 8, 2023

@jonhilgart22 - thanks for the PR! I do not fully understand the need for this PR without looking at the client-side code. Do you have any example code to explain the usage of this new feature?

jonhilgart22 (Author)

> @jonhilgart22 - thanks for the PR! I do not fully understand the need for this PR without looking at the client-side code. Do you have any example code to explain the usage of this new feature?

My intention is to train an MoE model with existing fine-tuned models as the experts. So, instead of mapping many experts to a single network type (e.g., 10 experts that are all MLP layers), you could train differing experts within a layer (T5 as one expert and GPT-2 as another).
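
For illustration, here is a minimal sketch of the intended usage, assuming this PR lets `deepspeed.moe.layer.MoE` accept a list of expert modules in place of a single module. The list-valued `expert` argument is the proposed extension (not the current API), and the two expert classes are hypothetical stand-ins for fine-tuned models:

```python
import torch.nn as nn
from deepspeed.moe.layer import MoE

class MLPExpert(nn.Module):
    """Stand-in for one fine-tuned expert (e.g., a T5 block)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def forward(self, x):
        return self.net(x)

class AttentionExpert(nn.Module):
    """Stand-in for a structurally different expert (e.g., a GPT-2 block)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads=4, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return out

hidden_size = 512
# Proposed usage: a list of heterogeneous experts rather than one module
# that DeepSpeed replicates num_experts times.
moe_layer = MoE(
    hidden_size=hidden_size,
    expert=[MLPExpert(hidden_size), AttentionExpert(hidden_size)],
    num_experts=2,
    k=1,
)
```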

awan-10 (Contributor) commented Sep 11, 2023

@jonhilgart22 - sounds good. Can you please either modify the existing MoE unit tests or add a new one, so we know that this does not introduce any issues for the standard MoE API?
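
For example, such a test might look like the following. This is a rough sketch only: it assumes the list-valued `expert` argument proposed in this PR, and the real suite would run it under DeepSpeed's distributed unit-test harness rather than as a bare single-process test:

```python
import torch
from deepspeed.moe.layer import MoE

def test_moe_with_heterogeneous_experts():
    hidden = 64
    # Two structurally different experts (the list form is this PR's proposal).
    experts = [
        torch.nn.Linear(hidden, hidden),
        torch.nn.Sequential(torch.nn.Linear(hidden, hidden), torch.nn.ReLU()),
    ]
    moe = MoE(hidden_size=hidden, expert=experts, num_experts=2, k=1)
    x = torch.randn(4, 16, hidden)
    # DeepSpeed's MoE forward returns (output, aux_loss, exp_counts).
    out, _, _ = moe(x)
    # The layer should preserve shape regardless of expert type.
    assert out.shape == x.shape
```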

awan-10 (Contributor) commented Sep 12, 2023

@jonhilgart22 - please see the failing MoE unit tests. I think that is an indication that something is broken in the new changes.

jonhilgart22 (Author)

Closing, as this is not exactly what I'm looking for. BTM (Branch-Train-Merge) is closer.
