Reflect use_parallel_residual in mlp_after_attn for module_inject #2446
Conversation
Hi @twaka - apologies for not getting to review this sooner - is this a PR that still makes sense to complete? If so, could you resolve the merge conflicts and we can review and complete this? I lack permissions to your fork so I'm not able to resolve the conflicts.
Thank you for reminding me @loadams

Thanks for notifying. The model I wanted to support is a bit outdated, so I think it's okay to close.

@twaka - thanks for the reply, apologies for dropping the ball on this long ago from our side.
In transformers, the `use_parallel_residual` argument has controlled how GPT-NeoX computes its residual connections since v4.23.0 (huggingface/transformers#18695). In DeepSpeed, `mlp_after_attn` controls this, but it is currently not aware of `use_parallel_residual`. This PR sets `mlp_after_attn` according to `use_parallel_residual` when that attribute exists.
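To illustrate what the two settings select between, here is a minimal sketch of the parallel vs. sequential residual layouts in a GPT-NeoX-style block, plus the kind of config mapping this PR describes. All function and attribute names are illustrative, not DeepSpeed's actual API; the only identifiers taken from the PR are `use_parallel_residual` and `mlp_after_attn`.

```python
# Hypothetical sketch of the two residual layouts that
# use_parallel_residual chooses between in GPT-NeoX-style blocks.
# attn and mlp are stand-ins for the attention and feed-forward sublayers.

def transformer_block(x, attn, mlp, use_parallel_residual=True):
    if use_parallel_residual:
        # Parallel residual: attention and MLP both read the same input,
        # and their outputs are summed with the residual stream.
        return x + attn(x) + mlp(x)
    else:
        # Sequential residual (DeepSpeed's mlp_after_attn behavior):
        # the MLP consumes the attention output instead of the block input.
        h = x + attn(x)
        return h + mlp(h)


def resolve_mlp_after_attn(model_config):
    """Illustrative mapping of the PR's idea: derive mlp_after_attn from
    use_parallel_residual when the config defines it. Defaulting to True
    (parallel) when absent is an assumption for this sketch."""
    return not getattr(model_config, "use_parallel_residual", True)
```

With toy sublayers (`attn = lambda x: 2 * x`, `mlp = lambda x: x + 1`) the two layouts produce different outputs for the same input, which is why injecting the wrong residual mode silently corrupts the model's computation.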