
Reflect use_parallel_residual in mlp_after_attn for module_inject #2446

Closed
wants to merge 4 commits into from

Conversation

twaka

@twaka twaka commented Oct 26, 2022

In transformers, the use_parallel_residual argument has controlled how the residual is computed in gpt-neox since v4.23.0.
huggingface/transformers#18695

In deepspeed, mlp_after_attn controls this, but it is currently not aware of use_parallel_residual.

This PR sets mlp_after_attn according to use_parallel_residual when that attribute exists.
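For context, the difference the two flags encode can be sketched as follows. This is a minimal sketch with stand-in callables (`attn`, `mlp`, `ln1`, `ln2` are hypothetical placeholders, not the actual transformers or DeepSpeed implementations):

```python
def gptneox_block(x, attn, mlp, ln1, ln2, use_parallel_residual=True):
    """Sketch of the two residual layouts in a GPT-NeoX-style block.

    use_parallel_residual=True corresponds to DeepSpeed's
    mlp_after_attn=False (attention and MLP applied in parallel);
    False corresponds to mlp_after_attn=True (MLP after attention).
    """
    if use_parallel_residual:
        # Parallel: attention and MLP both read the block input;
        # both outputs are summed with the residual in one step.
        return x + attn(ln1(x)) + mlp(ln2(x))
    else:
        # Sequential: the MLP consumes the post-attention hidden state.
        h = x + attn(ln1(x))
        return h + mlp(ln2(h))
```

The two layouts produce different outputs for the same weights, which is why injecting kernels with the wrong mlp_after_attn setting silently corrupts inference results.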

@loadams
Contributor

loadams commented Aug 23, 2023

Hi @twaka - apologies for not getting to review this sooner - is this a PR that still makes sense to complete? If so, could you resolve the merge conflicts and we can review and complete this? I lack permissions to your fork so I'm not able to resolve the conflicts.

@loadams loadams self-assigned this Aug 23, 2023
@twaka twaka closed this Aug 23, 2023
@twaka twaka force-pushed the reflect-use_parallel_residual branch from cbcaa89 to 3e82cb6 Compare August 23, 2023 23:59
@twaka twaka reopened this Aug 24, 2023
@twaka twaka force-pushed the reflect-use_parallel_residual branch from 02516c3 to b814731 Compare August 24, 2023 01:06
@twaka
Author

twaka commented Aug 24, 2023

Thank you for reminding me, @loadams.
I have updated the fix based on the latest master.
This is still required for correct inference on models that customize use_parallel_residual during training.

@loadams loadams requested review from RezaYazdaniAminabadi and removed request for awan-10 September 25, 2023 22:00
@loadams
Contributor

loadams commented Oct 28, 2024


Hi @twaka - I'm so sorry for losing track of this PR - do you still believe this should be merged?

@twaka
Author

twaka commented Oct 29, 2024

Thanks for notifying me. The model I wanted to support is a bit outdated, so I think it's ok to close this.

@loadams
Contributor

loadams commented Oct 29, 2024


@twaka - thanks for the reply, apologies for dropping the ball on this long ago from our side.

@loadams loadams closed this Oct 29, 2024