Reflect use_parallel_residual in mlp_after_attn for module_inject #2446
Conversation
Hi @twaka - apologies for not getting to review this sooner - is this a PR that still makes sense to complete? If so, could you resolve the merge conflicts and we can review and complete this? I lack permissions to your fork so I'm not able to resolve the conflicts.
Thank you for reminding me @loadams

Thanks for notifying. The model I wanted to support is a bit outdated, so I think it's okay to close.

@twaka - thanks for the reply, apologies for dropping the ball on this long ago from our side.
In transformers, the `use_parallel_residual` argument has controlled how GPT-NeoX computes its residual connections since v4.23.0 (huggingface/transformers#18695). In DeepSpeed, `mlp_after_attn` controls this, but it is currently not aware of `use_parallel_residual`. This PR sets `mlp_after_attn` according to `use_parallel_residual` when that attribute exists.
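To illustrate what the two settings select between, here is a minimal sketch of the parallel vs. sequential residual layouts in a GPT-NeoX-style block, plus the kind of config mapping this PR describes. All function and attribute names are illustrative, not DeepSpeed's actual API; the only identifiers taken from the PR are `use_parallel_residual` and `mlp_after_attn`.

```python
# Hypothetical sketch of the two residual layouts that
# use_parallel_residual chooses between in GPT-NeoX-style blocks.
# attn and mlp are stand-ins for the attention and feed-forward sublayers.

def transformer_block(x, attn, mlp, use_parallel_residual=True):
    if use_parallel_residual:
        # Parallel residual: attention and MLP both read the same input,
        # and their outputs are summed with the residual stream.
        return x + attn(x) + mlp(x)
    else:
        # Sequential residual (DeepSpeed's mlp_after_attn behavior):
        # the MLP consumes the attention output instead of the block input.
        h = x + attn(x)
        return h + mlp(h)


def resolve_mlp_after_attn(model_config):
    """Illustrative mapping of the PR's idea: derive mlp_after_attn from
    use_parallel_residual when the config defines it. Defaulting to True
    (parallel) when absent is an assumption for this sketch."""
    return not getattr(model_config, "use_parallel_residual", True)
```

With toy sublayers (`attn = lambda x: 2 * x`, `mlp = lambda x: x + 1`) the two layouts produce different outputs for the same input, which is why injecting the wrong residual mode silently corrupts the model's computation.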