(Do not merge) (CPU) aggregation of few recent fixes/optimizations #3920

delock · 2023-07-10T10:21:19Z

This PR is an aggregation of a few recent fixes in order to support customers. It contains the following PRs along with some other minor fixes:

In addition, we are tracking these PRs (not in this PR branch, but we hope they will be merged):

Needed by CPU training: Support cpu tensors without direct device invocation #3842
Short kernel sequence to graph support: Capture short kernel sequences to graph #4318
Larger scale support with MPICH launcher: deepspeed/launcher: add launcher_helper as each rank's start portal #4699
XPU upstream: [XPU] XPU accelerator support for Intel GPU device #4547
WOQ support for AutoTP: support autoTP with weight only quantization in DS inference path #4750
Support for models with only a .safetensors model file: add sharded loading for safetensors in AutoTP #4854
(new) Update list of supported AutoTP models: [docs] Add new autotp supported model in tutorial #4960

loadams · 2023-11-10T21:33:56Z

This looks to have been merged already, so can we close this PR?

delock · 2023-11-27T05:47:19Z

Hi @loadams, this PR has some new changes that we are working on merging into master, and I have updated the PR description. Can you help reopen this PR in draft mode? Thanks!

From time to time we get AutoTP support requests for new models, or bug reports, so we sometimes need to submit new PRs to DeepSpeed to address them. We also add those changes to this PR so customers get early access before the changes go to master. Hope this helps.

loadams · 2023-11-27T14:12:12Z

Apologies, yes happy to re-open.

delock · 2024-09-20T06:43:25Z

Closing, as most pending PRs in this list are merged. We will create an issue to track the currently open PRs.

delock and others added 30 commits May 19, 2023 23:36

add show_straggler argument to log_summary()

c52d1e2

Show straggler effect logging in separate table

de368db

fix formatting

6884e33

add docs for log_summary with straggler effect

206d455

Merge branch 'master' into gma/log_summary_straggler

c586171

Merge branch 'master' into gma/log_summary_straggler

35c72df

Merge branch 'master' into gma/log_summary_straggler

21975fd

Merge branch 'master' into gma/log_summary_straggler

4630189

Merge branch 'master' into gma/log_summary_straggler

7b4db63

fix opt-350m shard loading issue in AutoTP

4a9ad5d

Signed-off-by: Wang, Yi A <[email protected]>

Merge branch 'master' into gma/log_summary_straggler

05f9732

Merge branch 'gma/log_summary_straggler' into gma/run-opt-branch

becc759

init version of CCLBackend allreduce_latency

d5552ef

remove torch-ccl as dependency

f0ea3eb

init allreduce for latency without actual reduce operation

6caf695

first version of SHM based direct allreduce

f689f22

tweak reduce kernel

bc48c7e

SHM allreduce support 2-8 ranks

53b4846

clean up

4b10db9

remove oneCCL binding for pytorch from workflow, use gloo to bootstrap CCLBackend

1c10c66

add gpt-neox autotp support

3151996

fallback to oneccl if input is too large

c5dd6dc

code clean up

f402f0b

first clean up code

6ecf721

add checks for allreduce_low_latency, remove warning

d6a3ac8

remove redundant declaration, fix 2 ranks

3364a93

remove avx512f path

afc67f6

check whether buffer size is divisible by 16

6f05cbf

autoTP linear allreduce should go to allreduce_low_latency

4a38410

cleanup profile code

c88f3bd

delock and others added 14 commits August 28, 2023 22:42

Merge branch 'master' into gma/run-opt-branch-rebase

370ad5e

Merge branch 'master' into gma/run-opt-branch-rebase

b66b020

cherry pick fix for activation size not divisible by attention heads

aa64514

Support uneven sharding for lm_head

8b0a887

fix CPU loading model OOM. (#13)

f15e6d4

* fix model partition load cpu mem increase * fix format * fix format

Merge branch 'up-master' into gma/run-opt-branch-rebase

5718a88

merge latest change in uneven_heads

f02d40f

move tp_shard to module_inject

3a8ad63

support baichuan model (#14)

f0ef3ea

Merge branch 'up-master' into gma/run-opt-branch-rebase

5ab9d58

Merge branch 'up-master' into gma/run-opt-branch-rebase

a05bd5b

fix bug in lm_head, cherry pick from microsoft#4522

91f56a2

Merge branch 'gma/run-opt-branch-rebase' into gma/run-opt-branch

2016f30

fix uneven heads issue (#25)

57ff508

loadams closed this Nov 10, 2023

Yejing-Lai and others added 5 commits November 16, 2023 09:46

fix imbalance autotp issue (#31)

09a348c

fix split shape < 64 issue & add num_kv_heads to mp_params (#33)

14f5058

Baodi/support baichuan (#23)

8d60432

* support baichuan model * support baichuan without changing model script

fix Baichuan-7B qkv order error (#35)

0ebb1ed

fix baichuan lm_head replace issue (#34)

547ac96

shorten inference_all_reduce call stack (#37)

cd070bf

loadams reopened this Nov 27, 2023

Yejing-Lai and others added 3 commits December 15, 2023 15:36

Enable starcode autotp (#38)

e8ab894

* enable starcode autotp * add get_n_embd

fix falcon-40b accuracy issue (#39)

092b0f2

fix t5 and mistral model load from config meta tensor bug (#42)

94873fe

delock closed this Sep 20, 2024
