🚀 Feature
Align the OSS_ddp architecture with this proposal: pytorch/pytorch#42849.
One main difference is that oss_ddp owns the construction of the sharded optimizer, so the optimizer does not have to be exposed to the user.
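For illustration, here is a minimal sketch of what the owned-construction interface could look like, assuming torch.distributed is already initialized. The `OssDdp` class name, its constructor signature, and the `optim_step` helper are hypothetical and only meant to show the shape of the API, not an existing fairscale implementation; only `fairscale.optim.oss.OSS` is taken from the library.

```python
import torch
import torch.nn as nn
from fairscale.optim.oss import OSS  # ZeRO-style sharded optimizer wrapper


class OssDdp(nn.Module):
    """Hypothetical sketch: the wrapper builds the sharded optimizer (OSS)
    itself, so the OSS instance never has to be exposed to the caller."""

    def __init__(self, module: nn.Module, optim_cls=torch.optim.SGD, **optim_kwargs):
        super().__init__()
        self.module = module
        # The wrapper owns the construction of the sharded optimizer; the
        # optimizer state partitioning happens here, out of the user's sight.
        self._sharded_optim = OSS(module.parameters(), optim=optim_cls, **optim_kwargs)

    def forward(self, *args, **kwargs):
        # A real implementation would also register gradient reduction hooks
        # matched to the optimizer's parameter shards; omitted in this sketch.
        return self.module(*args, **kwargs)

    def optim_step(self) -> None:
        # Single entry point for the step: callers never touch OSS directly.
        self._sharded_optim.step()


# Usage sketch (requires torch.distributed to be initialized beforehand):
#   ddp = OssDdp(model, optim_cls=torch.optim.SGD, lr=0.1)
#   ddp(batch).sum().backward()
#   ddp.optim_step()
```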
Motivation
Prototype something closer to the RFC
Bring the sharding into the same place, so that the optimizer state sharding and the model parameter sharding can eventually be aligned. Right now DDP gets the full model.
Pitch
Keep backward compatibility: using OSS directly stays possible, but when targeting oss_ddp, the wrapper owns the sharded optimizer. Eventually surface the sharding and align both.
Alternatives
Keep the current approach of two separate steps (shard DDP / shard optim), but it does not seem very future-proof.
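For reference, a minimal sketch of the current two-step path (which, per the pitch above, should also stay possible when using OSS directly). It assumes a standard torch.distributed/NCCL setup; the model and hyper-parameters are placeholders.

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from fairscale.optim.oss import OSS

# Assumes the process group has already been initialized, e.g.:
#   torch.distributed.init_process_group(backend="nccl", init_method="env://")

model = torch.nn.Linear(1024, 1024).cuda()

# Step 1: wrap the model for gradient synchronization; DDP sees the full model.
ddp_model = DDP(model, device_ids=[torch.cuda.current_device()])

# Step 2: shard the optimizer state with OSS, constructed and held by the user.
optimizer = OSS(ddp_model.parameters(), optim=torch.optim.SGD, lr=0.1, momentum=0.9)

inputs = torch.randn(8, 1024, device="cuda")
ddp_model(inputs).sum().backward()
optimizer.step()
```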
Additional context