🚀 Feature
Align the OSS_ddp architecture with this proposal: pytorch/pytorch#42849.
One main difference is that oss_ddp owns the construction of the sharded optimizer, so the optimizer does not have to be exposed to the user.
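For illustration, here is a minimal sketch of what the owned-construction interface could look like, assuming torch.distributed is already initialized. The `OssDdp` class name, its constructor signature, and the `optim_step` helper are hypothetical and only meant to show the shape of the API, not an existing fairscale implementation; only `fairscale.optim.oss.OSS` is taken from the library.

```python
import torch
import torch.nn as nn
from fairscale.optim.oss import OSS  # ZeRO-style sharded optimizer wrapper


class OssDdp(nn.Module):
    """Hypothetical sketch: the wrapper builds the sharded optimizer (OSS)
    itself, so the OSS instance never has to be exposed to the caller."""

    def __init__(self, module: nn.Module, optim_cls=torch.optim.SGD, **optim_kwargs):
        super().__init__()
        self.module = module
        # The wrapper owns the construction of the sharded optimizer; the
        # optimizer state partitioning happens here, out of the user's sight.
        self._sharded_optim = OSS(module.parameters(), optim=optim_cls, **optim_kwargs)

    def forward(self, *args, **kwargs):
        # A real implementation would also register gradient reduction hooks
        # matched to the optimizer's parameter shards; omitted in this sketch.
        return self.module(*args, **kwargs)

    def optim_step(self) -> None:
        # Single entry point for the step: callers never touch OSS directly.
        self._sharded_optim.step()


# Usage sketch (requires torch.distributed to be initialized beforehand):
#   ddp = OssDdp(model, optim_cls=torch.optim.SGD, lr=0.1)
#   ddp(batch).sum().backward()
#   ddp.optim_step()
```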
Motivation
Prototype something closer to the RFC
Bring the sharding into the same place, so that the optimizer state sharding and the model parameter sharding can eventually be aligned. Right now DDP gets the full model.
Pitch
Keep backward compatibility: using OSS directly stays possible, but when targeting oss_ddp, the wrapper owns the sharded optimizer. Eventually surface the sharding and align both.
Alternatives
Keep the current approach of two separate steps (shard DDP / shard optim), but it does not seem very future-proof.
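For reference, a minimal sketch of the current two-step path (which, per the pitch above, should also stay possible when using OSS directly). It assumes a standard torch.distributed/NCCL setup; the model and hyper-parameters are placeholders.

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from fairscale.optim.oss import OSS

# Assumes the process group has already been initialized, e.g.:
#   torch.distributed.init_process_group(backend="nccl", init_method="env://")

model = torch.nn.Linear(1024, 1024).cuda()

# Step 1: wrap the model for gradient synchronization; DDP sees the full model.
ddp_model = DDP(model, device_ids=[torch.cuda.current_device()])

# Step 2: shard the optimizer state with OSS, constructed and held by the user.
optimizer = OSS(ddp_model.parameters(), optim=torch.optim.SGD, lr=0.1, momentum=0.9)

inputs = torch.randn(8, 1024, device="cuda")
ddp_model(inputs).sum().backward()
optimizer.step()
```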
Additional context