[refactor] Refactor and enable multiprocess nn.Pipe benchmarks. #319

anj-s · 2021-01-21T17:02:31Z

Before submitting

Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
Did you read the contributor guideline?
Did you make sure to update the docs?
Did you write any new necessary tests?

What does this PR do?

Re-enables the multiprocess nn.Pipe benchmarks (backend=nccl; RPC initialization uses ProcessGroupRpcBackend, which is deprecated).
Moves num_decoder levels from the args to the config dict.
Removes flags that we are not currently testing with the single-process or multiprocess benchmarks. They were for testing the MPI + AsyncPipeline style (we should add that benchmark in a follow-up PR).
General cleanup.
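As a rough illustration of the second item, the sketch below keeps a model setting in a config dict instead of exposing it as a command-line flag. It is not the code from this PR: the function names, the `num_decoder_layers` key, the `--multiprocess` flag, and all values are hypothetical placeholders chosen for the example.

```python
import argparse


def get_benchmark_config():
    """Return model settings as a dict (hypothetical keys and placeholder values)."""
    return {
        "vocab_size": 10000,        # placeholder value
        "num_decoder_layers": 10,   # lives in the config, not in the CLI args
        "batch_size": 8,            # placeholder value
    }


def build_parser():
    """CLI keeps only run-mode flags; model shape comes from the config dict."""
    parser = argparse.ArgumentParser(description="pipe benchmark (illustrative)")
    parser.add_argument("--multiprocess", action="store_true",
                        help="run the multiprocess variant (hypothetical flag)")
    return parser


def resolve_settings(argv):
    """Parse the CLI args and pair them with the static benchmark config."""
    args = build_parser().parse_args(argv)
    config = get_benchmark_config()
    return args, config


if __name__ == "__main__":
    args, config = resolve_settings(["--multiprocess"])
    print(args.multiprocess, config["num_decoder_layers"])
```

Keeping such settings in one dict means the single-process and multiprocess paths read the same values, and the parser only carries flags that select how the benchmark runs.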

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

blefaudeux

LGTM !

anj-s · 2021-01-30T03:36:44Z

LGTM !

Updated the PR to enable and refactor some more code! PTAL when you get a chance. Thanks!

min-xu-ai

nice refactoring!

* [chore] Fix lint errors that broke master (#348) authored-by: Anjali Sridhar <[email protected]>
* [fix] ShardedDDP - cpu testfix - remove Gloo/CPU (#350)
  * no idea about the root issue, but it proved to be fairly narrowed (gloo+cpu+python3.8+no cuda installed) so I guess that's out of scope for fairscale
* [feat][OSS] elastic and pytorch compatible checkpoints (#310)
  * adding a test to prove the inter operability with upstream pytorch
  * updating the changelog
  * eager state pruning
  * pytorch 1.5 compat
* [fix] ShardedDDP - properly handle post device change (#353)
  * adding the .to(device) support + unit testing
  * doc update
* [feat] Add AdaScaleWrapper (#347)
  * [feat] Add AdaScaleWrapper - This enables a different API for wrapping an optimizer with AdaScale. - This also enables AdaScale to be wrapped by OSS. - However, OSS wrapping AdaScale results in different optimization, which future research will be needed to study its effects. testing: add unit tests.
  * addressed comment: typo
* [refactor] Refactor and enable multiprocess nn.Pipe benchmarks. (#319)
  * mp cleanup
  * round of multiprocess refactoring
  * test golden run
  * print cuda stats
  * fix lint errors
  * enable multiprocess pipe benchmarks
  * set world size to be available gpus
  * more changes
  * use synthetic loaders for intermediate pipeline stages
  * merged master
  * fix for the devices property
  * dataloader fix
  * modify rank check
  * print wps stats
  * enable verification
  * fix logging
  * fix flag name
  * fix flag name
  * check for rank
  * fix indent
  * pass args
  * pass args
  * modify golden data
  * remove unused print messsage
  * fix lint errors
  * add comments
  * fix benchmarks
  Co-authored-by: Anjali Sridhar <[email protected]>
* [refactor] pipe: simplify balance and module checks (#346)
* [chore] v0.1.5 (#355)
* [chore] disheartening switch off of a OSS cpu test (#356)
  * precise skip, only if agent has only cpu
* [feat][minor] OSS Benchmark - regression test + background testing new optims (#352)
  * restoring the regression test, adding a test of the for_each optims
  * fix the regression test on circleci
  * removing unused flags
* [refactor] multiprocess_pipe: cleanup __init__ (#357)
* [refactor] multiprocess_pipe: remove retain_graph __init__ param (#358) It is not currently being used so we can simplify the interface by removing it.
* [refactor] multiprocess_pipe: focus on LazyModule usage (#360)
* [feat] ShardedDDP : Adding a proper DDP parity / AMP unit test, overdue (#361)
  * Adding a proper ddp parity / AMP unit test, overdue
  * catch non-AMP pytorch
* [perf][OSS] Clip grad norm : minor obvious speedup (#363) cache this iterator, easy speed up
* [refactor] multiprocess_pipe: remove pipelined_backward (#362)
* [perf] ShardedDDP - small memory use reduction - minor speedup (#366)
  * minor
  * minor
* [fix] repro+fix (#365) fix a broken earlier commit, only worked for the first step
* [refactor] OSS only use flat buffers (#371)
  * flat params all along, way simpler
  * updating the docstring
* [refactor] AsyncPipe: do not sub-class MultiProcessPipe (#370)
* [refactor] remove multiprocess dependency on async (#373)
* [fix] Workaround need for pip --no-build-isolation (#375)
* Add fairscale.nn.misc.checkpoint_activations (#376)
  * Add fairscale.utils.containers Co-authored-by: Min Xu <[email protected]>
  * Add fairscale.nn.misc.checkpoint_activations Co-authored-by: Sam Shleifer <[email protected]>
  Co-authored-by: Min Xu <[email protected]>
  Co-authored-by: Sam Shleifer <[email protected]>
* [chore] v0.1.6 (#377)
  * v0.1.6

Co-authored-by: anj-s <[email protected]>
Co-authored-by: Benjamin Lefaudeux <[email protected]>
Co-authored-by: Anjali Sridhar <[email protected]>
Co-authored-by: msbaines <[email protected]>
Co-authored-by: Leonard Lausen <[email protected]>
Co-authored-by: Myle Ott <[email protected]>
Co-authored-by: Sam Shleifer <[email protected]>

mp cleanup

1a652cc

facebook-github-bot added the CLA Signed label Jan 21, 2021

blefaudeux approved these changes Jan 21, 2021

anj-s marked this pull request as draft January 21, 2021 17:21

Anjali Sridhar added 2 commits January 25, 2021 23:51

round of multiprocess refactoring

01ea666

test golden run

a4b84ec

anj-s changed the title ~~mp cleanup~~ [refactor] Refactor multiprocess nn.Pipe benchmarks. Jan 26, 2021

Anjali Sridhar added 23 commits January 26, 2021 20:42

print cuda stats

f2010d2

fix lint errors

93464be

enable multiprocess pipe benchmarks

494248e

set world size to be available gpus

fcd2fa7

more changes

3165ff6

merge latest changes from master

e99fb2d

use synthetic loaders for intermediate pipeline stages

d04d5f3

merged master

64f343e

fix for the devices property

f9d0b4c

dataloader fix

36fbb31

modify rank check

10460d2

print wps stats

896d8dd

enable verification

93f1776

fix logging

3d76544

fix flag name

5d6e48b

fix flag name

eefa3d4

check for rank

5305e82

fix indent

11f48ae

pass args

4cf79a6

pass args

a801a55

modify golden data

96ed116

remove unused print messsage

6f56899

fix lint errors

60d18e9

Anjali Sridhar added 2 commits January 29, 2021 19:27

merge changes from master

9caf7db

add comments

ae359e5

anj-s requested review from min-xu-ai and msbaines January 30, 2021 03:35

anj-s marked this pull request as ready for review January 30, 2021 03:35

anj-s changed the title ~~[refactor] Refactor multiprocess nn.Pipe benchmarks.~~ [refactor] Refactor and enable multiprocess nn.Pipe benchmarks. Jan 30, 2021

anj-s requested a review from blefaudeux January 30, 2021 03:36

fix benchmarks

7185349

min-xu-ai approved these changes Feb 3, 2021

Merge branch 'master' into pipe-refactor-6

a8e18f9

anj-s merged commit cd18644 into master Feb 3, 2021

blefaudeux deleted the pipe-refactor-6 branch February 16, 2021 23:06
