[refactor] pipe: simplify balance and module checks #346

Merged (2 commits) on Feb 3, 2021

Conversation

msbaines
Contributor

Also removes the run-time type checks, since mypy already checks types.
Note that a user who does not run a type checker could still pass the
wrong types.
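
To illustrate the trade-off in the description, here is a minimal sketch of what dropping a run-time check means. It is not the fairscale code: `total_layers` and `total_layers_checked` are hypothetical names, and the explicit `isinstance` check only stands in for the kind of validation being removed.

```python
from typing import List


def total_layers(balance: List[int]) -> int:
    # Annotations are hints for tools such as mypy; Python itself does not
    # enforce them, so a tuple is accepted here without complaint.
    return sum(balance)


def total_layers_checked(balance: List[int]) -> int:
    # Hypothetical run-time check of the kind this PR removes.
    if not isinstance(balance, list) or not all(isinstance(n, int) for n in balance):
        raise TypeError("balance must be a list of ints")
    return sum(balance)


# A tuple violates the annotation, but only a type checker would flag it.
unchecked_result = total_layers((2, 1))  # type: ignore[arg-type]

try:
    total_layers_checked((2, 1))  # type: ignore[arg-type]
    rejected = False
except TypeError:
    rejected = True
```

Without the run-time check, a wrongly typed argument is either accepted silently or fails later with a less specific error, which is the risk the description acknowledges.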

facebook-github-bot added the CLA Signed label on Jan 31, 2021
    if module_len != sum(balance):
        raise BalanceError(
def check_balance(module: Union[nn.Sequential, List[LazyModule]], balance: List[int]) -> None:
    if len(module) != sum(balance):
Contributor

I'm curious why this isn't len(set(map(id, module))) != len(module), similar to the condition in verify_module()?

Contributor Author

One verifies that there are no duplicates in the list of layers, while the other verifies that the number of layers matches the sum of balance.
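
The two conditions discussed in this thread can be sketched side by side. This is only an illustration under stated assumptions: `BalanceError` is a local stand-in for the exception named in the diff, `has_duplicate_layers` is a hypothetical helper, and plain Python objects stand in for the layers of an `nn.Sequential`.

```python
from typing import List, Sequence


class BalanceError(ValueError):
    """Local stand-in for the exception named in the diff (assumption)."""


def check_balance(layers: Sequence[object], balance: List[int]) -> None:
    # The number of layers must equal the total number of layers
    # assigned across all partitions.
    if len(layers) != sum(balance):
        raise BalanceError(
            f"number of layers ({len(layers)}) does not match "
            f"sum of balance ({sum(balance)})"
        )


def has_duplicate_layers(layers: Sequence[object]) -> bool:
    # Compare by identity: the same object listed twice is a duplicate,
    # even though it still contributes two entries to len(layers).
    return len(set(map(id, layers))) != len(layers)


shared = object()
layers = [object(), shared, shared]

# The length check passes (3 == 2 + 1) although one layer is repeated,
# so the two conditions are independent of each other.
check_balance(layers, [2, 1])
duplicates_found = has_duplicate_layers(layers)
```

A list can satisfy one condition and fail the other, which is why neither check can replace the other.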

Contributor

Yeah, but in the code that you removed (in one place), wasn't the condition that I mentioned being checked?

Contributor

sidgoyal78 left a comment

Thanks for the refactor. I have put an inline question.

msbaines merged commit f21b5ff into master on Feb 3, 2021
myleott added a commit that referenced this pull request Feb 22, 2021
* [chore] Fix lint errors that broke master (#348)

authored-by: Anjali Sridhar <[email protected]>

* [fix] ShardedDDP - cpu testfix - remove Gloo/CPU (#350)

* no idea about the root issue, but it proved to be fairly narrow (gloo + cpu + python3.8 + no cuda installed), so I guess that's out of scope for fairscale

* [feat][OSS] elastic and pytorch compatible checkpoints (#310)

* adding a test to prove the interoperability with upstream pytorch
* updating the changelog
* eager state pruning
* pytorch 1.5 compat

* [fix] ShardedDDP - properly handle post device change (#353)

* adding the .to(device) support + unit testing
* doc update

* [feat] Add AdaScaleWrapper (#347)

* [feat] Add AdaScaleWrapper

- This enables a different API for wrapping an optimizer with AdaScale.
- This also enables AdaScale to be wrapped by OSS.
- However, OSS wrapping AdaScale results in different optimization,
  whose effects will need future research to study.

testing: add unit tests.

* addressed comment: typo

* [refactor] Refactor and enable multiprocess nn.Pipe benchmarks. (#319)

* mp cleanup

* round of multiprocess refactoring

* test golden run

* print cuda stats

* fix lint errors

* enable multiprocess pipe benchmarks

* set world size to be available gpus

* more changes

* use synthetic loaders for intermediate pipeline stages

* merged master

* fix for the devices property

* dataloader fix

* modify rank check

* print wps stats

* enable verification

* fix logging

* fix flag name

* fix flag name

* check for rank

* fix indent

* pass args

* pass args

* modify golden data

* remove unused print message

* fix lint errors

* add comments

* fix benchmarks

Co-authored-by: Anjali Sridhar <[email protected]>

* [refactor] pipe: simplify balance and module checks (#346)

* [chore] v0.1.5 (#355)

* [chore] disheartening switch off of an OSS cpu test (#356)

* precise skip, only if agent has only cpu

* [feat][minor] OSS Benchmark - regression test + background testing new optims (#352)

* restoring the regression test, adding a test of the for_each optims
* fix the regression test on circleci
* removing unused flags

* [refactor] multiprocess_pipe: cleanup __init__ (#357)

* [refactor] multiprocess_pipe: remove retain_graph __init__ param (#358)

It is not currently being used so we can simplify the interface
by removing it.

* [refactor] multiprocess_pipe: focus on LazyModule usage (#360)

* [feat] ShardedDDP : Adding a proper DDP parity / AMP unit test, overdue (#361)

* Adding a proper ddp parity / AMP unit test, overdue
* catch non-AMP pytorch

* [perf][OSS] Clip grad norm : minor obvious speedup (#363)

cache this iterator, easy speed up

* [refactor] multiprocess_pipe: remove pipelined_backward (#362)

* [perf] ShardedDDP - small memory use reduction - minor speedup (#366)

* minor

* minor

* [fix] repro+fix (#365)

fix a broken earlier commit, only worked for the first step

* [refactor] OSS only use flat buffers (#371)

* flat params all along, way simpler
* updating the docstring

* [refactor] AsyncPipe: do not sub-class MultiProcessPipe (#370)

* [refactor] remove multiprocess dependency on async (#373)

* [fix] Workaround need for pip --no-build-isolation (#375)

* Add fairscale.nn.misc.checkpoint_activations (#376)

* Add fairscale.utils.containers

Co-authored-by: Min Xu <[email protected]>

* Add fairscale.nn.misc.checkpoint_activations

Co-authored-by: Sam Shleifer <[email protected]>

Co-authored-by: Min Xu <[email protected]>
Co-authored-by: Sam Shleifer <[email protected]>

* [chore] v0.1.6 (#377)

* v0.1.6

Co-authored-by: anj-s <[email protected]>
Co-authored-by: Benjamin Lefaudeux <[email protected]>
Co-authored-by: Anjali Sridhar <[email protected]>
Co-authored-by: msbaines <[email protected]>
Co-authored-by: Leonard Lausen <[email protected]>
Co-authored-by: Myle Ott <[email protected]>
Co-authored-by: Sam Shleifer <[email protected]>
Labels
CLA Signed
5 participants