facebookresearch / fairscale Public

Issues: facebookresearch/fairscale

75 Open 285 Closed

Issues list

When mixed precision input contains grad tensor, FSDP cast it with no grad

#1191 opened Dec 8, 2024 by kamwoh

Hi, Groups division may be incorrect in initialize() in fairscale/nn/model_parallel/initialize.py

#1189 opened Aug 30, 2024 by Youngluc

Raising assert param.grad is not None when finetuning LoRA.

#1188 opened Jun 26, 2024 by HashimotoPatrickMu

[question] Different training between DDP & Sharded DDP

#1172 opened Mar 29, 2024 by kwohlfahrt

Example of MOE

#1165 opened Feb 28, 2024 by Juanhui28

How can I use torchrun + model parallelism + FSDP

#1155 opened Dec 8, 2023 by HackGiter

It is dangerous to using default non_block=True.

#1146 opened Oct 27, 2023 by heshenghuan

assert self.has_full_params

#1134 opened Sep 11, 2023 by pokameng

Why ShardedDDP and OSS are slower than Vanilla DDP

#1131 opened Aug 18, 2023 by powermano

pip install failed

#1130 opened Aug 17, 2023 by dogxxxxx

Error with nested models "Caffe2 uses a lazy allocation..."

#1129 opened Jul 26, 2023 by Emanuele97x

[bug] pip package 0.4.13 fails to build wheel

#1128 opened Jul 17, 2023 by project-tuva

Error Freezing Weights

#1126 opened Jun 5, 2023 by mostafaelhoushi

Compatibility with Pytorch 2.0; failing test test_gradient_value

#1124 opened May 13, 2023 by h-vetinari

Can exclude some layer parameter not to shard?

#1123 opened Apr 24, 2023 by robotcator

FSDP cannot consolidate optimizer state dict with flatten params is False

#1100 opened Dec 13, 2022 by ShenglongZ

clip_grad_norm_ from fairscale downcasts to bf16 before all reduce

#1092 opened Nov 2, 2022 by glample

Can't load optimizer state due to state_steps

#1083 opened Sep 26, 2022 by rowhanm

Running stats with gradient checkpointing

#1035 opened Jul 20, 2022 by vovaf709

How to exclude some operations in checkpoint wrapper?

#1014 opened Jun 22, 2022 by kenmbkr

containers:apply_to_tensors fails to return (or test) the application result on PackedSequence

Labels: FSDP (FullyShardedDataParallel (zero-3))

#996 opened May 26, 2022 by crutcher

How to use Optimizer State Sharding with Sharpness-Aware Minimization?

#989 opened May 20, 2022 by kenmbkr

CUDA OOM when saving checkpoint (in consolidate_state_dict()) using OSS

#973 opened Apr 20, 2022 by crowsonkb

FSDP unnecessarily clones buffers in state_dict()?

Labels: FSDP (FullyShardedDataParallel (zero-3)), question (Further information is requested)

#966 opened Mar 25, 2022 by rohan-varma

Support BF16 for FSDP

Labels: FSDP (FullyShardedDataParallel (zero-3))

#963 opened Mar 22, 2022 by yuvalkirstain
