assert self.has_full_params #1134
min-xu-ai: Can you first try the PyTorch version of FSDP? If you can't, please let me know the reason. Also, we usually wrap the top-level model with FSDP; quickly reading your code, it seems you only wrapped two sub-modules?
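For illustration, a minimal sketch of the top-level wrapping described above, assuming fairscale's `FullyShardedDataParallel` and the `build_network`/`cfg` helpers from the report below; the `loader`/`compute_loss` training loop is hypothetical:

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP, auto_wrap_bn

# Assumes torch.distributed is already initialized in each worker,
# e.g. torch.distributed.init_process_group("nccl", ...) with one process per GPU.
model = build_network(cfg).cuda()                  # build_network/cfg as in the report below
model = auto_wrap_bn(model, single_rank_pg=False)  # wrap BatchNorm layers for shard-safe stats
model = FSDP(model)                                # then wrap the *top-level* module, not just sub-modules

# loader/compute_loss are hypothetical stand-ins for the reporter's training loop.
for inputs, targets in loader:
    loss = compute_loss(model(inputs), targets)
    loss.backward()  # pre-backward hooks now fire on the outermost FSDP wrapper
```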
Reporter: @min-xu-ai

min-xu-ai: Sorry, I can't. Can you check with the PyTorch folks, since they have an FSDP version that is more supported and official?
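For reference, a minimal sketch of the PyTorch-native FSDP (`torch.distributed.fsdp`, available since PyTorch 1.11) that the reply above points to; `build_network`/`cfg` are the reporter's helpers, and everything else is illustrative:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main_worker(rank: int, world_size: int, cfg) -> None:
    # Hypothetical per-process setup; init_method / env vars depend on your launcher.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = build_network(cfg).cuda()  # reporter's helper, assumed to return an nn.Module
    model = FSDP(model)                # shard the whole model at the top level

    # Dummy forward/backward just to show the call pattern; the input shape is made up.
    out = model(torch.randn(2, 3, 64, 64, device="cuda"))
    out.sum().backward()
```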
Original issue (reporter):

Hello @min-xu-ai,

I'm using FSDP to wrap my model, but I'm getting the following error:

`assert self.has_full_params`

This is my code:
```python
model = build_network(cfg).cuda()
model.cnet = auto_wrap_bn(model.cnet, single_rank_pg=False)
loguru_logger.info("Parameter Count: %d" % count_parameters(model))  # 12659389
```
```
Traceback (most recent call last):
  File "/home/dxy/anaconda3/envs/videoflow/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/wsm/VideoFlow-main/FSDP_BOF.py", line 250, in main_worker
    train(cfg)
  File "/home/wsm/VideoFlow-main/FSDP_BOF.py", line 174, in train
    loss.backward()
  File "/home/dxy/anaconda3/envs/videoflow/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/dxy/anaconda3/envs/videoflow/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/dxy/anaconda3/envs/videoflow/lib/python3.10/site-packages/fairscale/nn/data_parallel/fully_sharded_data_parallel.py", line 1516, in _pre_backward_hook
    self._use_full_params()
  File "/home/dxy/anaconda3/envs/videoflow/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/dxy/anaconda3/envs/videoflow/lib/python3.10/site-packages/fairscale/nn/data_parallel/fully_sharded_data_parallel.py", line 2061, in _use_full_params
    assert self.has_full_params
AssertionError
```
Can you help me?