[0.4.1] ValueError: Attempting to unscale FP16 gradients. #834
Comments
@anj-s Has anything changed with respect to FSDP and gradients in FP16 mode?
@SeanNaren Nothing should have changed as far as gradients and FP16 are concerned. @carmocca I can reproduce this issue in FairScale 0.4.0 without Lightning. Let me look into it to figure out the root cause.
I have been able to reproduce the issue. The root cause is the `unscale_grads` function call here: `allow_fp16` is set to `False`, which is why your code breaks. Copy the
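To make the failure mode concrete, here is a minimal, dependency-free Python model of the check described above. It is a sketch under stated assumptions, not the PyTorch or FairScale code: the `Grad` dataclass and this `unscale_grads` signature are hypothetical, and gradients are modeled as floats tagged with a dtype string. It only shows why an unscale step called with `allow_fp16=False` raises this exact error once FP16 gradients reach it, and why allowing them avoids the error.

```python
# Simplified, hypothetical model of the dtype guard behind
# "ValueError: Attempting to unscale FP16 gradients."
# This is NOT the PyTorch or FairScale implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class Grad:
    dtype: str  # e.g. "float16" or "float32"
    value: float


def unscale_grads(grads: List[Grad], inv_scale: float, allow_fp16: bool) -> List[Grad]:
    """Multiply every gradient by inv_scale, refusing FP16 grads unless allowed."""
    out = []
    for g in grads:
        if not allow_fp16 and g.dtype == "float16":
            # Same message as the error reported in this issue.
            raise ValueError("Attempting to unscale FP16 gradients.")
        out.append(Grad(g.dtype, g.value * inv_scale))
    return out


# Assumption: in the reported setup, the gradients reaching the scaler are FP16.
fp16_grads = [Grad("float16", 1024.0), Grad("float16", 512.0)]
scale = 1024.0

try:
    unscale_grads(fp16_grads, 1.0 / scale, allow_fp16=False)
except ValueError as err:
    print(err)  # Attempting to unscale FP16 gradients.

# Allowing FP16 gradients lets the same values unscale without error.
print([g.value for g in unscale_grads(fp16_grads, 1.0 / scale, allow_fp16=True)])  # [1.0, 0.5]
```

The point of the model is that the dtype guard, not the arithmetic, triggers the error: the same values unscale fine once FP16 is permitted.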
@anj-s If we extend the
@anupam-fb Thanks for looking into this! The fix sounds good. The only things I would follow up on are:
@carmocca The fix for this issue has been merged into main. Could you try installing fairscale (
@anupambhatnagar Seems to be working, thank you! Feel free to close this issue. Also, please ping me here once 0.4.3 is released; the Lightning CI currently pins fairscale.
@carmocca FYI: FairScale 0.4.5 will be released soon. ICYMI, we are introducing Per Layer Gradient Scaling in the upcoming version. Could you please share how you are using the ShardedGradScaler? Feedback on that would be helpful to us. Thanks!
Thanks for the heads-up!
Our integration is quite simple: internally, we just select this class when the user requests sharded training with mixed precision: And override
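The selection logic described above can be sketched as a small factory. This is a hypothetical illustration, not Lightning's actual code: `pick_scaler_class` and the two stub classes are invented names standing in for the real scaler classes, and the rule that only 16-bit precision needs loss scaling is an assumption made for the example.

```python
# Hypothetical stand-ins; a real integration would use PyTorch's and
# FairScale's scaler classes, which are deliberately not imported here.
class DefaultScalerStub:
    """Placeholder for a regular (unsharded) gradient scaler."""


class ShardedScalerStub:
    """Placeholder for FairScale's ShardedGradScaler."""


def pick_scaler_class(sharded: bool, precision: int):
    """Return the scaler class for a run, or None when no scaling is needed.

    Assumption: only 16-bit mixed precision uses loss scaling.
    """
    if precision != 16:
        return None  # full precision: no loss scaling
    return ShardedScalerStub if sharded else DefaultScalerStub


print(pick_scaler_class(sharded=True, precision=16).__name__)   # ShardedScalerStub
print(pick_scaler_class(sharded=False, precision=16).__name__)  # DefaultScalerStub
print(pick_scaler_class(sharded=True, precision=32))            # None
```

Keeping the choice in one place means the sharded scaler is only used when both conditions hold, so unsharded or full-precision runs are unaffected by any sharded-specific overrides.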
🐛 Bug
Command
python script.py
on a GPU machine
To Reproduce
Expected behavior
I'm not sure whether this is a problem on our end or in the release, but this did work with the 0.4.0 release.
cc @SeanNaren
Environment
Thank you for your help!