Tensors not on the same device when using FSDP auto-wrapping #14900
Comments
Auto-wrapping support was added in #14383. I can't reproduce this on master in the same way, but I get:
cc @rohitgr7
ah! weird. there are tests that check this. Let me explore it more.

    def configure_optimizers(self):
        return torch.optim.SGD(self.trainer.model.parameters(), lr=0.1)
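For context, a minimal sketch of the contrast (an illustration, not taken from this thread): under FSDP with auto-wrapping, self.trainer.model is the strategy-wrapped module, so building the optimizer from its parameters is what the suggestion above relies on.

    def configure_optimizers(self):
        # Pattern reported in this issue to fail under FSDP auto-wrapping:
        # optimizer built from the unwrapped LightningModule's parameters.
        # return torch.optim.SGD(self.parameters(), lr=0.1)

        # Suggested alternative: use the parameters of the wrapped model held
        # by the trainer, which reflect the parameters FSDP actually trains.
        return torch.optim.SGD(self.trainer.model.parameters(), lr=0.1)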
hi @rohitgr7, I've tried your suggestion and the code works without throwing any errors, but then I want to set different param groups with different LRs, and it seems these params all become one single group.
yes @TOPFARMER, that's the case actually: pytorch/pytorch#76382
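For illustration, this is roughly the kind of per-group setup being asked about (a sketch; encoder and head are hypothetical submodule names). The limitation referenced in pytorch/pytorch#76382 is that FSDP flattens the wrapped modules' weights, so groups defined per original submodule do not carry over and effectively collapse into a single group:

    def configure_optimizers(self):
        # Intent: different learning rates for different submodules. With FSDP
        # auto-wrapping, the original per-module weights are fused into flat
        # parameters, so this grouping is not preserved on the wrapped model.
        return torch.optim.SGD(
            [
                {"params": self.encoder.parameters(), "lr": 1e-3},  # hypothetical submodule
                {"params": self.head.parameters(), "lr": 1e-2},     # hypothetical submodule
            ]
        )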
Bug
When using FSDP with its defaults and auto-wrapping, the forward pass fails with an error saying the input tensors and weights are not on the same device.
To Reproduce
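A minimal sketch of the setup described (the toy module, data, strategy alias, and device count are assumptions; the essential ingredients are FSDP with its defaults, auto-wrapping, and an optimizer built from self.parameters()):

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset
    import pytorch_lightning as pl

    class ReproModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(32, 2)

        def training_step(self, batch, batch_idx):
            (x,) = batch
            return self.layer(x).sum()

        def configure_optimizers(self):
            # Default pattern: optimizer built from the LightningModule's own
            # parameters; this is the combination reported to hit the
            # device-mismatch error in forward under FSDP auto-wrapping.
            return torch.optim.SGD(self.parameters(), lr=0.1)

    data = DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)
    # Strategy alias depends on the Lightning version; "fsdp_native" is assumed here.
    trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="fsdp_native", max_epochs=1)
    trainer.fit(ReproModel(), data)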
Error:
Additional context
Reported by user kavya on Slack.
Comment:
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.