FSDP checkpointing uses deprecated APIs with PyTorch 2.2 #19462
Comments
Two more which probably need to be fixed in PyTorch:

/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py:1132: UserWarning: Please use DTensor instead and we are deprecating ShardedTensor.
  warnings.warn(DEPRECATE_MSG)

From:

  File "/home/carlos/stuff.py", line 29, in <module>
    fabric.save(f"{compile}_before_fwd", {"model": fmodel})
  File "/home/carlos/lightning/src/lightning/fabric/fabric.py", line 770, in save
    self._strategy.save_checkpoint(path=path, state=_unwrap_objects(state), filter=filter)
  File "/home/carlos/lightning/src/lightning/fabric/strategies/fsdp.py", line 484, in save_checkpoint
    converted = obj.state_dict()
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1922, in state_dict
    hook_result = hook(self, destination, prefix, local_metadata)
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py", line 737, in _post_state_dict_hook
    local_shape = tensor.shape
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py", line 1134, in __torch_function__
    traceback.print_stack()

/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/filesystem.py:151: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  if tensor.storage().size() != tensor.numel():

From:

  File "/home/carlos/stuff.py", line 29, in <module>
    fabric.save(f"{compile}_before_fwd", {"model": fmodel})
  File "/home/carlos/lightning/src/lightning/fabric/fabric.py", line 770, in save
    self._strategy.save_checkpoint(path=path, state=_unwrap_objects(state), filter=filter)
  File "/home/carlos/lightning/src/lightning/fabric/strategies/fsdp.py", line 496, in save_checkpoint
    save_state_dict(converted_state, writer)
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py", line 40, in save_state_dict
    return _save_state_dict(
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py", line 280, in _save_state_dict
    return distW.all_reduce("write", write_data, finish_checkpoint)
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/utils.py", line 210, in all_reduce
    local_data = map_fun()
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py", line 270, in write_data
    all_writes = storage_writer.write_data(final_local_plan, planner)
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/filesystem.py", line 470, in write_data
    _write_files_from_queue(
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/filesystem.py", line 284, in _write_files_from_queue
    loader.start_loading()
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/filesystem.py", line 179, in start_loading
    self._refill()
  File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/filesystem.py", line 150, in _refill
    traceback.print_stack()
If the newer API is used, there is also this one:

/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/utils.py:409: UserWarning: The argument order of save has been changed. Please check the document to avoid future breakages.
  warnings.warn(

This probably applies to load too; I haven't tried it.
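For reference, a minimal sketch (not Lightning's actual code; the save_sharded/load_sharded helper names are made up for illustration) of calling the 2.2 entry points with keyword arguments, which is what that warning nudges callers toward:

```python
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint import FileSystemReader, FileSystemWriter


def save_sharded(state_dict, path):
    # Passing storage_writer by keyword sidesteps the "argument order of save
    # has been changed" warning that positional calls can trigger.
    dcp.save(state_dict, storage_writer=FileSystemWriter(path))


def load_sharded(state_dict, path):
    # load() fills the given state_dict in place from the checkpoint on disk.
    dcp.load(state_dict, storage_reader=FileSystemReader(path))
```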
I agree we need to update these imports.
Technically lit-gpt doesn't rely on nightly since the 2.2 release. I opened #19463
Also opened pytorch/pytorch#119802 upstream. We might want to silence these after this is resolved.
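If we do end up silencing them, a minimal sketch (the message patterns are copied from the logs above; where such filters would live in Lightning is left open):

```python
import warnings

# Hypothetical interim filters; each pattern matches the start of one of the
# warning messages quoted in the logs above.
warnings.filterwarnings("ignore", message="Please use DTensor instead")
warnings.filterwarnings("ignore", message="TypedStorage is deprecated")
warnings.filterwarnings("ignore", message="The argument order of save has been changed")
```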
pytorch/pytorch#119800 (comment) suggests that we should replace (in 2.2+) most of what we have with the newer torch.distributed.checkpoint APIs.
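Presumably that means the torch.distributed.checkpoint.state_dict helpers added in 2.2; a hedged sketch of what adopting them could look like (this is one reading of the linked comment, not a confirmed plan, and fsdp_model stands in for the wrapped module):

```python
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_model_state_dict,
    set_model_state_dict,
)

# Sharded (non-full) state dict obtained through the new helper instead of the
# deprecated ShardedTensor-based machinery.
options = StateDictOptions(full_state_dict=False)
state = get_model_state_dict(fsdp_model, options=options)

# Restoring goes through the matching setter.
set_model_state_dict(fsdp_model, state, options=options)
```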
Bug description
See added deprecation warnings in pytorch/pytorch#113867
What version are you seeing the problem on?
v2.2
How to reproduce the bug
Originated from pytorch-lightning/src/lightning/fabric/strategies/fsdp.py, line 496 (at b097a4d).

We already use the newer API for loading: pytorch-lightning/src/lightning/fabric/strategies/fsdp.py, lines 563 to 566 (at b097a4d).
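For completeness, a minimal repro sketch assembled from the traceback in the comments above, to be run on a machine with at least two GPUs (the model and the exact save path are placeholders; only the FSDP strategy and the fabric.save call pattern come from the logs):

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cuda", devices=2, strategy="fsdp")
fabric.launch()

model = torch.nn.Linear(8, 8)
fmodel = fabric.setup(model)

# Saving a sharded checkpoint is enough to hit the deprecated
# save_state_dict path and emit the warnings quoted above.
fabric.save("before_fwd", {"model": fmodel})
```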
Error messages and logs
Environment
No response
More info
No response
cc @awaelchli @carmocca