AttributeError: 'DeepSpeedZeRoOffload' object has no attribute 'checkpoint_event_prologue' [BUG] #3601
Comments
@wj210, can you please share full repro steps, including full script and command line? Thanks! |
The code is fairly long, but the error mainly occurs because, during testing, the `ckpt_path` argument of `trainer.test` was left at its default, which is "best" and involves loading from the saved checkpoint. When `ckpt_path` was set to "last", the error did not occur, though something else happened: the numerical digits shown above were printed.
I believe the same code from #2449 will reproduce the error. The command line I ran was
|
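The behaviour described in the comment above (failure with `ckpt_path="best"`, no `AttributeError` with `ckpt_path="last"`) can be sketched as a small fallback helper. This is only a workaround sketch, not a fix for the underlying bug: `run_test_with_fallback` is a hypothetical name, `trainer` is assumed to be a Lightning `Trainer` (anything exposing `test(model, ckpt_path=...)` works), and retrying after a failed checkpoint load under DeepSpeed may not always be safe.

```python
def run_test_with_fallback(trainer, model, ckpt_paths=("best", "last")):
    """Try each ckpt_path in order and return (path_used, results).

    Hypothetical helper: it only catches AttributeError, the error
    reported in this issue, and re-raises it if every path fails.
    """
    last_error = None
    for path in ckpt_paths:
        try:
            # Trainer.test accepts ckpt_path="best", "last", or a file path.
            return path, trainer.test(model, ckpt_path=path)
        except AttributeError as err:
            # Remember the failure and move on to the next candidate path.
            last_error = err
    if last_error is None:
        raise ValueError("ckpt_paths must not be empty")
    raise last_error
```

With a trainer built using the `deepspeed_stage_3_offload` strategy, a call such as `used, results = run_test_with_fallback(trainer, model)` would report which `ckpt_path` finally succeeded, so the fallback stays visible rather than silent.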
Can confirm this happens. This is a huge inconvenience. |
👀 |
Same situation here; I'm waiting for a solution to this problem. The official docs use deepspeed_stage_3 as an example, and I don't know why deepspeed_stage_3_offload is not working. My environment:
torch 2.0.1
torchaudio 2.0.2
torchmetrics 0.11.4
torchvision 0.15.2
lightning 2.0.2 |
@wj210, @iamlockelightning, @RoyJames, FYI |
During inference, the error "AttributeError: 'DeepSpeedZeRoOffload' object has no attribute 'checkpoint_event_prologue'" occurs, similar to #2449.
To Reproduce