[BUG] Running step 3 with bloomz + LoRA + ZeRO-3 raises RuntimeError(f"{param.ds_summary()} already in registry") #3528
Comments
Hello @liuaiting. Thank you for reporting this issue to us. One of our recent fixes (#3462) may have already resolved this error. Could you update DeepSpeed and give it another try?
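For anyone following along, here is a minimal sketch for confirming which DeepSpeed version the active environment actually picks up after upgrading (for example with `pip install --upgrade deepspeed`). The helper name `installed_version` is hypothetical and only for illustration. The thread does not say which release includes #3462, so this only reports the version and does not judge whether it contains the fix.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "deepspeed"):
    """Return the installed version string of `package`, or None if it is not installed.

    `installed_version` is an illustrative helper, not part of DeepSpeed.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this inside the same environment (e.g. the conda env) used for training.
    print("deepspeed version:", installed_version())
```

Running it in the same environment that launches step 3 helps rule out the case where the upgrade went into a different Python installation.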
After updating DeepSpeed, it runs successfully. Thank you very much for your reply.
@liuaiting Glad to hear the error is fixed. Closing the issue.
@HeyangQin I still encounter this with DeepSpeed 0.10.3 when running step 3 with llama2 + LoRA + ZeRO-3 on V100 32G GPUs. The traceback points to:
anaconda3.9/envs/dschat/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 52, in __setitem__
Even though my local copy of the repository is up to date, I am still encountering this error. The log is below; its last line shows the command I ran with all the options.
Epoch: 0 | Step: 75 | PPO Epoch: 1 | Actor Loss: 0.05474853515625 | Critic Loss: 0.0821533203125 | Unsupervised Loss: 0.0
Describe the bug
When running step 3 with ZeRO stage 3 enabled and LoRA for both the actor and critic models, an error is raised. It seems to indicate that bloomz does not support ZeRO-3 + LoRA.
Log output
To Reproduce
The run.sh is:
The run_bloom_1b7.sh is:
Expected behavior
Use ZeRO-3 + LoRA for step 3 training.
ds_report output
Screenshots
No. The error is in the Log output.
System info (please complete the following information):
Docker context
no
Additional context
no