Finetuning Bloom model in step 3 failed #451
Actor model: Bloom-1.1b
Reward model: Bloom-560m
Finetuning cmd:

bash training_scripts/single_node/run_bloom_1.1b.sh /DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/bloom-1.1b/ /DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/reward_model/bloom-560m

Part of training log:

However, changing the model to OPT works well.

Comments
Same error.
Same error.
Same error. Modifying the …
Similar, but not the same error.
What should I do to fix this error?
Any update on this issue?
Same error for actor model bloomz-7b1 and reward model opt-1.3b.
The NotImplementedError is raised by the softmax function when config.fp16 is False. Perhaps you changed fp16 to bf16 in ds_utils.py following some other issue (same as me).
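For context, here is a minimal sketch of the precision block in a DeepSpeed-Chat-style training config; the helper name and values are illustrative assumptions, not the project's exact code:

```python
# Hedged sketch of the precision settings in a DeepSpeed config dict,
# loosely modeled on what ds_utils.py builds in DeepSpeed-Chat.
# Helper name and values are illustrative assumptions.
def get_train_ds_config(enable_fp16=True):
    return {
        "train_batch_size": 8,  # illustrative value
        # The fused softmax path mentioned above reportedly requires fp16;
        # swapping this block for bf16 is the modification that triggers
        # the NotImplementedError in step 3.
        "fp16": {"enabled": enable_fp16},
        # "bf16": {"enabled": True},  # the problematic modification
        "zero_optimization": {"stage": 2},
    }
```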
Not working at all. The padding_side for OPT is right, while for BLOOMZ it is left. I tried passing in two different tokenizers, but it caused a lot of conflicts when making the experience.
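To illustrate the mismatch described above, a quick check of the two tokenizer defaults (the model IDs are assumptions taken from this thread; defaults can vary across transformers versions):

```python
# Hedged sketch: inspect the default padding sides of the two tokenizer
# families mixed in step 3. Model IDs are assumptions from this thread.
from transformers import AutoTokenizer

actor_tok = AutoTokenizer.from_pretrained("bigscience/bloomz-1b7")
critic_tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")

print("bloomz padding_side:", actor_tok.padding_side)  # typically "left"
print("opt padding_side:", critic_tok.padding_side)    # typically "right"

# Forcing one side is a possible workaround, but as the comment notes it
# can conflict with how step 3 assembles the experience batch.
actor_tok.padding_side = "right"
```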
Similar issue on the DeepSpeed side: microsoft/DeepSpeed#3518
Same error with actor model bloom-560m and critic model opt-350m. Any update?
Hi @cokuehuang, can you please try running this again and include the following PR as well? I've been able to get this running with:

DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning$ bash training_scripts/bloom/single_node/run_bloom.sh bigscience/bloomz-1b7 ../step2_reward_model_finetuning/bloom_7b_output/ 3 3 output_bloom7b_actor_hf_critic_step2

Thanks,
Hi @cokuehuang, closing the issue for now since a solution was provided. If any issues are still encountered, feel free to open another issue.