Finetuning Bloom model in step 3 failed #451

Closed
cokuehuang opened this issue Apr 28, 2023 · 12 comments
Assignees
Labels
deespeed chat DeepSpeed Chat new-config A modified config from the given example

Comments

@cokuehuang

cokuehuang commented Apr 28, 2023

Actor model: Bloom-1.1b
Reward model: Bloom-560m
Finetuning cmd:
bash training_scripts/single_node/run_bloom_1.1b.sh /DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/bloom-1.1b/ /DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/reward_model/bloom-560m
Part of training log:

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
------------------------------------
Free memory : 67.217224 (GigaBytes)
Total memory: 79.346863 (GigaBytes)
Requested memory: 0.437500 (GigaBytes)
Setting maximum total tokens (input + output) to 512
WorkSpace: 0x7f8d36000000
------------------------------------
Traceback (most recent call last):
  File "main.py", line 525, in <module>
    main()
  File "main.py", line 434, in main
    out = trainer.generate_experience(prompts)
  File "/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 97, in generate_experience
    seq = self._generate_sequence(prompts)
  File "/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 73, in _generate_sequence
    seq = self.actor_model.module.generate(prompts,
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/deepspeed/runtime/hybrid_engine.py", line 254, in generate
    generate_ret_vals = self._generate(*inputs, **kwargs)
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/transformers/src/transformers/generation/utils.py", line 1513, in generate
    return self.greedy_search(
  File "/transformers/src/transformers/generation/utils.py", line 2330, in greedy_search
    outputs = self(
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/bloom/modeling_bloom.py", line 913, in forward
    transformer_outputs = self.transformer(
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/transformers/src/transformers/models/bloom/modeling_bloom.py", line 786, in forward
    outputs = block(
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 147, in forward
    self.attention(input,
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/ds_attention.py", line 160, in forward
    context_layer, key_layer, value_layer = self.compute_attention(qkv_out=qkv_out,
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/ds_attention.py", line 253, in compute_attention
    attn_mask=((1 - input_mask).half() * minus_inf),
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/torch/_tensor.py", line 40, in wrapped
    return f(*args, **kwargs)
  File "/opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/torch/_tensor.py", line 848, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.`

However, changing the model to OPT works well.
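
For reference, the failing subtraction can be reproduced in isolation (a minimal sketch; the tensor below just stands in for the boolean attention mask that BLOOM passes into DeepSpeed's inference attention):

import torch

input_mask = torch.tensor([[True, True, False]])  # bool mask, as in the trace above

try:
    attn_mask = (1 - input_mask).half() * -10000.0  # shape of the expression in ds_attention.py
except RuntimeError as e:
    print(e)
# Subtraction, the `-` operator, with a bool tensor is not supported.
# If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.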

@evi-Genius

Same error.

@lc222

lc222 commented May 5, 2023

Same error.

@LiinXemmon

Same error. Modifying ds_attention.py leads to a NotImplementedError.

@lc222

lc222 commented May 6, 2023

Similar, but not the same error:

File "main.py", line 552, in <module> main() File "main.py", line 458, in main 192.18.75.0: out = trainer.generate_experience(prompts) 192.18.75.0: File "/baichuan/haoyu/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 203, in generate_experience 192.18.75.0: seq = self._generate_sequence(prompts) 192.18.75.0: File "/baichuan/haoyu/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 161, in _generate_sequence 192.18.75.0: seq = self.actor_model.module.generate(prompts, 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context 192.18.75.0: return func(*args, **kwargs) 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/transformers/generation/utils.py", line 1513, in generate 192.18.75.0: return self.greedy_search( 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/transformers/generation/utils.py", line 2330, in greedy_search 192.18.75.0: outputs = self( 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl 192.18.75.0: result = forward_call(*args, **kwargs) 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 913, in forward 192.18.75.0: transformer_outputs = self.transformer( 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl 192.18.75.0: result = forward_call(*args, **kwargs) 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 730, in forward 192.18.75.0: inputs_embeds = self.word_embeddings(input_ids) 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl 192.18.75.0: result = forward_call(*args, **kwargs) 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward 192.18.75.0: return F.embedding( 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding 192.18.75.0: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) 192.18.75.0: RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)

What should I do to fix this error?
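
For context, the dtype requirement in the final frame can be shown in isolation (a minimal sketch; it suggests the prompt input_ids are being cast to float somewhere upstream, so the real fix is to keep them integer-typed end to end):

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

float_ids = torch.tensor([[1.0, 2.0, 3.0]])  # float "indices" reproduce the error
try:
    emb(float_ids)
except RuntimeError as e:
    print(e)  # Expected tensor for argument #1 'indices' ... Long, Int; but got ... Float

out = emb(float_ids.long())  # casting the indices back to int64 succeeds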

@samadejacobs samadejacobs added the deespeed chat DeepSpeed Chat label May 9, 2023
@stgzr

stgzr commented May 16, 2023

Any update to this issue?

@jomayeri jomayeri added the new-config A modified config from the given example label May 19, 2023
@awan-10 awan-10 assigned tohtana and unassigned cmikeh2 May 26, 2023
@scarydemon2

scarydemon2 commented May 29, 2023

Same error for actor model bloomz-7b1 and reward model opt-1.3b.

@scarydemon2

> Same error. Modifying ds_attention.py leads to a NotImplementedError.

The NotImplementedError comes from the softmax function when config.fp16 is False. Perhaps you changed fp16 to bf16 in ds_utils.py following another issue (as I did).
To solve this problem, in /opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/ds_attention.py, line 253, in compute_attention, change

attn_mask=((1 - input_mask).half() * minus_inf),

into

attn_mask=((1 - input_mask.int()).half() * minus_inf),

That worked for me.
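
For what it's worth, the cast-based patch is equivalent to the logical_not form that the original RuntimeError message suggests (a standalone sketch, not the actual DeepSpeed code):

import torch

minus_inf = -10000.0
input_mask = torch.tensor([[True, True, False]])

patched = (1 - input_mask.int()).half() * minus_inf  # the fix above
alt = (~input_mask).half() * minus_inf               # logical_not, then cast

assert torch.equal(patched, alt)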

@scarydemon2

> Same error. Modifying ds_attention.py leads to a NotImplementedError.

> The NotImplementedError comes from the softmax function when config.fp16 is False. Perhaps you changed fp16 to bf16 in ds_utils.py following another issue (as I did). To solve this problem, change attn_mask=((1 - input_mask).half() * minus_inf), into attn_mask=((1 - input_mask.int()).half() * minus_inf), in compute_attention. That worked for me.

Not working at all. The padding_side for OPT is right, while for bloomz it is left. I tried passing in two different tokenizers, but it caused a lot of conflicts when generating the experience.
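
As a side note on the padding question: decoder-only models should be left-padded before generate(), which is what the warning in the original log points at (a sketch using the BLOOM checkpoint from this thread; adjust the model name as needed):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
tokenizer.padding_side = "left"  # decoder-only models generate from the end of the prompt

batch = tokenizer(["short prompt", "a somewhat longer prompt"],
                  padding=True, return_tensors="pt")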

@jeffra
Contributor

jeffra commented Jun 9, 2023

Similar issue on DeepSpeed side: microsoft/DeepSpeed#3518

@roy-mzh

roy-mzh commented Aug 21, 2023

Same error with actor model bloom-560m and critic model opt-350m. Any update?

@lekurile lekurile self-assigned this Sep 14, 2023
@lekurile
Contributor

lekurile commented Sep 14, 2023

Hi @cokuehuang,

Can you please try running this again and include the following PR as well:

I've been able to get this running with the bigscience/bloomz-1b7 BLOOM model:

DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning$ bash training_scripts/bloom/single_node/run_bloom.sh bigscience/bloomz-1b7 ../step2_reward_model_finetuning/bloom_7b_output/ 3 3 output_bloom7b_actor_hf_critic_step2

Thanks,
Lev

@lekurile
Contributor

Hi @cokuehuang,

Closing the issue for now since a solution was provided. If any issues are still encountered, feel free to open another issue.
