
Train using qlora exits with error #15

Closed
suclogger opened this issue Jun 6, 2023 · 3 comments
Labels
solved This problem has been already solved

Comments

suclogger commented Jun 6, 2023

Training script as follows:

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path /xx/model/model_weights/Ziya-LLaMA-13B \
    --do_train \
    --dataset xx \
    --finetuning_type lora \
    --output_dir /xx/output \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-3 \
    --num_train_epochs 10.0 \
    --resume_lora_training False \
    --plot_loss \
    --fp16 \
    --quantization_bit 4

Error message as follows:

Traceback (most recent call last):
  File "/xxx/src/train_sft.py", line 97, in <module>
    main()
  File "/xxx/src/train_sft.py", line 69, in main
    train_result = trainer.train()
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 1638, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 1923, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 2733, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 2758, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/utils/operations.py", line 553, in forward
    return model_forward(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/utils/operations.py", line 541, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 570, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 566, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/chatglm_etuning/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (724x5120 and 1x13107200)
  0%|                                                                                                                | 0/30 [00:00<?, ?it/s]
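
For context on the shapes: mat1 (724x5120) is the batch of hidden states, while mat2 (1x13107200) is consistent with the packed 4-bit weight buffer rather than the dequantized (5120x5120) q_proj matrix; the peft LoRA layer in the last frame passes that packed buffer straight to F.linear. A rough sanity check of the arithmetic (the two-values-per-byte packing is an assumption about bitsandbytes' 4-bit storage):

# 13B LLaMA: q_proj is a square 5120x5120 projection. If bitsandbytes
# packs two 4-bit values per byte, the flat quantized buffer holds:
python -c "print(5120 * 5120 // 2)"   # 13107200 -- matches the mat2 dim above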
suclogger changed the title from "Train exits with error" to "Train using qlora exits with error" on Jun 6, 2023
suclogger (Author) commented:

Related issues:

artidoro/qlora#100
artidoro/qlora#12

suclogger (Author) commented:

Upgrading peft solved the issue:

pip install -U git+https://github.com/huggingface/peft.git
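
After upgrading, a quick way to confirm the environment actually picked up the new build (the exact version printed will vary; this just checks which install is imported):

python -c "import peft; print(peft.__version__, peft.__file__)"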

hiyouga added the solved label on Jun 7, 2023
starphantom666 commented Jun 8, 2023

> Upgrading peft solved the issue:
>
> pip install -U git+https://github.com/huggingface/peft.git

Why am I still getting the error... I already upgraded.

Update: it works now. It turns out you have to uninstall the old version first and then reinstall.
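
A sketch of that uninstall-then-reinstall sequence (assuming the same source branch as the fix above):

# Remove the stale peft install first, then pull a fresh build from main:
pip uninstall -y peft
pip install git+https://github.com/huggingface/peft.git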
