
RuntimeError: shape '[-1, 32001]' is invalid for input of size 32640000 #85

Closed
molyswu opened this issue Apr 18, 2023 · 21 comments

molyswu commented Apr 18, 2023

Hi,

RuntimeError: shape '[-1, 32001]' is invalid for input of size 32640000

What is causing this problem?

Facico (Owner) commented Apr 18, 2023

Could you give more detailed error output? My guess is that you loaded someone else's model. You can refer to similar issues.

molyswu (Author) commented Apr 18, 2023

Two GPUs, RTX 3090:

if not args.wandb:
    os.environ["WANDB_MODE"] = "disable"
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size  # 2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]

Facico (Owner) commented Apr 18, 2023

So you haven't modified the training script or anything else you are running?

molyswu (Author) commented Apr 19, 2023

No changes; the base model is LLaMA-7B.

Facico (Owner) commented Apr 19, 2023

You have only given this error message, so all I can tell is that the tokenizer used by your model is not the same as the tokenizer we use. Please follow our issue template when asking, or look at how others have asked their questions.
If the question is too vague, I cannot reproduce your problem properly.
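
For readers hitting the same error, here is a minimal sketch of how to check for this kind of tokenizer/model mismatch, assuming a Hugging Face-format checkpoint; the model path below is a placeholder, not necessarily the one used in this thread:

from transformers import AutoConfig, AutoTokenizer

base_model = "decapoda-research/llama-7b-hf"  # placeholder path; use your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
config = AutoConfig.from_pretrained(base_model)

# These two numbers must agree: the original LLaMA vocabulary has 32000 tokens,
# while a checkpoint with an added pad token reports 32001.
print("tokenizer vocab size:", len(tokenizer))
print("config vocab size   :", config.vocab_size)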

molyswu (Author) commented Apr 19, 2023

/root/anaconda3/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
warnings.warn(
0%| | 0/32481 [00:00<?, ?it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
./Chinese-Vicuna/finetune.py:271 in │
│ │
│ 268 │
│ 269 print("\n If there's a warning about missing keys above, please disregard :)") │
│ 270 │
│ ❱ 271 trainer.train(resume_from_checkpoint=args.resume_from_checkpoint) │
│ 272 │
│ 273 model.save_pretrained(OUTPUT_DIR) │
│ 274 │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1662 in train │
│ │
│ 1659 │ │ inner_training_loop = find_executable_batch_size( │
│ 1660 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1661 │ │ ) │
│ ❱ 1662 │ │ return inner_training_loop( │
│ 1663 │ │ │ args=args, │
│ 1664 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1665 │ │ │ trial=trial, │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1929 in _inner_training_loop │
│ │
│ 1926 │ │ │ │ │ with model.no_sync(): │
│ 1927 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1928 │ │ │ │ else: │
│ ❱ 1929 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1930 │ │ │ │ │
│ 1931 │ │ │ │ if ( │
│ 1932 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:2699 in training_step │
│ │
│ 2696 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device) │
│ 2697 │ │ │
│ 2698 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2699 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2700 │ │ │
│ 2701 │ │ if self.args.n_gpu > 1: │
│ 2702 │ │ │ loss = loss.mean() # mean() to average on multi-gpu parallel training │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:2731 in compute_loss │
│ │
│ 2728 │ │ │ labels = inputs.pop("labels") │
│ 2729 │ │ else: │
│ 2730 │ │ │ labels = None │
│ ❱ 2731 │ │ outputs = model(**inputs) │
│ 2732 │ │ # Save past state if it exists │
│ 2733 │ │ # TODO: this needs to be fixed and made cleaner later. │
│ 2734 │ │ if self.args.past_index >= 0: │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1102 in _call_impl │
│ │
│ 1099 │ │ # this function, and just call forward. │
│ 1100 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1101 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1102 │ │ │ return forward_call(*input, **kwargs) │
│ 1103 │ │ # Do not call functions when jit is used │
│ 1104 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1105 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ in forward:663 │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1102 in _call_impl │
│ │
│ 1099 │ │ # this function, and just call forward. │
│ 1100 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1101 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1102 │ │ │ return forward_call(*input, **kwargs) │
│ 1103 │ │ # Do not call functions when jit is used │
│ 1104 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1105 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/accelerate/hooks.py:165 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:709 in │
│ forward │
│ │
│ 706 │ │ │ shift_labels = labels[..., 1:].contiguous() │
│ 707 │ │ │ # Flatten the tokens │
│ 708 │ │ │ loss_fct = CrossEntropyLoss() │
│ ❱ 709 │ │ │ shift_logits = shift_logits.view(-1, self.config.vocab_size) │
│ 710 │ │ │ shift_labels = shift_labels.view(-1) │
│ 711 │ │ │ # Enable model parallelism │
│ 712 │ │ │ shift_labels = shift_labels.to(shift_logits.device) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: shape '[-1, 32001]' is invalid for input of size 32640000
0%| | 0/32481 [00:04<?, ?it/s]
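
One plausible reading of these numbers, assuming the configuration posted above: with MICRO_BATCH_SIZE = 4 and CUTOFF_LEN = 256, the shifted logits contain 4 × 255 × 32000 = 32,640,000 values, meaning the model's output layer has 32,000 rows, while config.vocab_size is 32,001. Since 32,640,000 is not divisible by 32,001, view(-1, 32001) cannot reshape the tensor. In other words, the checkpoint's embedding size and the configured vocabulary size appear to disagree by exactly one token.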

Facico (Owner) commented Apr 21, 2023

You can try running the program from the third problem in there and see whether it produces normal output.

Facico (Owner) commented Apr 23, 2023

Buddy, if you still haven't solved it, you can join the QQ group or Discord server listed on our homepage.

molyswu (Author) commented Apr 23, 2023

This problem is still not solved.

Facico (Owner) commented Apr 24, 2023

The main problem is that you haven't described your issue clearly, so it is hard for me to reproduce it.

molyswu (Author) commented Apr 24, 2023

I just ran your program and it reported the error above.

Facico (Owner) commented Apr 24, 2023

Buddy, environments, machines, library dependencies and all kinds of other factors differ. We have posted all of our code and configuration, and even then you cannot guarantee a perfect reproduction of our setup; if all you describe is a single error message, how am I supposed to reproduce your error?
Precisely because many people keep asking duplicate questions, or don't know how to ask questions, we have already posted the QQ group...
Also, did you run the test program I suggested above?

molyswu (Author) commented Apr 26, 2023

Yesterday afternoon I started training the 13B model with TEST_SIZE=200; it has been running for a day and seems fine. I don't know why training the 7B model with TEST_SIZE=1000 hits this vocabulary out-of-range problem.

molyswu (Author) commented Apr 26, 2023

The 13B model is very slow.

Facico (Owner) commented Apr 26, 2023

It may be a problem with the model itself; the LLaMA tokenizer has been changed several times, and the LLaMA code in transformers has also been changed several times.
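
If the mismatch really does come from one extra special token (for example a pad token added on top of the 32,000-token LLaMA vocabulary), a generic workaround, sketched here and not taken from this repository's code, is to resize the embeddings after loading so that config.vocab_size follows the tokenizer; the model path is again a placeholder:

from transformers import LlamaForCausalLM, LlamaTokenizer

base_model = "decapoda-research/llama-7b-hf"  # placeholder path
model = LlamaForCausalLM.from_pretrained(base_model)
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# Grow (or shrink) the input and output embeddings to match the tokenizer,
# so the cross-entropy reshape in modeling_llama.py uses a consistent size.
if model.config.vocab_size != len(tokenizer):
    model.resize_token_embeddings(len(tokenizer))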

molyswu (Author) commented Apr 26, 2023

It's possible that the original LLaMA-7B model files had a problem; after switching to vicuna-7b-delta-v0 it works now.

Facico (Owner) commented Apr 26, 2023

Did you pull your llama-7b from Hugging Face? If it was pulled from Hugging Face and your transformers version is close to ours, this problem should not occur; you can use transformers version 4.28.1.
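
(For reference, pinning the library to that release would typically be done with pip install transformers==4.28.1, assuming a pip-managed environment.)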

molyswu (Author) commented Apr 26, 2023

It was pulled from Hugging Face; the transformers version is 4.28.0.dev0.

molyswu (Author) commented Apr 29, 2023

After running generate.sh, it keeps reporting that model is NoneType (#111). Successfully installed peft-0.3.0.dev0

molyswu (Author) commented Apr 29, 2023

bash generate.sh
AttributeError: 'NoneType' object has no attribute 'eval'

molyswu (Author) commented Apr 29, 2023

After downgrading peft to 0.2.0 and running bash generate.sh, it works now (apart from a warnings.warn(value) message):
Running on local URL: http://127.0.0.1:7860
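
(For reference, the downgrade described here would typically be done with pip install peft==0.2.0 in the same environment, assuming a pip-managed install.)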

Facico closed this as completed Jun 29, 2023