
RuntimeError: shape '[-1, 32001]' is invalid for input of size 32640000 #85

Closed
molyswu opened this issue Apr 18, 2023 · 21 comments

molyswu commented Apr 18, 2023

Hi,

RuntimeError: shape '[-1, 32001]' is invalid for input of size 32640000

What is causing this problem?

Facico (Owner) commented Apr 18, 2023

Could you give more detailed error output? My guess is that you loaded someone else's model. You can refer to similar issues.

molyswu (Author) commented Apr 18, 2023

Two GPUs, RTX 3090:

if not args.wandb:
    os.environ["WANDB_MODE"] = "disable"
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size  # 2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]

Facico (Owner) commented Apr 18, 2023

So you haven't modified the training script or anything else you are running?

molyswu (Author) commented Apr 19, 2023

No changes; the base model is LLaMA-7B.

Facico (Owner) commented Apr 19, 2023

You have only given this error message, so all I can tell is that the tokenizer used by your model is not the same as the tokenizer we use. Please follow our issue template when asking, or look at how others have asked their questions.
If the question is too vague, I cannot reproduce your problem properly.
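
For readers hitting the same error, here is a minimal sketch of how to check for this kind of tokenizer/model mismatch, assuming a Hugging Face-format checkpoint; the model path below is a placeholder, not necessarily the one used in this thread:

from transformers import AutoConfig, AutoTokenizer

base_model = "decapoda-research/llama-7b-hf"  # placeholder path; use your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
config = AutoConfig.from_pretrained(base_model)

# These two numbers must agree: the original LLaMA vocabulary has 32000 tokens,
# while a checkpoint with an added pad token reports 32001.
print("tokenizer vocab size:", len(tokenizer))
print("config vocab size   :", config.vocab_size)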

molyswu (Author) commented Apr 19, 2023

/root/anaconda3/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
warnings.warn(
0%| | 0/32481 [00:00<?, ?it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
./Chinese-Vicuna/finetune.py:271 in │
│ │
│ 268 │
│ 269 print("\n If there's a warning about missing keys above, please disregard :)") │
│ 270 │
│ ❱ 271 trainer.train(resume_from_checkpoint=args.resume_from_checkpoint) │
│ 272 │
│ 273 model.save_pretrained(OUTPUT_DIR) │
│ 274 │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1662 in train │
│ │
│ 1659 │ │ inner_training_loop = find_executable_batch_size( │
│ 1660 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1661 │ │ ) │
│ ❱ 1662 │ │ return inner_training_loop( │
│ 1663 │ │ │ args=args, │
│ 1664 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1665 │ │ │ trial=trial, │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1929 in _inner_training_loop │
│ │
│ 1926 │ │ │ │ │ with model.no_sync(): │
│ 1927 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1928 │ │ │ │ else: │
│ ❱ 1929 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1930 │ │ │ │ │
│ 1931 │ │ │ │ if ( │
│ 1932 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:2699 in training_step │
│ │
│ 2696 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device) │
│ 2697 │ │ │
│ 2698 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2699 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2700 │ │ │
│ 2701 │ │ if self.args.n_gpu > 1: │
│ 2702 │ │ │ loss = loss.mean() # mean() to average on multi-gpu parallel training │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:2731 in compute_loss │
│ │
│ 2728 │ │ │ labels = inputs.pop("labels") │
│ 2729 │ │ else: │
│ 2730 │ │ │ labels = None │
│ ❱ 2731 │ │ outputs = model(**inputs) │
│ 2732 │ │ # Save past state if it exists │
│ 2733 │ │ # TODO: this needs to be fixed and made cleaner later. │
│ 2734 │ │ if self.args.past_index >= 0: │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1102 in _call_impl │
│ │
│ 1099 │ │ # this function, and just call forward. │
│ 1100 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1101 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1102 │ │ │ return forward_call(*input, **kwargs) │
│ 1103 │ │ # Do not call functions when jit is used │
│ 1104 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1105 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ in forward:663 │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1102 in _call_impl │
│ │
│ 1099 │ │ # this function, and just call forward. │
│ 1100 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1101 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1102 │ │ │ return forward_call(*input, **kwargs) │
│ 1103 │ │ # Do not call functions when jit is used │
│ 1104 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1105 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/accelerate/hooks.py:165 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:709 in │
│ forward │
│ │
│ 706 │ │ │ shift_labels = labels[..., 1:].contiguous() │
│ 707 │ │ │ # Flatten the tokens │
│ 708 │ │ │ loss_fct = CrossEntropyLoss() │
│ ❱ 709 │ │ │ shift_logits = shift_logits.view(-1, self.config.vocab_size) │
│ 710 │ │ │ shift_labels = shift_labels.view(-1) │
│ 711 │ │ │ # Enable model parallelism │
│ 712 │ │ │ shift_labels = shift_labels.to(shift_logits.device) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: shape '[-1, 32001]' is invalid for input of size 32640000
0%| | 0/32481 [00:04<?, ?it/s]
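
One plausible reading of these numbers, assuming the configuration posted above: with MICRO_BATCH_SIZE = 4 and CUTOFF_LEN = 256, the shifted logits contain 4 × 255 × 32000 = 32,640,000 values, meaning the model's output layer has 32,000 rows, while config.vocab_size is 32,001. Since 32,640,000 is not divisible by 32,001, view(-1, 32001) cannot reshape the tensor. In other words, the checkpoint's embedding size and the configured vocabulary size appear to disagree by exactly one token.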

Facico (Owner) commented Apr 21, 2023

You can try running the program from the third problem in there and see whether it produces normal output.

Facico (Owner) commented Apr 23, 2023

Buddy, if you still haven't solved it, you can join the QQ group or Discord server listed on our homepage.

molyswu (Author) commented Apr 23, 2023

This problem is still not solved.

Facico (Owner) commented Apr 24, 2023

The main problem is that you haven't described your issue clearly, so it is hard for me to reproduce it.

molyswu (Author) commented Apr 24, 2023

I just ran your program and it reported the error above.

Facico (Owner) commented Apr 24, 2023

Buddy, environments, machines, library dependencies and all kinds of other factors differ. We have posted all of our code and configuration, and even then you cannot guarantee a perfect reproduction of our setup; if all you describe is a single error message, how am I supposed to reproduce your error?
Precisely because many people keep asking duplicate questions, or don't know how to ask questions, we have already posted the QQ group...
Also, did you run the test program I suggested above?

molyswu (Author) commented Apr 26, 2023

Yesterday afternoon I started training the 13B model with TEST_SIZE=200; it has been running for a day and seems fine. I don't know why training the 7B model with TEST_SIZE=1000 hits this vocabulary out-of-range problem.

molyswu (Author) commented Apr 26, 2023

The 13B model is very slow.

Facico (Owner) commented Apr 26, 2023

It may be a problem with the model itself; the LLaMA tokenizer has been changed several times, and the LLaMA code in transformers has also been changed several times.
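
If the mismatch really does come from one extra special token (for example a pad token added on top of the 32,000-token LLaMA vocabulary), a generic workaround, sketched here and not taken from this repository's code, is to resize the embeddings after loading so that config.vocab_size follows the tokenizer; the model path is again a placeholder:

from transformers import LlamaForCausalLM, LlamaTokenizer

base_model = "decapoda-research/llama-7b-hf"  # placeholder path
model = LlamaForCausalLM.from_pretrained(base_model)
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# Grow (or shrink) the input and output embeddings to match the tokenizer,
# so the cross-entropy reshape in modeling_llama.py uses a consistent size.
if model.config.vocab_size != len(tokenizer):
    model.resize_token_embeddings(len(tokenizer))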

molyswu (Author) commented Apr 26, 2023

It's possible that the original LLaMA-7B model files had a problem; after switching to vicuna-7b-delta-v0 it works now.

Facico (Owner) commented Apr 26, 2023

Did you pull your llama-7b from Hugging Face? If it was pulled from Hugging Face and your transformers version is close to ours, this problem should not occur; you can use transformers version 4.28.1.
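
(For reference, pinning the library to that release would typically be done with pip install transformers==4.28.1, assuming a pip-managed environment.)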

molyswu (Author) commented Apr 26, 2023

It was pulled from Hugging Face; the transformers version is 4.28.0.dev0.

molyswu (Author) commented Apr 29, 2023

After running generate.sh, it keeps reporting that model is NoneType (#111). Successfully installed peft-0.3.0.dev0

molyswu (Author) commented Apr 29, 2023

bash generate.sh
AttributeError: 'NoneType' object has no attribute 'eval'

molyswu (Author) commented Apr 29, 2023

After downgrading peft to 0.2.0 and running bash generate.sh, it works now (apart from a warnings.warn(value) message):
Running on local URL: http://127.0.0.1:7860
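
(For reference, the downgrade described here would typically be done with pip install peft==0.2.0 in the same environment, assuming a pip-managed install.)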

Facico closed this as completed Jun 29, 2023