NotImplementedError: Cannot copy out of meta tensor; no data! #83

Closed
greatewei opened this issue Apr 17, 2023 · 5 comments

@greatewei

I want to continue training incrementally on top of the existing LoRA weights; data.json contains about 3,000 examples.
Command:

python finetune.py \
--data_path /data/chat/Chinese-Vicuna/data/data.json \
--output_path /data/chat/models/llama_lora/llama-7b-yy-lora \
--model_path /data/chat/models/llama_base/llama-7b-hf  \
--eval_steps 200 \
--save_steps 200 \
--test_size 1 \
--resume_from_checkpoint /data/chat/models/llama_lora/Chinese-Vicuna-lora-7b-belle-and-guanaco \
--ignore_data_skip True

Error:

  File "/data/chat/Chinese-Vicuna/finetune.py", line 235, in <module>
    trainer = transformers.Trainer(
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 498, in __init__
    self._move_model_to_device(model, args.device)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 740, in _move_model_to_device
    model = model.to(device)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

Could someone please take a look?

@Facico
Owner

Facico commented Apr 18, 2023

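This looks like a model-loading problem. Can you load the model normally with the code from question 3 here?
Alternatively, you could refer to this issue.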

@greatewei
Author


Thanks a lot.

That error is solved.

But now a different error appears:
Traceback (most recent call last):
  File "/data/chat/Chinese-Vicuna/finetune.py", line 273, in <module>
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1659, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2064, in _inner_training_loop
    checkpoints_sorted = self._sorted_checkpoints(use_mtime=False, output_dir=run_dir)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2923, in _sorted_checkpoints
    best_model_index = checkpoints_sorted.index(str(Path(self.state.best_model_checkpoint)))
ValueError: 'lora-Vicuna/checkpoint-17000' is not in list

My argument is --resume_from_checkpoint /data/chat/models/llama_lora/Chinese-Vicuna-lora-7b-belle-and-guanaco, but I don't understand where the path lora-Vicuna/checkpoint-17000 comes from.

@Facico
Owner

Facico commented Apr 18, 2023

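Delete the best_model_checkpoint field from trainer_state.json in the checkpoint you are loading. The Trainer compares against the model recorded in that field, and when you train on different data you probably don't need that model's results anyway.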
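That edit can be scripted. Below is a minimal sketch using only the Python standard library, assuming trainer_state.json sits directly inside the directory passed to --resume_from_checkpoint. The function name drop_best_model_checkpoint and the .bak backup file are illustrative additions, not part of Chinese-Vicuna or Transformers.

```python
import json
from pathlib import Path
from typing import Optional


def drop_best_model_checkpoint(checkpoint_dir: str) -> Optional[str]:
    """Delete "best_model_checkpoint" from <checkpoint_dir>/trainer_state.json.

    Hypothetical helper. Returns the removed value (None if the field was
    absent or already null), so you can see which stale path, e.g.
    'lora-Vicuna/checkpoint-17000', the Trainer would have looked for.
    """
    state_path = Path(checkpoint_dir) / "trainer_state.json"
    original_text = state_path.read_text(encoding="utf-8")
    state = json.loads(original_text)

    if "best_model_checkpoint" not in state:
        return None  # nothing to do; leave the file untouched

    # Keep an unmodified copy (trainer_state.json.bak) before rewriting.
    backup_path = state_path.with_name(state_path.name + ".bak")
    backup_path.write_text(original_text, encoding="utf-8")

    # Remove only this field; every other entry (global_step, log_history, ...) is kept.
    old_value = state.pop("best_model_checkpoint")
    state_path.write_text(
        json.dumps(state, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return old_value
```

For the command in this issue, you would call drop_best_model_checkpoint("/data/chat/models/llama_lora/Chinese-Vicuna-lora-7b-belle-and-guanaco") once before launching finetune.py. Deleting the key (rather than editing its value) should be enough, since Transformers' TrainerState defaults best_model_checkpoint to None when the field is missing from the JSON.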

@greatewei
Author


Solved.

@dj1150277


Thank you so much! I had looked through more than 10 posts, and this method finally solved it.
