
CHID dataset: finetune_chid_large_fp32.sh fails #21

Closed
YinWei123 opened this issue Mar 2, 2021 · 3 comments

Comments

@YinWei123

Fine-tuning environment:
8 × Tesla V100
CUDA 10.0.130
Using the Docker image provided in the documentation

The fp16 script runs fine, but the fp32 script fails with the following error:

Traceback (most recent call last):
  File "finetune_chid.py", line 357, in <module>
    main()
  File "finetune_chid.py", line 241, in main
    model, optimizer, lr_scheduler = setup_model_and_optimizer(args)
  File "/CPM/new/CPM-Finetune-main/utils.py", line 510, in setup_model_and_optimizer
    args.iteration = load_checkpoint(model, optimizer, lr_scheduler, args)
  File "/CPM/new/CPM-Finetune-main/utils.py", line 281, in load_checkpoint
    checkpoint_name, sd = model.load_checkpoint(args.load, iteration, load_module_strict=False, load_optimizer_states=False, load_lr_scheduler_states=False)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/runtime/engine.py", line 1196, in load_checkpoint
    load_lr_scheduler_states=load_lr_scheduler_states)
  File "/usr/local/lib/python3.6/dist-packages/deepspeed/runtime/engine.py", line 1231, in _load_checkpoint
    self.optimizer.load_state_dict(checkpoint['optimizer'])
  File "/usr/local/lib/python3.6/dist-packages/torch/optim/optimizer.py", line 108, in load_state_dict
    saved_groups = state_dict['param_groups']
TypeError: 'NoneType' object is not subscriptable
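For readers tracing this failure, the last two frames show DeepSpeed's `_load_checkpoint` passing `checkpoint['optimizer']` directly to the optimizer's `load_state_dict`. The `TypeError` means that entry was `None`, even though the caller passed `load_optimizer_states=False`. The stdlib-only sketch below reproduces that pattern and shows a guarded alternative. `StubOptimizer` and `load_optimizer_state` are hypothetical names, not DeepSpeed or PyTorch APIs, and the guard is only an illustration, not a patch. The fix confirmed later in this thread was switching to the DeepSpeed version listed in the README.

```python
from typing import Any, Dict, Optional


class StubOptimizer:
    """Hypothetical stand-in for torch.optim.Optimizer. It mimics only the
    first step of load_state_dict that appears in the traceback."""

    def __init__(self) -> None:
        self.param_groups = [{"lr": 1e-5}]

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Same access as torch/optim/optimizer.py in the traceback;
        # if state_dict is None, this subscript raises TypeError.
        saved_groups = state_dict["param_groups"]
        self.param_groups = saved_groups


def load_optimizer_state(optimizer, checkpoint: Dict[str, Any],
                         load_optimizer_states: bool) -> bool:
    """Hypothetical guarded loader: restore optimizer state only when it was
    requested AND is actually present. Returns True if state was loaded."""
    state: Optional[Dict[str, Any]] = checkpoint.get("optimizer")
    if not load_optimizer_states or state is None:
        return False
    optimizer.load_state_dict(state)
    return True


if __name__ == "__main__":
    # A checkpoint whose optimizer entry is None, as the traceback implies.
    checkpoint = {"module": {}, "optimizer": None}

    # The unguarded call reproduces the reported error.
    try:
        StubOptimizer().load_state_dict(checkpoint["optimizer"])
    except TypeError as exc:
        print(exc)  # 'NoneType' object is not subscriptable

    # The guarded call skips the missing state instead of crashing.
    print(load_optimizer_state(StubOptimizer(), checkpoint, load_optimizer_states=False))  # False
```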

@t1101675
Contributor

t1101675 commented Mar 2, 2021

Hello, is your DeepSpeed version the same as the one specified in our README.md?

@YinWei123
Author

Thanks a lot. I've confirmed it was a DeepSpeed version issue.
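Here is a minimal sketch for checking whether the installed DeepSpeed matches the version in README.md. It assumes standard package metadata is available: `importlib.metadata` on Python 3.8+, or `pkg_resources` (from setuptools) on older interpreters such as the Python 3.6 shown in the traceback. `REQUIRED_DEEPSPEED` is a hypothetical placeholder, so copy the real value from the README. The helper names are illustrative only.

```python
from typing import Optional

# Hypothetical placeholder: replace with the DeepSpeed version stated in README.md.
REQUIRED_DEEPSPEED = "0.0.0"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for older Pythons (needs setuptools)

        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(installed: Optional[str], required: str) -> bool:
    """True if `installed` equals `required`, ignoring a local suffix like '+abc123'."""
    return installed is not None and installed.split("+")[0] == required


if __name__ == "__main__":
    found = installed_version("deepspeed")
    print("deepspeed installed:", found)
    print("matches README pin:", matches_pin(found, REQUIRED_DEEPSPEED))
```

The check uses an exact match rather than a "minimum version" comparison, because this thread suggests the scripts depend on the specific DeepSpeed release listed in the README.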

@rockkoca

rockkoca commented Apr 8, 2021

Does DeepSpeed need to be installed on the host machine?
