Hi, during training I get pytorch_lightning.utilities.exceptions.MisconfigurationException: DDPShardedPlugin requires fairscale to be installed. Install it by running pip install fairscale. I have already installed fairscale but still get the same error. What causes this? Also, can training run on CPU? #64

Open
lichengyang666 opened this issue May 12, 2023 · 7 comments

Comments

@lichengyang666

No description provided.

@renmada
Owner

renmada commented May 15, 2023

Try changing one argument so that fairscale is not used:

--plugins ddp
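
One thing worth ruling out for the original error is that fairscale was installed into a different Python environment from the one that launches training. A minimal sketch, using only the standard library, that reports which interpreter is running and whether it can locate the package (the helper name `module_report` is made up for illustration):

```python
import importlib.util
import platform
import sys


def module_report(name: str) -> dict:
    """Report whether `name` is importable from the current interpreter."""
    spec = importlib.util.find_spec(name)
    return {
        "python": sys.executable,        # interpreter actually running this script
        "platform": platform.system(),   # e.g. "Windows" or "Linux"
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    print(module_report("fairscale"))
```

If `found` is False when this is run the same way as the training script, `pip install fairscale` probably targeted another environment. If it is True and the error persists, the installed PyTorch Lightning version may be treating fairscale as unavailable on this platform, in which case `--plugins ddp` avoids the dependency altogether.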

@satisic

satisic commented May 16, 2023

I ran into the same problem. After switching to ddp I got this error:
  File "C:\Users\16771\OneDrive\桌面\t5-pegasus-pytorch-main\utils.py", line 86, in dev_collate
    return self.train_collate(batch)
  File "C:\Users\16771\OneDrive\桌面\t5-pegasus-pytorch-main\utils.py", line 53, in train_collate
    src = [b['src'] for b in batch]
  File "C:\Users\16771\OneDrive\桌面\t5-pegasus-pytorch-main\utils.py", line 53, in <listcomp>
    src = [b['src'] for b in batch]
KeyError: 'src'

@lichengyang666
Author

Your dataset format is wrong. Refer to the data format used in the example from the older version.
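
The traceback shows that `train_collate` reads `b['src']` from every record, so any record without that key raises `KeyError`. A minimal sketch for checking a dataset before training; `missing_keys` and the sample records are hypothetical, and only the `src` key is confirmed by the traceback (the full set of required keys should be taken from the repository's example data):

```python
from typing import Iterable, Mapping, Sequence


def missing_keys(records: Iterable[Mapping], required: Sequence[str] = ("src",)) -> list:
    """Return (index, missing key names) for every record lacking a required key."""
    problems = []
    for i, rec in enumerate(records):
        absent = [k for k in required if k not in rec]
        if absent:
            problems.append((i, absent))
    return problems


if __name__ == "__main__":
    # Made-up records for illustration; "text" stands in for a wrongly named key.
    sample = [{"src": "some input"}, {"text": "wrong key name"}]
    print(missing_keys(sample))
```

An empty result means every record has the keys that were checked; it says nothing about keys that were not passed in `required`.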

@satisic

satisic commented May 17, 2023

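Thanks for the reply. After I renamed the keys, the data-loading progress bar got about halfway and then the multi-worker loading raised another error:
  File "C:\ProgramData\anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread.RLock' object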

@lichengyang666
Author

Try setting num_workers to 0.
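
The `TypeError` comes from `multiprocessing` trying to pickle an object that holds a thread lock. With `num_workers` greater than 0 on Windows, worker processes are started with the spawn method, so whatever is sent to them must be picklable; with `num_workers` set to 0, loading happens in the main process and nothing is pickled. A standard-library sketch that reproduces the failure; the `Holder` class is a made-up stand-in for whatever object in the project carries the lock:

```python
import pickle
import threading


class Holder:
    """Hypothetical object that keeps a lock as an attribute."""

    def __init__(self):
        self.lock = threading.RLock()


def can_pickle(obj) -> bool:
    """Return True if `obj` survives pickle.dumps, False on a pickling TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    print(can_pickle({"src": "plain data"}))
    print(can_pickle(Holder()))
```

Running the same check on the dataset and collate objects may help locate which one carries the lock, if keeping several workers matters for speed.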

@satisic

satisic commented May 18, 2023

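OK, that works now. Running on my machine in CPU mode is too slow. Inside PyCharm, cuda.count() reports that a GPU is present, but when PyCharm runs the .sh file through Git, the GPU is not detected:
pytorch_lightning.utilities.exceptions.MisconfigurationException: You requested GPUs: [0]
But your machine only has: []
(Is this because the 1080 Ti does not support half precision? Or did Git fail to pick up the Anaconda environment?)

Later I moved to Linux with a 4090, and it ran without any changes to the arguments file.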
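
To tell the two guesses above apart, it may help to run a small check from the same shell that launches the .sh script and see which interpreter and which PyTorch build it picks up. A sketch that works whether or not PyTorch is installed in that interpreter; `cuda_report` is a made-up helper name:

```python
import sys


def cuda_report() -> dict:
    """Describe the running interpreter and, if PyTorch imports, what CUDA it sees."""
    report = {"python": sys.executable, "torch_installed": False}
    try:
        import torch
    except ImportError:
        return report
    report.update(
        torch_installed=True,
        torch_version=torch.__version__,
        cuda_build=torch.version.cuda,  # None for a CPU-only build
        cuda_available=torch.cuda.is_available(),
        device_count=torch.cuda.device_count(),
    )
    return report


if __name__ == "__main__":
    print(cuda_report())
```

If `python` points outside the Anaconda environment, or `cuda_build` is None, the shell is likely using a different interpreter or a CPU-only PyTorch build, which would be consistent with the empty GPU list regardless of half-precision support.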

@WSChange

How do I rename the keys? Should everything be changed to match the format of the example data in the old version?

4 participants