[Fix #2506] Specify multiprocessing context in DataLoader #2507

Merged: 1 commit, merged on May 2, 2024


MengqingCao (Contributor)

Fix #2506

P.S. The indentation changes in the code are only there to pass the yapf format check.
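The thread does not show the diff itself; the title only says that a multiprocessing context is now specified for the DataLoader. A minimal sketch of what that can look like, assuming (hypothetically) that the context is chosen by CPU architecture, with `spawn` used on ARM hosts such as Kunpeng, which report `aarch64`. `pick_mp_context` and `ARM_MACHINES` are illustrative names, not wenet code; the only real API relied on is `multiprocessing.get_context` and PyTorch's `DataLoader(multiprocessing_context=...)` parameter.

```python
import multiprocessing
import platform
from typing import Optional

# Hypothetical set of machine strings treated as ARM (Kunpeng reports
# "aarch64" on Linux; Apple silicon reports "arm64").
ARM_MACHINES = {"aarch64", "arm64"}


def pick_mp_context(machine: Optional[str] = None):
    """Return a 'spawn' context on ARM hosts, else None.

    None lets torch.utils.data.DataLoader fall back to the interpreter's
    default start method. This selection rule is an assumption made for
    illustration, not the rule used in the PR.
    """
    machine = (machine or platform.machine()).lower()
    if machine in ARM_MACHINES:
        return multiprocessing.get_context("spawn")
    return None


if __name__ == "__main__":
    ctx = pick_mp_context()
    print("start method:", ctx.get_start_method() if ctx else "interpreter default")
    # With PyTorch installed, the context would be passed per loader:
    # loader = torch.utils.data.DataLoader(dataset, num_workers=4,
    #                                      multiprocessing_context=ctx)
```

Passing the context (or just the string `"spawn"`) through `multiprocessing_context` affects only that loader's worker processes and leaves the interpreter-wide start method unchanged.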

xingchensong merged commit 5576e6f into wenet-e2e:main on May 2, 2024 (6 checks passed)
xingchensong (Member)

thx!

xingchensong (Member) commented May 8, 2024

Sorry, I have to revert this PR: I can no longer launch the DeepSpeed engine after it was merged.

xingchensong added two commits that referenced this pull request on May 8, 2024
MengqingCao (Contributor, Author)

> Sorry, I have to revert this PR: I can no longer launch the DeepSpeed engine after it was merged.

Hi @xingchensong, could you describe the error you get when launching DeepSpeed in more detail? If it is solvable, I'd like to fix it and re-merge this PR.

xingchensong (Member)

Support for the Kunpeng CPU is not a top priority; you can treat it as a patch and investigate the DeepSpeed issue on an Intel CPU.

MengqingCao (Contributor, Author) commented May 8, 2024

> Support for the Kunpeng CPU is not a top priority; you can treat it as a patch and investigate the DeepSpeed issue on an Intel CPU.

I ran the training pipeline in aishell/whisper/run.sh with DeepSpeed, applying mpDataLoader as a patch, and everything worked. More details on how to reproduce the error you hit would help.

CPU used for the test:

Intel(R) Xeon(R) Gold 6151 CPU @ 3.00GHz

xingchensong (Member)

I'm running my own DIY code, which includes a copy of the online hotfix (wenet-e2e/WenetSpeech#54), and DeepSpeed keeps initializing over and over again.

https://paste.ubuntu.com/p/4FmD3342Dv/


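From your description, it should run fine once this DIY code is removed. I don't have time to look into the cause; if you do, feel free to try to fix it.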
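The thread does not pin down why DeepSpeed re-initializes. One plausible mechanism, assuming the patch makes DataLoader workers use the `spawn` start method: a `spawn` worker starts a fresh interpreter and re-runs the parent's main script under the name `__mp_main__`, so any top-level side effect in that script (an unguarded hotfix, engine setup, and so on) runs again in every worker, while code under `if __name__ == "__main__":` does not. The sketch below simulates that re-run with `exec` instead of launching real processes; `SCRIPT`, `init_engine`, and `run_as` are hypothetical names.

```python
import textwrap

# Hypothetical training script. INIT_LOG records every time the expensive
# setup runs (a stand-in for engine init or a monkey-patching hotfix).
SCRIPT = textwrap.dedent(
    """
    INIT_LOG = []

    def init_engine(where):
        INIT_LOG.append(where)

    # Unguarded: runs whenever the file is executed, including when a
    # 'spawn' worker re-runs it as __mp_main__.
    init_engine("top level")

    if __name__ == "__main__":
        # Guarded: runs only in the process that launched the script.
        init_engine("main guard")
    """
)


def run_as(module_name):
    """Execute SCRIPT under the given __name__ and return its INIT_LOG.

    '__main__' mimics the launching process; '__mp_main__' mimics how a
    'spawn' worker re-runs the parent's main script.
    """
    namespace = {"__name__": module_name}
    exec(SCRIPT, namespace)
    return namespace["INIT_LOG"]


if __name__ == "__main__":
    print("parent:", run_as("__main__"))        # ['top level', 'main guard']
    print("spawn worker:", run_as("__mp_main__"))  # ['top level']
```

Under this assumption, moving such setup into the `__main__` block (or into a function called from it) keeps it in the launching process only.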

MengqingCao (Contributor, Author) commented May 8, 2024

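> From your description, it should run fine once this DIY code is removed. I don't have time to look into the cause; if you do, feel free to try to fix it.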

OK, I'll give it a try.

MengqingCao added a commit to MengqingCao/wenet that referenced this pull request on May 16, 2024
- fix segfault on Kunpeng (wenet-e2e#2506)
- avoid the repeated initialization of DeepSpeed introduced in wenet-e2e#2507
MengqingCao added a commit to MengqingCao/wenet that referenced this pull request on May 16, 2024
- fix segfault on Kunpeng (wenet-e2e#2506)
- avoid the repeated initialization of DeepSpeed caused by wenet-e2e#2507
xingchensong pushed a commit that referenced this pull request on May 17, 2024
- fix segfault on Kunpeng (#2506)
- avoid the repeated initialization of DeepSpeed caused by #2507
Successfully merging this pull request may close these issues.

Segmentfault in multiprocessing DataLoader when training on Kunpeng cpu
2 participants