Segmentation fault in multiprocessing DataLoader when training on Kunpeng CPU #2506
Comments
PR welcome.
Commits that referenced this issue:
MengqingCao added commits to MengqingCao/wenet on Apr 28, 2024 (three commits), May 15, 2024, and May 16, 2024 (two commits).
xingchensong pushed commits on May 2, 2024 and May 17, 2024, and added two commits on May 8, 2024.
Commit message of the May 16, 2024 commits: fix segmentfault in Kunpeng (wenet-e2e#2506); avoids the repeated initialization of deepspeed caused by (wenet-e2e#2507)
Describe the bug
A segmentation fault occurs while train.py is running. It happens when the DataLoader creates its worker processes.
The log:
The stack printout:
To Reproduce
Steps to reproduce the behavior:
cd ./examples/aishell/s0
bash run.sh
When the script reaches stage 4 (running train.py), the segmentation fault occurs.
Expected behavior
No fault.
Screenshots
Desktop (please complete the following information):
Additional context
I have confirmed that this error is caused by the way the worker processes are created. Specifying the multiprocessing context as spawn, that is, setting multiprocessing_context=mp.get_context("spawn") in the DataLoader, solves the problem. As far as I know, the spawn start method works on most systems (Windows, all POSIX platforms, and macOS).
If this solution is approved, I will submit a PR. Let me know if you have any suggestions.
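A minimal sketch of the proposed workaround, not the actual wenet patch. The helper name dataloader_mp_kwargs is hypothetical; the sketch uses only the standard library to build the keyword arguments, so it runs without PyTorch installed. It assumes that torch.utils.data.DataLoader accepts multiprocessing_context only when num_workers is greater than 0, which is why the context is omitted for single-process loading.

```python
import multiprocessing as mp


def dataloader_mp_kwargs(num_workers: int, start_method: str = "spawn") -> dict:
    """Build DataLoader keyword arguments (hypothetical helper, not part of wenet)."""
    kwargs = {"num_workers": num_workers}
    if num_workers > 0:
        # Assumption: DataLoader rejects multiprocessing_context when
        # num_workers == 0, so it is only set for multi-process loading.
        kwargs["multiprocessing_context"] = mp.get_context(start_method)
    return kwargs


if __name__ == "__main__":
    kwargs = dataloader_mp_kwargs(num_workers=4)
    print(kwargs["multiprocessing_context"].get_start_method())
    # With PyTorch installed, the arguments would be passed through like this:
    # from torch.utils.data import DataLoader
    # loader = DataLoader(dataset, batch_size=16, **kwargs)
```

Note that with spawn the worker processes start a fresh interpreter, so the dataset and any collate function have to be picklable, and worker startup is typically slower than with fork.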