MemoryError after exceeding OpenMP/OpenBLAS thread limit #99
Hi @lingfeiwang, I'm not sure I understand how you triggered that. Could you describe in more detail the steps that led to this broken state?
Actually, I did not expect this to happen at all, so I did not record the steps to reproduce it or the error log from OpenMP or OpenBLAS. Briefly, I ran a computation in too many parallel processes, each of which used OpenMP or OpenBLAS (possibly through numpy/scipy). Together they exceeded some limit, maybe one set by the kernel, and the runtimes reported errors. I then killed those processes and everything seemed to recover, except threadpoolctl, which I only discovered later. I understand this is not very informative, but trying to reproduce it on a shared computing server would disrupt other users. I don't know how rare this error is, but I suspect computing servers are pushed this hard all the time. For me, a reboot solved the issue; perhaps someone else will follow up on this thread with more details another day.
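The failure mode described here is thread oversubscription: every worker process starts its own OpenMP/BLAS thread pool sized to the whole machine, so the total thread count grows with the number of processes. A common mitigation is to cap each worker's pool at launch time through environment variables. The sketch below assumes the workers are started as subprocesses and that their runtimes honour `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS`; the helper names `per_process_threads` and `worker_env` are hypothetical, not part of threadpoolctl.

```python
import os

# Environment variables commonly read by OpenMP / OpenBLAS / MKL at start-up.
# Assumption: the worker's runtimes honour these variables.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def per_process_threads(n_processes, n_cpus=None):
    """Hypothetical helper: split the CPUs evenly across workers (at least 1 each)."""
    if n_processes < 1:
        raise ValueError("n_processes must be >= 1")
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    return max(1, n_cpus // n_processes)


def worker_env(n_processes, n_cpus=None, base=None):
    """Hypothetical helper: copy the environment and set a per-worker thread cap."""
    env = dict(os.environ if base is None else base)
    n = str(per_process_threads(n_processes, n_cpus))
    for var in THREAD_ENV_VARS:
        env[var] = n
    return env
```

The result would be passed as `env=worker_env(n_workers)` to `subprocess.Popen`. The variables must be set before numpy/scipy load their native libraries in the child, which is why they are applied at process launch rather than afterwards. Inside an already running process, threadpoolctl's `threadpool_limits` context manager serves a similar purpose.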
Thanks for the feedback. It might indeed be a bug in the Linux kernel or in the OpenMP runtime, caused by relying on a stateful system attribute that was not correctly updated. If that ever happens again, it would be interesting to start a post-mortem pdb session to introspect the values of the
Hello. I use threadpoolctl 2.2.0, which works very well most of the time. However, after I exceeded the OpenMP or OpenBLAS thread limit, threadpoolctl seems to have broken down. It does not recover even after the processes that exceeded the thread limit have been killed, nor after quite some time has passed. The full error message of a simple example is shown below. Is there any way to reset threadpoolctl so that it works again without having to reboot the computer?