-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] make it possible to protect against oversubsription for various threadpool nesting cases #16
Conversation
refactor openmp helpers
c3a64f5
to
5603a7c
Compare
I forced pushed a debug commit 5603a7cand here is the unexpected output.
The prompt number in brackets is the index of the outer loop (4 outer iterations x 4 inner iteration == 16 lines) x 2 (without and with limit). The results are really hard to understand but the inner calls to |
Hum, my debug code is causing the MKL enabled case to deadlock at process shutdown time now... and this test is not even using the BLAS routines. And the nested OMP tests pass with MKL. Here is the output of that test that passes but causes the shutdown to deadlock once all the tests pass... note that libgomp (the inner openmp runtime) is has both 2 and 1 threads when it's limited to 1 and the inner call to omp_get_num_threads always returns 1:
|
@jeremiedbb @tomMoral if you have any idea on what is going on, feel free to pitch in. |
I'm as lost as you are :'(
|
Another remark. Setting openmp max threads for each prefix independently and not setting the prange num_thread param as in first api attempt seems to not work either. from threadpoolctl import threadpool_limits
with threadpool_limits(limits={'libomp': 2, 'libgomp': 1}):
check_nested_openmp_loops(n) has the same issue |
Also, I'm able to reproduce this in pure C, but only if gcc is outer and clang is inner. |
And I can confirm that it's not just a wrong value of |
Might have to do with "Note: Functionality in this module may only be used from the main thread or parallel regions due to OpenMP restrictions. ". https://cython.readthedocs.io/en/latest/src/userguide/parallelism.html |
Maybe you can try to instrument your C code to report https://linux.die.net/man/2/gettid (or we could also do it in the cython version of this PR). |
Let me summarize the state of this. We want to be able to limit number of threads for inner parallel region in nested parallelism. Inner parallelism can be directly using OpenMP or through a BLAS lib.
I think it would be reasonable to skip the tests in the problematic envs temporarily, merge this PR and maybe even make a first release. |
We have not tested BLIS built by either clang/icc or gcc either. |
I do not see any simple way to add a test for the remaining uncovered lines from this PR so I propose to leave that as it is. |
I've tested OpenBLAS built with icc and outer built with gcc. We have 3 libs: libopenblas, libiomp & libgomp |
Good news. Could you please also try with blis compiled with clang or icc? |
I'm on it. Just finished to build and install it :) |
I've tested with BLIS (openmp or pthreads) built with either gcc or icc in an outer OpenMP parallel loop and limiting number of threads for BLIS works fine like with mkl or openblas \o/ |
Great news. Thanks. |
As discussed there #14 (comment) .
For now here is some test infrastructure that highlight the problem with 2 OpenMP enabled Cython extensions.