Limit number of threads/cores in parallel calculations #2975
Thanks for raising this issue @CharlyEmpereurmot, it's a rather interesting one, and I agree it really should have an easier solution. Currently, I guess the easiest way to do this would be to export OMP_NUM_THREADS prior to calling Python, or doing […]
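For reference, a minimal sketch of that environment-variable approach from within Python. The OpenMP runtime reads OMP_NUM_THREADS once at start-up, so the assignment has to happen before the libraries that use OpenMP are imported:

```python
import os

# Equivalent to `export OMP_NUM_THREADS=2` in the shell before launching
# Python; the OpenMP runtime reads this value when it is first loaded.
os.environ["OMP_NUM_THREADS"] = "2"

# ...only now import the OpenMP-backed libraries, e.g.:
# import numpy
# import MDAnalysis
```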
There's some discussion in PR #2950 – please make your voice heard!
Honestly, I don't think the OpenMP backend is that good; last I checked it wasn't close to twice as fast with two cores. I think this is simply because there's a lot of extra work outside of that region that doesn't get parallelised. I'd sooner deprecate and remove the whole idea and instead invest more into pmda-like ideas than add more features to "backend".
I am sure that the OpenMP code could be improved. But I think the real problem here is that pretty much all OpenMP code (in MDA and in numpy – see #2950) slows down when OpenMP uses more threads than physical cores. From my initial tests with

```python
import threadpoolctl
import numpy
threadpoolctl.threadpool_info()
```

on every machine, the […]. A sensible start would be to limit OpenMP threads as soon as you do serious work. For MDAnalysis we should include […]
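A hedged sketch of limiting threads at runtime with threadpoolctl's `threadpool_limits` context manager (assuming the package is installed; a no-op fallback is included so the snippet runs either way):

```python
import contextlib

try:
    from threadpoolctl import threadpool_limits
except ImportError:
    # No-op stand-in when threadpoolctl is unavailable
    def threadpool_limits(limits=None, user_api=None):
        return contextlib.nullcontext()

# Cap OpenMP thread pools at 2 for this block only; the previous
# limits are restored when the context exits.
with threadpool_limits(limits=2, user_api="openmp"):
    result = sum(x * x for x in range(10))  # placeholder for real work
```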
@tylerjereddy is there a list of numpy functions that use OpenMP?
@orbeckst I imagine it would depend on the linear algebra backend in use (usually […]). The only reference to […]
Hi! Related to this, but more on the NumPy side, I recently encountered a weird issue with […]. As for limiting thread numbers, […]. Side note: there are reports saying […]
My understanding of your gist was that […]
mkl is included in conda so using it is not a problem. However, installation and dependencies are difficult enough that the less we prescribe, the better. I would not want to say "you can only use MDA if you use a numpy that is linked against MKL". Besides, if this is not an MDA problem then we shouldn't have to bend over backwards and inconvenience our users. Rather, we should try to have the issue fixed upstream from us.

As a short-term solution, we could test if OpenMP threads are set to a low-performance setting and warn users. (I did find out that most of my students routinely set the […].)

Does it make sense to add threadpool limitation to specific pieces of MDAnalysis where we suspect that we can get into performance issues? Or does this just make the code more complicated? It would be useful to hear what different people (users, developers) think about this issue.
Agreed, plus this would immediately kill off arm64 & POWER support (neither of which I believe MKL supports, it being x86-specific). There's also a lot of talk about MKL being not so well optimized on AMD chips...
Going to be a little bit controversial and say that, personally, I'd be wary of including a warning here. I feel like a user warning that gets triggered all the time on most workstations is just another thing that will push users away from properly reading warnings. Essentially, I think MDA warnings should primarily be reserved for assumptions and behaviours that could lead to erroneous results, and whilst poor performance is annoying, it doesn't necessarily fit in that category. That being said, I'm somewhat curious as to how we'd implement such a warning; I guess with psutil?
My vote would be more for this. My understanding of @yuxuanzhuang's benchmarking is that transformation code is always faster when executed serially? In that case using a context manager for all of that would make sense. (Note: we probably should test this out on a low-clock-rate CPU and see if the benchmarks hold up; I'm currently running a 1.8 GHz boost-disabled mobile chip, so I can probably try it out this weekend if needed.)
More related to @CharlyEmpereurmot's original post here: setting OMP_NUM_THREADS to the number of cores per task is usually the recommended way of running any numpy-centric code on clusters, especially if you end up sharing nodes. Although, having done my fair share of sysadmin work, I realise users don't always follow this. Given @richardjgowers' distopia code and the poor performance of the existing OpenMP C code, I wouldn't be against just getting rid of the latter... but it won't fix the numpy side of things.
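A minimal sketch of how the oversubscription warning discussed above might be implemented, assuming psutil for physical-core counts with `os.cpu_count()` (logical cores) as a fallback; the helper name is hypothetical:

```python
import os
import warnings

try:
    import psutil
    _n_cores = psutil.cpu_count(logical=False)
except ImportError:
    _n_cores = os.cpu_count()  # logical cores only, as a fallback

def check_thread_settings():
    """Warn if OMP_NUM_THREADS asks for more threads than available cores.

    Hypothetical helper illustrating the warning discussed above; this is
    only a heuristic, since the fallback counts logical, not physical, cores.
    """
    requested = os.environ.get("OMP_NUM_THREADS")
    if requested is None or _n_cores is None:
        return
    if int(requested) > _n_cores:
        warnings.warn(
            f"OMP_NUM_THREADS={requested} exceeds {_n_cores} available "
            "cores; OpenMP sections may run slower, not faster.",
            RuntimeWarning,
        )
```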
@orbeckst mkl is only in Anaconda's channel (the default channel), not conda-forge, so most people installing via this route don't have MKL. I also don't think that trying to get specific about a numpy backend will end well.
Hello all,

I think it would be awesome to be able to limit the number of threads/cores used in a number of different function calls. For example, while calculating bonds I would like to be able to do this: […]

While atm, when using `mda_backend = 'OpenMP'`, all threads of the machine will be used, and using `mda_backend = 'serial'` will use a single thread. This can be annoying for executing code on clusters, for example, or for making use of MDAnalysis for developing user-friendly tools.

It would be nice if `calc_bonds`, `calc_angles`, `calc_dihedrals` could have an argument `nb_threads`, but even better if all functions that are parallelized could have an argument like this, that goes together with `backend`.

Please correct me if I'm missing something, but I believe atm it's not possible to limit the number of threads easily from within the Python code.
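Until something like the requested `nb_threads` argument exists, the behaviour can be approximated per call by wrapping the invocation in threadpoolctl's limiter. This is an illustrative sketch, not MDAnalysis API: the wrapper name is invented, and a no-op fallback is included in case threadpoolctl is not installed:

```python
import contextlib

try:
    from threadpoolctl import threadpool_limits
except ImportError:
    def threadpool_limits(limits=None, user_api=None):
        return contextlib.nullcontext()  # no-op without threadpoolctl

def call_with_nb_threads(func, *args, nb_threads=1, **kwargs):
    """Hypothetical wrapper emulating a per-call `nb_threads` argument by
    capping the OpenMP thread pool around a single invocation."""
    with threadpool_limits(limits=nb_threads, user_api="openmp"):
        return func(*args, **kwargs)

# Intended use (calc_bonds is real MDAnalysis API, nb_threads is not):
# from MDAnalysis.lib.distances import calc_bonds
# result = call_with_nb_threads(calc_bonds, coords1, coords2,
#                               backend='OpenMP', nb_threads=4)
```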