
Programmatically Control Thread Count #188

Closed
3 of 4 tasks
jbteves opened this issue Jan 18, 2019 · 6 comments
Labels: enhancement (issues describing possible enhancements to the project)

Comments

@jbteves
Collaborator

jbteves commented Jan 18, 2019

Summary

Currently, tedana uses either all available cores or just one (#215). Users should be able to control the number of threads more precisely.

Goals

  • Determine a reasonable default number of threads
  • Implement changes to set that as the default
  • Determine whether users should be able to set the number of threads
  • Make the number of threads a user-accessible option in the CLI

Strategy

  1. Investigate methods of parsing user arguments before NumPy is imported. This appears to be surprisingly difficult; a rough sketch of the intended ordering follows below.
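
A minimal sketch of the pattern under investigation, assuming a hypothetical --n-threads option (this is not tedana's actual CLI). The point is purely the ordering: parse the argument and set the thread-related environment variables before NumPy, or anything that imports it, is loaded.

```python
import argparse
import os

# Parse only the thread-count option before any heavy imports happen.
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('--n-threads', type=int, default=1)  # hypothetical option name
opts, _ = parser.parse_known_args()

# Set the BLAS/OpenMP-related variables before NumPy is imported.
for var in ('MKL_NUM_THREADS', 'NUMEXPR_NUM_THREADS',
            'OMP_NUM_THREADS', 'VECLIB_MAXIMUM_THREADS'):
    os.environ[var] = str(opts.n_threads)

import numpy as np  # noqa: E402 -- must come after the variables are set
```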

Additional/Summary

Discovered this problem when I redlined my image processing server, triggering angry e-mails from IT. PR #215 prevents tedana from blowing up clusters but uses a hardcoded one-core limit via configuration file. (Thanks @tsalo!)

@tsalo
Member

tsalo commented Jan 19, 2019

Ideally, the default should be to use all available cores, and users should be able to control it with an n_threads argument. That said, I've been having a fair amount of trouble controlling this within tedana.

We never explicitly control multithreading in tedana, so I believe that the best approach is to control the environment variables. This SO post provides a list of variables we should set.

Setting the following in the terminal before calling tedana successfully limits the number of CPUs used on my Mac:

export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1
export VECLIB_MAXIMUM_THREADS=1

Unfortunately, setting these values within the workflow (with os.environ) so far only works if it is done at the top of the file, which means we couldn't set them as an argument. There might be a way around that, but supposedly it should only work if the modules that use these variables (especially numpy) are imported after the variables are set. However, even when I move the imports of all non-builtin modules (except argparse) to below where the environment variables are set, it doesn't work.
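
One way to check whether the reordering is actually taking effect is to look at sys.modules before setting the variables: if NumPy (or anything that imports it) is already loaded, its BLAS backend has most likely read the variables already, and changing them afterwards has no effect. A rough diagnostic, not tedana code:

```python
import os
import sys

def set_thread_limits(n_threads):
    """Set BLAS/OpenMP thread-count variables, warning if it is probably too late."""
    if 'numpy' in sys.modules:
        # The BLAS backend is initialized when NumPy is imported, so changes to
        # these variables after that point are typically ignored.
        print('Warning: numpy was imported before thread limits were set',
              file=sys.stderr)
    for var in ('MKL_NUM_THREADS', 'NUMEXPR_NUM_THREADS',
                'OMP_NUM_THREADS', 'VECLIB_MAXIMUM_THREADS'):
        os.environ[var] = str(n_threads)
```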

One interesting note, though: on one core, the ICA step takes a lot less time than it does with all available cores, although the results are somewhat different. We'll need to take a look at the results to make sure they're still valid, but it's promising.

@tsalo added the "discussion" label (issues that still need to be discussed) Jan 19, 2019
@tsalo changed the title from "[Discussion] Reduce Multithreading" to "Reduce Multithreading" Jan 19, 2019
@jbteves
Collaborator Author

jbteves commented Jan 26, 2019

@tsalo I was re-reading the issues and noticed that we have another anecdote of slow performance on HPC systems in #141.
This makes me wonder whether we should include figuring out what's going on with the multithreaded ICA as part of the reliability analysis. Thoughts?

@tsalo
Member

tsalo commented Jan 26, 2019

It's hard to tell from the issue, but I looked at the version used in the code they posted, and it looks like that issue dates back to when tedana used mdp instead of sklearn. mdp just happened to take longer than sklearn to get through the 5000 iterations. I think the issue itself is outdated, and the only reason we haven't closed it is that the original author never responded.

That said, I think you have a great point. Since OS is within the scope of the reliability analysis, it might be worth it to include parallelization as well. We definitely don't want to make it the focus of the analysis, so running across a range of numbers of cores is unnecessary, but a comparison of "one core" vs "all available" would be great. After all, it could be that parallelization speeds things up generally, but not in the one case I tried out.

We could also profile the code to see whether parallelizing speeds up ICA but slows down other elements. This is purely conjectural, but it seemed like the spatial clustering took longer when I limited tedana to one core.
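
Purely as an illustration of that kind of check, a single run could be profiled with the standard library's cProfile and the top cumulative-time functions compared between the one-core and all-cores cases. The run_workflow placeholder below stands in for the actual tedana entry point, which isn't shown in this issue:

```python
import cProfile
import pstats

def run_workflow():
    # Call the tedana workflow here; the exact entry point is a placeholder.
    pass

# Profile one full run and dump the stats to disk.
cProfile.run('run_workflow()', 'tedana_profile.out')

# Print the 20 functions with the largest cumulative time (e.g. ICA vs. clustering).
stats = pstats.Stats('tedana_profile.out')
stats.sort_stats('cumulative').print_stats(20)
```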

@jbteves
Collaborator Author

jbteves commented Jan 26, 2019

Well, I don't know about "all available," since on my server that would be 36 cores, which probably violates a ton of the assumptions typically made. But I think 1, 4, and 8 should tell us a lot, or even just 1 and 4.

@tsalo
Member

tsalo commented Jan 26, 2019

By "all available" I meant more that it would use whatever resources are allocated to the job. If you use a scheduler, you should be able to set the number of cores you want to reserve to run the job. We just need to make sure that tedana respects that value. I believe that right now it will use any cores it can access, which we definitely do not want. For the analyses, 1 and 4 seem reasonable to me.
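
As a rough sketch of what "respecting the allocation" could look like, assuming a SLURM-style scheduler that exports SLURM_CPUS_PER_TASK (other schedulers use different variables), tedana could read the allocation and fall back to a single core when none is advertised:

```python
import os

# Cores reserved for the job by the scheduler (SLURM example); default to 1 if absent.
n_allocated = int(os.environ.get('SLURM_CPUS_PER_TASK', 1))

# Apply the limit without overriding values the user has already set explicitly.
for var in ('MKL_NUM_THREADS', 'NUMEXPR_NUM_THREADS',
            'OMP_NUM_THREADS', 'VECLIB_MAXIMUM_THREADS'):
    os.environ.setdefault(var, str(n_allocated))
```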

@tsalo added this to the "method extensions & improvements" milestone Feb 9, 2019
@tsalo added the "enhancement" label (issues describing possible enhancements to the project) and removed the "discussion" label Feb 11, 2019
@jbteves changed the title from "Reduce Multithreading" to "Programmatically Control Thread Count" Feb 11, 2019
@jbteves
Collaborator Author

jbteves commented May 23, 2019

From reading the open NumPy issue, this seems to be a shockingly difficult problem; the lengthy discussion there basically leads me to conclude that we are hopelessly downstream of it. As a quick summary, the issue appears to rest in the BLAS libraries that NumPy may or may not use, since they are system- and even CPU-dependent (yikes). Furthermore, the NumPy commenters appear to believe that this is not a priority for them or their users, reducing the likelihood that the problem will be solved in the near future. As much as this frustrates me, I have to conclude that we should stick with @tsalo's solution of reducing the default core count to 1 and hope that someday this will be fixed upstream of us.
