
BLAS threading preferences when building SuiteSparse #432

Closed
ViralBShah opened this issue Oct 8, 2023 · 6 comments

@ViralBShah
Contributor

ViralBShah commented Oct 8, 2023

Is there a preferred way to build a BLAS library when linking to SuiteSparse? With OpenBLAS, we see many cases of poor performance when UMFPACK uses a multi-threaded build.

OpenBLAS, however, also offers an OpenMP build, and I was wondering whether there is experience with that working better.

We see good out-of-the-box performance with MKL, but it is a very heavy, platform-specific dependency, and we are wondering whether there are defaults that provide better out-of-the-box performance with OpenBLAS.

@ViralBShah ViralBShah changed the title Threading preferences when building SuiteSparse BLAS threading preferences when building SuiteSparse Oct 8, 2023
@DrTimothyAldenDavis
Owner

I've had performance issues with OpenBLAS and its multithreading, sometimes leading to a 100x slowdown. MKL works fine. I'm not sure what's causing it; it's an open issue in SuiteSparse. I haven't tried building OpenBLAS myself.

@mmuetzel
Contributor

mmuetzel commented Oct 9, 2023

If you are building SuiteSparse with OpenMP (the default if OpenMP is available), you should also use an OpenBLAS library that was built with OpenMP. Otherwise, you might end up with many times the expected number of threads (e.g., each OpenMP thread starting its own set of OpenBLAS pthreads), which can hurt performance significantly.
See: https://github.com/OpenMathLib/OpenBLAS/wiki/faq#multi-threaded

Depending on the CPU you are using, it may also help to limit the number of OpenMP threads to the number of physical cores of your CPU(s) (as opposed to the number of possible hyperthreads). IIUC, that is because hyperthreads running on the same physical core can evict (some of) each other's CPU cache contents, leading to repeated transfers between CPU caches and main memory. That can also have a notable performance impact.
For that, set OMP_NUM_THREADS to the number of physical cores.
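A minimal sketch of that setup on Linux. It assumes `lscpu` is available; the `getconf` fallback counts logical CPUs, not physical cores, so it is only a rough substitute:

```shell
# lscpu -p=Core,Socket prints one line per logical CPU, so the number of
# unique (core,socket) pairs is the number of physical cores.
physical_cores=$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l)
# Fallback if lscpu is unavailable: logical CPU count (includes hyperthreads).
[ "$physical_cores" -gt 0 ] 2>/dev/null || physical_cores=$(getconf _NPROCESSORS_ONLN)

# One OpenMP thread per physical core, for SuiteSparse and an
# OpenMP-built OpenBLAS alike.
export OMP_NUM_THREADS="$physical_cores"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```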

@ViralBShah
Contributor Author

In Julia, we build OpenBLAS and SuiteSparse with pthreads. I suspect the slowdown comes from OpenBLAS multi-threading: for small matmuls, it slows down by using all the available threads, so it is perhaps best to use single-threaded OpenBLAS with UMFPACK.
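As a sketch, a pthreads build of OpenBLAS can be pinned to one thread at runtime via its own environment variable (for an OpenMP build, OMP_NUM_THREADS applies instead, which would also throttle SuiteSparse's own OpenMP parallelism):

```shell
# Force single-threaded OpenBLAS inside UMFPACK's BLAS calls.
# For a pthreads build this does not affect SuiteSparse's OpenMP threads.
export OPENBLAS_NUM_THREADS=1
```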

@DrTimothyAldenDavis
Owner

That would slow down UMFPACK, CHOLMOD, and SPQR on large matrices.

Also, in the future, I will be revising SPQR to replace TBB with OpenMP tasking. In that case, I would need to use nested parallelism, where multiple fronts can be factorized in parallel, and each front uses a different number of threads. MKL supports that but OpenBLAS does not.

We also have a new parallel version of UMFPACK, called ParU, which will be added soon to SuiteSparse. That uses the same strategy, and it only works well with the Intel MKL BLAS.

Ideally, there would be a way to use OpenBLAS like this but I'm not sure if that's possible.

@DrTimothyAldenDavis
Owner

See also #1

@ViralBShah
Contributor Author

I am closing this in favour of #1. Looking forward to ParU and the new SPQR.
