BLAS threading preferences when building SuiteSparse #432
I've had performance issues with OpenBLAS and its multithreading, sometimes leading to a 100x slowdown. MKL works fine. I'm not sure what's causing it; it's an open issue in SuiteSparse. I haven't tried building OpenBLAS myself.
If you are building SuiteSparse with OpenMP (the default if OpenMP is available), you should also use an OpenBLAS library that was built with OpenMP. Otherwise, you might end up with a multiple of the expected number of threads, which can hurt performance significantly. Depending on the CPU you are using, it might also be advantageous to limit the number of OpenMP threads to the number of physical cores of your CPU(s) (as opposed to the number of possible hyperthreads). IIUC, that is because hyperthreads running on the same physical core can evict (some of) each other's cache lines, leading to repeated transfers between the CPU caches and main memory. That can also have a notable performance impact.
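A minimal sketch of that setup, assuming a machine with 8 physical cores (the core count is a placeholder; setting OMP_NUM_THREADS and OPENBLAS_NUM_THREADS in the environment achieves the same thing without recompiling):

```c
/* Sketch: cap the OpenMP thread count at the number of physical cores
 * before any SuiteSparse / OpenBLAS work starts.  The core count below
 * is a placeholder -- query it with hwloc, lscpu, or similar. */
#include <omp.h>
#include <cholmod.h>

int main (void)
{
    int physical_cores = 8 ;                /* assumption: adjust for your CPU */
    omp_set_num_threads (physical_cores) ;

    cholmod_common Common ;
    cholmod_start (&Common) ;
    /* ... read a matrix, then cholmod_analyze / cholmod_factorize / cholmod_solve ... */
    cholmod_finish (&Common) ;
    return (0) ;
}
```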
In Julia, we build OpenBLAS and SuiteSparse with pthreads. I suspect the slowdown is caused by OpenBLAS multi-threading: for small matmuls, it slows down by using all the available threads, and it is perhaps best to just use a single-threaded OpenBLAS when it is used with UMFPACK.
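A minimal sketch of forcing a single BLAS thread around an UMFPACK solve, assuming OpenBLAS's openblas_set_num_threads extension is available (setting OPENBLAS_NUM_THREADS=1 in the environment has the same effect):

```c
/* Sketch: restrict OpenBLAS to one thread before an UMFPACK factorization
 * and solve, to avoid oversubscription on small dense BLAS calls. */
#include <umfpack.h>

extern void openblas_set_num_threads (int num_threads) ;   /* OpenBLAS extension */

void solve (int n, const int *Ap, const int *Ai, const double *Ax,
            double *x, const double *b)
{
    void *Symbolic, *Numeric ;
    openblas_set_num_threads (1) ;
    umfpack_di_symbolic (n, n, Ap, Ai, Ax, &Symbolic, NULL, NULL) ;
    umfpack_di_numeric (Ap, Ai, Ax, Symbolic, &Numeric, NULL, NULL) ;
    umfpack_di_solve (UMFPACK_A, Ap, Ai, Ax, x, b, Numeric, NULL, NULL) ;
    umfpack_di_free_symbolic (&Symbolic) ;
    umfpack_di_free_numeric (&Numeric) ;
}
```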
That would slow down UMFPACK, CHOLMOD, and SPQR on large matrices. Also, in the future, I will be revising SPQR to replace TBB with OpenMP tasking. In that case, I would need to use nested parallelism, where multiple fronts can be factorized in parallel and each front uses a different number of threads. MKL supports that but OpenBLAS does not. We also have a new parallel version of UMFPACK, called ParU, which will be added soon to SuiteSparse. It uses the same strategy, and it only works well with the Intel MKL BLAS. Ideally, there would be a way to use OpenBLAS like this, but I'm not sure if that's possible.
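For illustration only, a sketch of that nested-parallelism pattern (not the actual SPQR/ParU code; front_blas_threads and factorize_front are hypothetical helpers, and mkl_set_num_threads_local is the MKL-specific thread-local control that OpenBLAS lacks):

```c
/* Sketch: factorize several frontal matrices in parallel, giving each
 * front its own BLAS thread count via MKL's thread-local setting. */
#include <omp.h>
#include <mkl.h>

extern int  front_blas_threads (int front) ;   /* hypothetical: threads assigned to a front */
extern void factorize_front (int front) ;      /* hypothetical: dense kernels call the BLAS */

void factorize_all_fronts (int nfronts)
{
    mkl_set_dynamic (0) ;                      /* keep MKL from reducing the thread count */
    omp_set_max_active_levels (2) ;            /* allow nested parallel regions */

    #pragma omp parallel for schedule(dynamic)
    for (int f = 0 ; f < nfronts ; f++)
    {
        /* each task sets its own (thread-local) MKL thread count */
        mkl_set_num_threads_local (front_blas_threads (f)) ;
        factorize_front (f) ;
        mkl_set_num_threads_local (0) ;        /* restore the global setting */
    }
}
```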
See also #1 |
I am closing this in favour of #1. Looking forward to ParU and the new SPQR. |
Is there a preferred way to build a BLAS library when linking to SuiteSparse? With OpenBLAS, we see a lot of poor-performance issues when UMFPACK uses a multi-threaded build.
OpenBLAS does, however, offer a build with OpenMP, and I was wondering whether there is experience of that working better.
We see good out-of-the-box performance with MKL, but it is a very heavy, platform-specific dependency, and we are wondering whether there are defaults that provide better out-of-the-box performance with OpenBLAS.