Optimize gemv for small M, large N only if it can be done in a threadsafe manner #1865
merge develop
Looks like this includes the observations from the mentioned issue.
Sorry, I did not understand the comment. (Unfortunately I cannot test this on my hexacore system this weekend, but I had tested the general idea outlined in the comments on #1852 earlier.)
It looks like it enables the gemv optimisation for OMP only.
The intention is that it enables the optimization only when the static buffer can be made thread safe, that is, either under OMP or when the compiler supports C11-like TLS.
The patches most likely do the trick; it is just that an Android tablet chroot is not always multiprocessor like a normal computer.
Merged after successful local tests.
Thanks. Since this is a critical bug for NumPy, will it be part of a bugfix release?
Guess 0.3.4 is overdue in any case, will see if I can do it next weekend or at least by the end of this month. Need to try to get a decent handle on the equally ugly #1851 as well.
Commit 8e5a108 tried to improve performance for #532 but introduced a static array in the process, breaking programs like NumPy that call into OpenBLAS from several concurrent threads. This supersedes the simple revert from #1852, with the intention of fixing #1844 without sacrificing performance where possible.
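
For readers unfamiliar with the pattern, here is a minimal sketch of the general idea: use a static scratch buffer only when it can be made per-thread, and otherwise fall back to a path that needs no shared state. This is not the code from this PR. The names `gemv_small_m`, `SCRATCH_LEN`, `HAVE_TLS_SCRATCH` and `TLS_QUAL` are hypothetical, and the buffer is used here only to pack a strided `x` vector, which is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SCRATCH_LEN 256 /* hypothetical scratch size, not taken from OpenBLAS */

/* Pick a thread-local storage qualifier if the compiler offers one.
 * Without TLS the static buffer would be shared between threads, so the
 * buffered path is compiled out entirely in that case. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define HAVE_TLS_SCRATCH 1
#  define TLS_QUAL _Thread_local
#elif defined(__GNUC__)
#  define HAVE_TLS_SCRATCH 1
#  define TLS_QUAL __thread
#else
#  define HAVE_TLS_SCRATCH 0
#endif

/* y = A * x for a row-major m-by-n matrix A; x has element stride incx. */
void gemv_small_m(size_t m, size_t n, const double *a,
                  const double *x, size_t incx, double *y)
{
    const double *xp = x;
    size_t stride = incx;

    assert(incx >= 1);

#if HAVE_TLS_SCRATCH
    /* One buffer per thread: concurrent callers never see each other's data. */
    static TLS_QUAL double scratch[SCRATCH_LEN];
    if (incx != 1 && n <= SCRATCH_LEN) {
        /* Pack the strided vector once so the inner loop reads contiguously. */
        for (size_t j = 0; j < n; j++)
            scratch[j] = x[j * incx];
        xp = scratch;
        stride = 1;
    }
#endif

    /* Generic path: also correct when the buffer is unavailable or too small. */
    for (size_t i = 0; i < m; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < n; j++)
            acc += a[i * n + j] * xp[j * stride];
        y[i] = acc;
    }
}
```

A caller on any thread can invoke `gemv_small_m(2, 3, a, x, 2, y)` without locking, because the only static state is thread-local. A plain `static double scratch[...]` in the same position would reintroduce the kind of race described above.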