Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize gemv for small M, large N only if it can be done in a threadsafe manner #1865

Merged
merged 2 commits into from
Nov 12, 2018
Merged

Conversation

martin-frbg
Copy link
Collaborator

commit 8e5a108 tried to improve the performance of #532 but introduced a static array in the process, breaking programs like numpy that call into OpenBLAS from several concurrent threads. Supersedes the simple revert from #1852 with the intention to fix #1844 without sacrificing performance where possible.

…safe

otherwise the introduction of a static array in 8e5a108 to improve #532 breaks concurrent calls from multiple threads as seen in #1844
@brada4
Copy link
Contributor

brada4 commented Nov 10, 2018

Looks like includes the observations from mentioned issue.

@martin-frbg
Copy link
Collaborator Author

Sorry, I did not get the comment ? (Unfortunately I cannot test this on my hexacore system this weekend, but I had tested the general idea outlined in the comments on #1852 earlier)

@brada4
Copy link
Contributor

brada4 commented Nov 10, 2018

It looks that it enables gemv optimisation for OMP only.
Does not fail with Android UserLAnd (sort of debian chroot, atom CPU) in OMP or pthread builds, though I cannot tell if it ever failed before in this combination.
EDIT: it fails in pre=patches pthread version and ubuntu numpy, though multiprocessorness of this "debian" is not always guaranteed, you somehow need to spin many bg apps to make cpus wake from park. Does not seem possible from "app" I'd wait for more reliable tests...

@martin-frbg
Copy link
Collaborator Author

The intention is that it enables the optimization only when the static buffer can be made thread safe, that is either under OMP or when the compiler is capable of assigning C11-like TLS.

@brada4
Copy link
Contributor

brada4 commented Nov 11, 2018

Patches do the trick most likely just that android tablet chroot is not always multiprocessor like a normal computer....

@martin-frbg
Copy link
Collaborator Author

Merged after successful local tests

@mattip
Copy link
Contributor

mattip commented Nov 12, 2018

Thanks. Since this is a critical bug for NumPy, will it be part of a bugfix release?

@martin-frbg martin-frbg added this to the 0.3.4 milestone Nov 12, 2018
@martin-frbg
Copy link
Collaborator Author

Guess 0.3.4 is overdue in any case, will see if I can do it next weekend or at least by the end of this month. Need to try to get a decent handle on the equally ugly #1851 as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenBLAS threadsafety issues for downstream libraries (NumPy)
3 participants