Many tests fail on Intel Skylake CPUs #24

Open
hmenke opened this issue Apr 6, 2022 · 3 comments
@hmenke
Member

hmenke commented Apr 6, 2022

This is more of a heads-up, since there is nothing you can really do about it: the underlying issue is in OpenBLAS.

On Intel Skylake CPUs, the numpy.linalg.svd routine is broken and returns wrong output. See also numpy/numpy#13401.
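For anyone who wants to check their own machine, here is a minimal sketch of an SVD sanity check. It is not taken from the test suite: the function name, matrix size, and tolerance are my own assumptions, and a random matrix of this size may or may not hit the faulty kernel, so a passing result does not prove the BLAS is healthy.

```python
import numpy as np


def svd_reconstruction_error(n_rows=200, n_cols=150, seed=0):
    """Return the relative error of rebuilding a random matrix from its SVD.

    Hypothetical helper for illustration; sizes and seed are arbitrary.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_rows, n_cols))
    # Thin SVD: a should equal u @ diag(s) @ vt up to rounding error.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # Scaling the columns of u by s is the same as u @ diag(s).
    rebuilt = (u * s) @ vt
    return np.linalg.norm(a - rebuilt) / np.linalg.norm(a)


if __name__ == "__main__":
    err = svd_reconstruction_error()
    print(f"relative SVD reconstruction error: {err:.3e}")
    # 1e-10 is an assumed, generous tolerance for double precision.
    if err > 1e-10:
        print("SVD looks unreliable with this CPU/BLAS combination")
```

Note that the reconstruction step itself uses a matrix product, so a large error can come from either the SVD or the GEMM call.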

As a workaround, users can override the architecture OpenBLAS targets (via the OPENBLAS_CORETYPE environment variable) with one that doesn't exhibit the problem. On the CPU that showed the failing tests, haswell worked.

$ export OPENBLAS_VERBOSE=2
$ python3 maxent_result.py
Core: SkylakeX
Warning: could not identify MPI environment!
Starting serial run at: 2022-04-06 14:15:29.548704
Traceback (most recent call last):
  File "maxent_result.py", line 129, in <module>
    "{} not equal".format(key)
AssertionError: A not equal
$ export OPENBLAS_CORETYPE=haswell
$ python3 maxent_result.py
Core: Haswell
Warning: could not identify MPI environment!
Starting serial run at: 2022-04-06 14:15:57.296254
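The same override can be applied from inside a Python script, as long as it happens before numpy is imported for the first time. The following is a sketch under that assumption. The helper name `force_openblas_coretype` is hypothetical, and the override only has an effect when OpenBLAS was built with runtime kernel selection, which the `Core:` line in the output above suggests is the case here.

```python
import os
import sys
import warnings


def force_openblas_coretype(coretype="haswell"):
    """Set OPENBLAS_CORETYPE for this process unless it is already set.

    Hypothetical helper; returns the value that ends up in the environment.
    """
    if "numpy" in sys.modules:
        # OpenBLAS selects its kernels when the library is loaded, so
        # changing the variable afterwards most likely has no effect.
        warnings.warn("numpy is already imported; OPENBLAS_CORETYPE may be ignored")
    # setdefault keeps an explicit choice the user made in the shell.
    return os.environ.setdefault("OPENBLAS_CORETYPE", coretype)


force_openblas_coretype()
# Import numpy only after the call above, e.g.:
# import numpy as np
```

Exporting the variable in the shell, as in the session above, remains the more robust option because it does not depend on import order.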
@the-hampel
Member

Thank you for pointing this out. I will keep this issue open until the problem is resolved in OpenBLAS.

@hmenke
Member Author

hmenke commented May 6, 2022

As far as I understand, this has already been fixed in OpenBLAS. The relevant upstream issues seem to be
OpenMathLib/OpenBLAS#1955
OpenMathLib/OpenBLAS#2029
and the fix is to just disable the AVX512 DGEMM kernel
OpenMathLib/OpenBLAS#2061

Unfortunately, most HPC clusters run a decade-old CentOS or some other ancient distribution, so they will not have received this update yet and most likely won't for a few years to come.

@the-hampel
Member

Thanks for the update. It seems to be fixed only as of version 0.3.18 (https://github.com/xianyi/OpenBLAS/releases/tag/v0.3.18), if I see this correctly? Ubuntu 20.04 is not even close to that version (it ships 0.3.8), and only the recent Ubuntu 22.04 has a version that should include the fix. So I agree it will take quite some time until this version shows up on HPC machines...
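When comparing version numbers like these, a plain string comparison gives the wrong answer ("0.3.8" sorts after "0.3.18"), so the parts have to be compared numerically. A small sketch follows. The helper names are hypothetical, and the 0.3.18 threshold is simply the version mentioned above, not something verified independently.

```python
def parse_version(version):
    """Turn a dotted version such as '0.3.18' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_skylake_fix(version, fixed_in="0.3.18"):
    """Return True if `version` is at least `fixed_in` (assumed threshold)."""
    return parse_version(version) >= parse_version(fixed_in)


if __name__ == "__main__":
    # 0.3.8 is the Ubuntu 20.04 version quoted above; 0.3.20 is only an
    # illustrative newer version.
    for candidate in ("0.3.8", "0.3.18", "0.3.20"):
        print(candidate, has_skylake_fix(candidate))
```

This only handles purely numeric versions. Distribution package versions with suffixes would need to be trimmed first.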
