Revert "Lazyly reinit threads after a fork in OMP mode" #2982
Conversation
This reverts commit 3094fc6.
Ah, pity - whether it works or not possibly depends on the exact version of libgomp. @Flamefire
Hm, that is strange. Before actually reverting this I'd like to understand why this is happening.
Not sure if the flush is needed, or if there even needs to be another one inside the if, after the init call. I'm also not sure whether the omp critical works when threads are spawned outside of OMP (hence my question why this is run by multiple threads), but I expect that to be the case. A sketch of the pattern in question follows below.
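For readers following along, this is roughly the shape of the lazily-reinit-after-fork pattern being debated - a minimal sketch, assuming a fork handler that sets a flag; the names threads_need_reinit and the stubbed blas_thread_init are illustrative stand-ins, not the exact OpenBLAS symbols:

```c
#include <stdio.h>

/* Illustrative stand-ins -- not the exact OpenBLAS symbols. */
static int threads_need_reinit = 1;   /* would be set by a pthread_atfork child handler */

static void blas_thread_init(void) {  /* stand-in for the real thread re-spawn */
  puts("re-initializing worker threads");
}

static void ensure_threads_initialized(void) {
  #pragma omp flush                    /* is this flush needed? (open question above) */
  if (threads_need_reinit) {
    #pragma omp critical (lazy_reinit)
    {
      /* Re-check inside the critical section: another thread may have
       * re-initialized already while we were waiting for the lock. */
      if (threads_need_reinit) {
        blas_thread_init();
        threads_need_reinit = 0;
        #pragma omp flush              /* ...and is a second one needed here? */
      }
    }
  }
}

int main(void) {
  /* omp critical only formally synchronizes OpenMP threads; whether it is
   * safe for threads spawned outside OMP is exactly the doubt raised above. */
  #pragma omp parallel
  ensure_threads_initialized();
  return 0;
}
```

The open questions above map directly onto this sketch: whether the leading flush is required, whether a second flush is needed after the init call, and whether `critical` protects threads that were not created by the OpenMP runtime.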
This is really outside my area of expertise. Using only one "core" to build and run the tests still segfaults, so my assumption about thread safety is likely wrong. But it seemed like the most plausible explanation at the time. There might be something else at play, but as I said, I'm not very familiar with this domain; I'm just trying to package software. Do you know what version of libgomp you used? I'm using gcc 9.3.0 in this example, so it might be specific to the version of libgomp I'm using.
still failing:
Then I guess this needs further work, as I suspect a larger issue here. Could you share the test you run? I.e. is it possible to test this by installing a numpy from PyPI, installing an openblas, running a test which fails, then installing a patched openblas, running the same test, and having it pass?
The problem seems to be that this only appears for some maintainers; as mentioned on the original thread, many others were able to build the package and run the tests. This may be a subtle error which only exists on Zen 2 architectures. I would be open to allowing someone to have SSH access to my server as an unprivileged user - Nixpkgs allows unprivileged users to install packages - and I can set up an environment where the changes can be verified with just running a single command.
@jonringer can you please provide the build options that Nixpkgs uses for the OpenBLAS build that failed the numpy tests?
Full info, but not meant to be human readable: openblas.drv
Thx. BTW, "TARGET=ATHLON" looks like an unusual choice for a 64-bit build.
It's most likely doing that to support as many CPU architectures as possible.
Wonder why you set "NO_BINARY_MODE=", which is not really supposed to be user-modified, but again this probably has no bearing on the (as yet unreproduced) problem.
It was set for previous aarch64 builds and may not be relevant anymore: https://github.com/NixOS/nixpkgs/blob/81bd63045480df99fc3446ab9bba58517fd8e3cf/pkgs/development/libraries/science/math/openblas/default.nix#L161
Setting BUFFERSIZE to 20 also doesn't work:
Interesting, thanks. Wonder if you could do an OpenBLAS build with DEBUG=1 (effectively just adding -g to its CFLAGS) to get a somewhat more meaningful backtrace?
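For reference, a debug rebuild along those lines might look like the following - a sketch reusing the flags from the derivation quoted above, with a placeholder install prefix:

```sh
make clean
make DEBUG=1 TARGET=ATHLON USE_OPENMP=1 NUM_THREADS=64
make DEBUG=1 PREFIX=/path/to/openblas-debug install
```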
Compiled openblas and numpy with debugging symbols, and did a test run:

I tried adding
Strange that it would not show line numbers (are you sure numpy picked up the intended build of libopenblas?), but what is visible now looks like a generic case of trashing the stack - gemm_itcopy_ZEN is the generic C gemm_tcopy_4 kernel, so no fancy assembly there; probably its "b" argument is garbage on entry. Annoyingly, this is/was also one of the manifestations of a too-small BUFFERSIZE... and I take it that the crash is no longer shown as happening "inside libgomp", now that libopenblas is built with debug symbols?
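To see why a garbage b pointer would trash the stack there: the generic tcopy-style packing kernels are essentially plain copy loops writing sequentially through b. A simplified, hypothetical sketch (not the actual OpenBLAS kernel):

```c
#include <stdio.h>

/* Simplified, hypothetical sketch of a gemm_tcopy-style packing routine:
 * copy an lda-strided m x n tile of A, transposed, into a contiguous
 * packed buffer b. If b is garbage on entry (or points at a buffer
 * smaller than m*n elements -- cf. the BUFFERSIZE discussion), these
 * stores scribble over whatever b happens to address, e.g. the stack. */
static void tcopy_sketch(long m, long n, const double *a, long lda, double *b) {
  for (long j = 0; j < n; j++)
    for (long i = 0; i < m; i++)
      *b++ = a[i * lda + j];   /* sequential, unchecked writes through b */
}

int main(void) {
  double a[2 * 3] = {1, 2, 3, 4, 5, 6};  /* 2x3 tile, lda = 3 */
  double packed[6];
  tcopy_sketch(2, 3, a, 3, packed);
  for (int k = 0; k < 6; k++) printf("%g ", packed[k]);
  printf("\n");                           /* prints: 1 4 2 5 3 6 */
  return 0;
}
```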
Uh, another potential gotcha - you are building with NUM_THREADS=64, but your TR testbed probably has HT, so 128 cores are seen at runtime?
I did some experiments with OpenBLAS 0.3.12 and Numpy 1.19.4 and can't get it to segfault on either a 12-core Intel system or a 256-core AMD system. Test script:
Maybe it is related to the 64-bit build? How does your site.cfg for the numpy build look?
Hmm. I did not see crashes with INTERFACE64=1 and the corresponding settings in python's site.cfg (and environment variables for the numpy build, as mentioned in the comments there), for either suffixed or non-suffixed symbols. Cannot go beyond 12 threads in my local testing right now, though.
For the non-suffixed build (see above but with INTERFACE64=1) I uncommented and set the corresponding options. Without the numpy modification I get the crash. For numpy 1.17 it crashes in
The "numpy modification" being what exactly - reducing BUFFERSIZE or something else ? (Note you may also need to set up the openblas includes in site.cfg to point to the INTERFACE64 version of the installed build |
Setting the variables for the numpy build to use the ILP64 mode. (Include and library paths are set up, I just didn't show those.)
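For anyone reproducing this, a numpy site.cfg pointing at a custom OpenBLAS usually looks roughly like the following - a sketch based on numpy's site.cfg.example, with placeholder paths rather than the exact file from this thread:

```ini
[openblas]
libraries = openblas
library_dirs = /path/to/openblas/lib
include_dirs = /path/to/openblas/include
runtime_library_dirs = /path/to/openblas/lib
```

For the ILP64 variants, numpy additionally expects NPY_USE_BLAS_ILP64=1 in the build environment and a correspondingly named section (e.g. [openblas64_] for the suffixed-symbol build).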
one other difference is that I'm using gcc 9.3.0
This seems to be my issue... which is odd, as these builds use cgroups to limit the number of cores available (I have mine limited to 32 cores on a single build). I was able to run the numpy tests just by bumping NUM_THREADS to a higher value.
According to https://github.com/xianyi/OpenBLAS/blob/develop/USAGE.md#troubleshooting, it looks like I should be able to bump the threads up to 256. Or I could limit the cores at runtime with some environment variables.
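Both runtime options exist: the OPENBLAS_NUM_THREADS environment variable, or OpenBLAS's C API. A minimal sketch of the latter, to be linked against libopenblas (openblas_get_num_procs, openblas_get_num_threads, and openblas_set_num_threads are part of the public interface declared in cblas.h; the cap of 32 is just the cgroup limit mentioned above):

```c
#include <stdio.h>
#include <cblas.h>   /* openblas_* prototypes ship with OpenBLAS */

int main(void) {
  printf("cores detected: %d, threads in use: %d\n",
         openblas_get_num_procs(), openblas_get_num_threads());

  /* Cap the thread count so cores beyond the build-time NUM_THREADS
   * (or beyond the cgroup allowance) are never used. */
  openblas_set_num_threads(32);
  return 0;
}
```

Setting OPENBLAS_NUM_THREADS=32 in the environment achieves the same cap without code changes.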
@Flamefire ok that's expected and unavoidable
I think this is expected. This package will be distributed to hosts with different hardware specs.
This is what I was afraid of. I don't want to pay the overhead of 256 cores only for it to run on a machine with just 16 cores... I'll think of something.
Conclusion: I will close the thread, as reverting the commit doesn't seem to be the correct course of action. However, it might be nice to emit some warning when openblas has less thread capacity than the host machine, and limit its thread usage accordingly.
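A rough sketch of what that proposed warning could look like - this is not existing OpenBLAS code; MAX_CPU_NUMBER here stands in for the build-time NUM_THREADS, the core count comes from POSIX sysconf, and the clamp reuses the public openblas_set_num_threads:

```c
#include <stdio.h>
#include <unistd.h>   /* sysconf */
#include <cblas.h>    /* openblas_set_num_threads */

/* Stand-in for the build-time NUM_THREADS baked into libopenblas. */
#define MAX_CPU_NUMBER 64

static void warn_if_undersized(void) {
  long cores = sysconf(_SC_NPROCESSORS_ONLN);   /* cores visible at runtime */
  if (cores > MAX_CPU_NUMBER) {
    fprintf(stderr,
            "OpenBLAS warning: built with NUM_THREADS=%d but %ld cores "
            "detected; limiting to %d threads.\n",
            MAX_CPU_NUMBER, cores, MAX_CPU_NUMBER);
    openblas_set_num_threads(MAX_CPU_NUMBER);
  }
}

int main(void) { warn_if_undersized(); return 0; }
```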
Runtime overhead of unused entries should be around 80 bytes or so each, IIRC.
Oh, that's a pretty low bar for most machines.
Many functions allocate stack-memory arrays, the biggest being a blas_queue_t array where each element is ~168 bytes. There may also be a job_t array where each of the NUM_THREADS entries has a size of NUM_THREADS * ~16 bytes, so the memory requirement is quadratic. 43 kB of stack memory for the queues is a lot (for 256 max threads). @jonringer Can you open an issue referencing this? IMO setting NUM_THREADS too low must not lead to a crash, so it should work with your current setting. Or am I missing anything? Edit: Another side effect of setting NUM_THREADS: it affects/is related to the number of OpenMP threads that are used by the calling program, see #2985.
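A quick back-of-the-envelope check of those figures, using the approximate element sizes quoted above (~168 B per blas_queue_t, ~16 B per job_t element):

```c
#include <stdio.h>

int main(void) {
  const long num_threads = 256;   /* hypothetical build-time maximum */
  const long queue_entry = 168;   /* ~size of one blas_queue_t       */
  const long job_entry   = 16;    /* ~size of one job_t element      */

  long queue_bytes = num_threads * queue_entry;               /* linear    */
  long job_bytes   = num_threads * num_threads * job_entry;   /* quadratic */

  printf("queues: %ld bytes (~%ld kB)\n", queue_bytes, queue_bytes / 1024);
  printf("jobs:   %ld bytes (~%ld kB)\n", job_bytes, job_bytes / 1024);
  /* -> queues: 43008 bytes (~42 kB); jobs: 1048576 bytes (~1024 kB) */
  return 0;
}
```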
I opened an issue to continue this discussion, but this PR doesn't seem relevant anymore. Sorry for being a bit presumptuous about causation - I was just following git bisect, and was able to reproduce the issue consistently.
This reverts commit 3094fc6.
Causes seg faults in some scenarios, see #2970 (comment)