-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow KokkosNode = OpenMP tests #103
Comments
@ikalash I updated your comment from export OMP_NUM_THREADS=1 somewhere in your test scripts for starters. That is assuming no other part of the test system is setting thread counts. |
Oh, yes, that was an unfortunate typo - it should be "now". Unfortunately, setting export OMP_NUM_THREADS=1 does not fix the issue. |
@ikalash I opened an issue kokkos/kokkos#630 a while ago on Kokkos, cause I was experiencing the same thing. In my case the problem was that I installed kokkos with hwloc support, which basically makes kokkos ignore the OMP_NUM_THREADS env variable, and makes kokkos use hwloc to determine the number of threads, which hwloc puts equal to the number of PUs. The only way to overwrite this is to pass --kokkos-threads=N to the command line, but Albany does not allow this in the testsuite (although it does allow this when launching albany manually). Unfortunately, there seem to be some complications in making Kokkos honor the OMP variables when hwloc is installed. From what I understand, the solution is to use hwloc only to investigate the hardware topology the first time, then never use it, and use just OMP env variables. Are you using an installation of trilinos which uses hwloc? |
@bartgol . I did have hwloc, and you are right - turning it off fixes the issue. I actually remembered having observed this earlier with @jewatkins after seeing your message. Regarding passing --kokkos-threads=N to the code: I think this is actually possible to do in ctest. We do something similar for the CUDA runs - see the following code in Albany/CMakeLists.txt: SET(KOKKOS_NDEVICES "--kokkos-ndevices=${NUM_GPUS_PER_NODE}") It will generate a ctest line as follows: add_test(Helmholtz2D_Tpetra "/home/projects/pwr8-rhel73-lsf/openmpi/1.10.4/gcc/5.4.0/cuda/8.0.44/bin/mpirun" "-n" "4" "/home/projects/albany/build/AlbBuild/src/AlbanyT" "--kokkos-ndevices=4" "inputT.xml") I think one could do something similar to pass --kokkos-threads=N to each ctest test when the code is compiled with OpenMP. Closing the issue since turning off HWLOC resolved it. |
Yeah, having a cmake config variable for the number of threads is useful. We could add one such option without default (so that kokkos does its magic under the hood if one does not want to bother setting it), then add it to all the variables that define executables,. e.g., turning
Perhaps it could also be useful if we could specify the number of threads directly when invoking ctest, like "ctest --kokkos-threads=4". But I guess the first option (cmake variable) would already be an improvement. |
Maybe create a new |
Uhm, I would prefer to not create AlbanyKokkos, since we then have to create it for all the executables (for coherence). Albany already has quite a few executables, and multiplying everything by 2 may make things less clear and/or easy to maintain. I think setting the number of threads equal to 1 by default may end up not checking that everything is thread safe. On the other hand, Albany rarely (?) uses Kokkos directly, and delegates parallelization to trilinos interfaces (intrepid, tpetra...). So perhaps it could be ok to just set it equal to 1 as default, letting the user change it through cmake. But I would argue that the nightly builds should always use --kokkos-threads>=2. |
Another issue, stemming form my OS upgrade: KokkosNode = OpenMP tests are now incredibly slow, b/c of the following oversubscription of CPU cores:
Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores.
Detected: 40 cores per node.
Detected: 4 MPI_ranks per node.
Requested: 20 threads per process.
I think there must be some configuration or runtime option to fix this. @jewatkins , I think you may have experience this in the past and found a fix?
The text was updated successfully, but these errors were encountered: