inline static members in Kokkos 4.0 class not persistent with CUDA backend #55

Open
kaschau opened this issue Apr 20, 2023 · 10 comments

@kaschau

kaschau commented Apr 20, 2023

Kokkos 4.0 changed many class members that are set during Kokkos::initialize() to inline static T members. With this change, these members do not seem to persist when Kokkos is initialized from Python through pybind11.

With the CUDA backend, the TileSizeProperties attribute maxThreads ends up as zero, which causes an abort at the first MDRange execution.

When Kokkos::initialize() is called (from a Python-bound function), cudaProp.maxThreadsPerMultiProcessor (from here) reports 1024. However, by the time we get to the MDRange policy here, space.impl_internal_space_instance()->m_maxThreadsPerSM is 0. This causes an abort at this check here.
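To make the failure mode above concrete, here is a minimal sketch of why a zeroed thread limit rejects every tile. It is a simplified model, not the Kokkos source: `DeviceLimits` and `tile_fits` are hypothetical names, and the assumption is only that the check compares the product of the tile dimensions against the cached limit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the device limits cached at initialization.
// The field stays 0 if the copy being read is not the copy that was set.
struct DeviceLimits {
  int max_threads_per_sm = 0;
};

// Simplified model of the tile check: the product of the tile extents
// must not exceed the thread limit.
template <std::size_t Rank>
bool tile_fits(const std::array<int, Rank>& tile, const DeviceLimits& limits) {
  long long product = 1;
  for (int extent : tile) {
    assert(extent > 0 && "tile extents must be positive");
    product *= extent;
  }
  return product <= limits.max_threads_per_sm;
}
```

With a limit of 1024 a 16x16 tile passes, but with the limit left at 0 even a 1x1 tile fails, which would match an abort on the very first MDRange launch regardless of the tile chosen.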

I am only having an issue with CUDA, and it works fine with OpenMP and Serial backends. It has been consistent with every host/device compiler I have tried.

Tested primarily with gcc 9.4.0 / Intel 19.04 + CUDA 11.7.

@kaschau
Author

kaschau commented Apr 21, 2023

To reproduce, I would expect any CUDA kernel to fail when Kokkos::initialize() is called from pykokkos-base and a Kokkos kernel is launched afterwards. I cannot reproduce the problem in C++-only Kokkos code.

@kaschau
Author

kaschau commented May 13, 2023

It seems like the inline static member behavior is different when Kokkos is compiled as a static versus a shared library. Because pybind11 requires PIC, generally one just compiles Kokkos as a shared library, so there are no problems when compiling pykokkos-base. However, this leads to the behavior described above (with 4.0).

However, when I compile Kokkos as static libraries, with -fPIC, I am able to get Kokkos 4.0 to run on the cuda backend.

This is well over my compiling / C++ object lifetime / instruction unit pay grade, so I'm not sure what to make of it. But at least it works.

@crtrott
Member

crtrott commented May 13, 2023

Hm, interesting. @nliber do you have any idea what this could be? I think it is potentially the jitting of stuff where we would have inline static things inside header files? So if something gets recompiled and then relinked, it might cause issues?

I wonder if this is fixable by having all inline-static variables actually be static variables inside functions which are compiled inside the Kokkos library itself. I.e. for every static int foo; make it actually static int& foo(); and have int& foo() { static int val; return val; } somewhere?
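A minimal sketch of the accessor pattern proposed above, assuming a hypothetical `DeviceState` class (the class, the member name, and `initialize_device_state` are illustrative and not taken from Kokkos). The point is that the storage is a function-local static whose definition is compiled once, inside the library, instead of an inline variable emitted wherever the header is included.

```cpp
#include <cassert>

// Hypothetical class for illustration only.
class DeviceState {
 public:
  // Before (C++17 inline static member, defined in the header):
  //   inline static int max_threads_per_sm = 0;
  // After: only the declaration stays in the header.
  static int& max_threads_per_sm();
};

// This definition would live in a single .cpp file of the library,
// so there is exactly one `value` object per loaded library.
int& DeviceState::max_threads_per_sm() {
  static int value = 0;  // initialized on first call
  return value;
}

// Hypothetical initialization hook that stores the reported limit.
void initialize_device_state(int reported_limit) {
  assert(reported_limit > 0);
  DeviceState::max_threads_per_sm() = reported_limit;
}
```

Callers would read and write through `DeviceState::max_threads_per_sm()`. Whether this actually resolves the pybind11 case is the experiment being proposed, not something this sketch shows.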

@crtrott
Member

crtrott commented May 13, 2023

@kaschau do you feel you could take this experiment on, i.e. make a branch of Kokkos Core, go through all these variables, and see if we can get this fixed that way?

@kaschau
Author

kaschau commented May 13, 2023

@crtrott I'm a C++ ignoramus, but I think I can give it a shot. Proving that one variable (the tile size, for example) survives this way should be doable for me as a proof of concept.

@jrmadsen
Contributor

@kaschau A bit of a shot in the dark but try setting this variable to OFF and rebuild pykokkos-base:

set(CMAKE_VISIBILITY_INLINES_HIDDEN ON CACHE BOOL "Add compile flag to hide symbols of inline functions")

I suspect the reason you see this issue with shared libraries is that some symbol exists in both the pykokkos-base library and the Kokkos library, and pykokkos-base is initializing its own copy of the symbol instead of the one that exists in the Kokkos library. When a static Kokkos library is used, these symbols get merged.
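One way to test the duplicate-symbol hypothesis above is to print the address each library sees for the same inline static member. The sketch below is an assumption-laden illustration: `SharedState`, `shared_state_address`, and `report_shared_state` are hypothetical names, and in a real check a helper like this would be compiled into both the Kokkos library and the pykokkos-base module (under distinct function names) and called from each.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical header-defined state, standing in for a Kokkos internal.
struct SharedState {
  inline static int max_threads_per_sm = 0;  // C++17 inline variable
};

// Returns the address that the code in *this* binary resolves the member to.
const void* shared_state_address() {
  return &SharedState::max_threads_per_sm;
}

// Prints the address and current value, tagged with the caller's location.
void report_shared_state(const char* where) {
  assert(where != nullptr);
  std::printf("[%s] &max_threads_per_sm = %p, value = %d\n", where,
              shared_state_address(), SharedState::max_threads_per_sm);
}
```

Within a single binary the address is always the same. If the two shared libraries print different addresses, each holds its own copy, and a value written during initialization in one would not be visible in the other.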

@jrmadsen
Contributor

> @kaschau do you feel you could take this experiment on, i.e. make a branch of Kokkos Core, go through all these variables, and see if we can get this fixed that way?

A potential starting place might be to use the nm command line tool and see which Kokkos variables are defined in the text section of the pykokkos-base library. man nm will explain the codes for whether a symbol is undefined (i.e. defined in another library), a symbol defined in the text section, etc. Filter out any pybind symbols and see if there are any symbols defined in both the Kokkos shared library and pykokkos-base library that look suspicious.

@kaschau
Author

kaschau commented May 16, 2023

> @kaschau A bit of a shot in the dark but try setting this variable to OFF and rebuild pykokkos-base:
>
> set(CMAKE_VISIBILITY_INLINES_HIDDEN ON CACHE BOOL "Add compile flag to hide symbols of inline functions")
>
> I suspect the reason you see this issue with shared libraries is that some symbol exists in both the pykokkos-base library and the Kokkos library, and pykokkos-base is initializing its own copy of the symbol instead of the one that exists in the Kokkos library. When a static Kokkos library is used, these symbols get merged.

@jrmadsen Tried this, still had the same issue. I will take a look at nm when I have some time. Thanks!

@Yaraslaut

The commit that broke pybind11: kokkos/kokkos@1f048cf
And some info from valgrind (not very helpful):

==1994128== Invalid read of size 32
==1994128==    at 0x4FB9B89: __wcsncpy_avx2 (strncpy-avx2.S:306)
==1994128==    by 0x4B59439: UnknownInlinedFun (wchar2.h:146)
==1994128==    by 0x4B59439: _Py_wrealpath (fileutils.c:1996)
==1994128==    by 0x4B54A0C: _PyPathConfig_ComputeSysPath0.constprop.0 (pathconfig.c:495)
==1994128==    by 0x4B544F4: UnknownInlinedFun (main.c:575)
==1994128==    by 0x4B544F4: Py_RunMain (main.c:680)
==1994128==    by 0x4B1CF6A: Py_BytesMain (main.c:734)
==1994128==    by 0x4E7F84F: (below main) (libc_start_call_main.h:58)
==1994128==  Address 0x5ccb2a0 is 16 bytes after a block of size 176 in arena "client"

And the output from Python itself:

ExecSpace Error: MDRange tile dims exceed maximum number of threads per block - choose smaller tile dims
Backtrace:
                                                               Kokkos::Impl::save_stacktrace() [0x7efc8e28d915]
Kokkos::Impl::traceback_callstack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&) [0x7efc8e280cf1]
                                                         Kokkos::Impl::host_abort(char const*) [0x7efc8e280d98]
                                                                                               [0x7efc8e4f3696]
                                                                                               [0x7efc8e4f671c]
                                                                                               [0x7efc8e4f3e43]
                                                                                               [0x7efc8e4f2679]
                                                                                               [0x7efc8e4f1ebe]
                                                                                               [0x7efc8e4f1dba]
                                                                                               [0x7efc8e4f1cde]
                                                                                               [0x7efc8e4d9030]
                                                                                               [0x7efcafa04a81]
                                                                          _PyObject_MakeTpCall [0x7efcaf9e53e4]
                                                                                               [0x7efcafa360fe]
                                                                                               [0x7efcafa1d100]
                                                                                               [0x7efcaf9e575a]
                                                                                               [0x7efc8e4d3cdb]
                                                                          _PyObject_MakeTpCall [0x7efcaf9e53e4]
                                                                      _PyEval_EvalFrameDefault [0x7efcaf9efbcb]
                                                                                               [0x7efcafaa9f6a]
                                                                               PyEval_EvalCode [0x7efcafaa997c]
                                                                                               [0x7efcafac86b3]
                                                                                               [0x7efcafac43ba]
                                                                                               [0x7efcafadadd3]
                                                                       _PyRun_SimpleFileObject [0x7efcafad9ef4]
                                                                          _PyRun_AnyFileObject [0x7efcafad8de8]
                                                                                    Py_RunMain [0x7efcafad3722]
                                                                                  Py_BytesMain [0x7efcafa9bf6b]
                                                                                               [0x7efcaf639850]
                                                                             __libc_start_main [0x7efcaf63990a]
                                                                                        _start [0x55e4512bb045]

@Yaraslaut

I was trying to figure out what is going on in my case, and something very odd is happening: if I look at the addresses of this variable here and here, they are different.
The good news is that if I fetch Kokkos and pybind directly from pykokkos-base using CPM:

FetchContent_Declare(
  PyKokkosbase
  GIT_REPOSITORY https://github.com/kokkos/pykokkos-base.git
  GIT_TAG        94553b7e4be91b042baa9d903dc98e73722eeced
)
FetchContent_MakeAvailable(PyKokkosbase)
find_package(Python3 COMPONENTS Development)

..... 
pybind11_add_module(...)
target_link_libraries( ... Kokkos::kokkos)
.....

Everything starts to work properly.
By default, Kokkos 3.7 is used inside pykokkos-base; to check with Kokkos 4.0, you can update the submodule index.
