Pr/sparse solver2 #232
Note that the CUDA backend uses the older csrsm solver API. The new generic solver API is different enough that adding it as a separate, alternate CUDA backend is probably the simplest way to implement it. We could then conditionally use the new API based on CUDART_VERSION, so that everything continues to work when the deprecated API is removed.
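A minimal sketch of that compile-time switch, assuming a hypothetical `sketch::sparse_solver_backend` type and an assumed version cutoff (`SKETCH_GENERIC_MIN_CUDART`); neither name comes from this PR, and the fallback definition of `CUDART_VERSION` is only there so the snippet builds without the CUDA toolkit:

```cpp
// CUDART_VERSION normally comes from <cuda_runtime_api.h>. The fallback below
// exists only so this sketch compiles without the CUDA toolkit installed.
#ifndef CUDART_VERSION
#define CUDART_VERSION 12000
#endif

// Assumed cutoff for illustration; a real build would use whichever version
// first ships a usable generic solver API.
#define SKETCH_GENERIC_MIN_CUDART 11030

namespace sketch
{

#if CUDART_VERSION >= SKETCH_GENERIC_MIN_CUDART
// Would wrap the new generic solver API.
struct sparse_solver_backend
{
  static constexpr bool uses_generic_api = true;
  static const char* name() { return "generic"; }
};
#else
// Would wrap the older csrsm solver API.
struct sparse_solver_backend
{
  static constexpr bool uses_generic_api = false;
  static const char* name() { return "csrsm"; }
};
#endif

// Callers only see this one entry point, whichever backend was compiled in.
inline const char* selected_backend() { return sparse_solver_backend::name(); }

} // namespace sketch
```

Because both variants expose the same interface, code built on top of the backend would not need to change when the deprecated API disappears.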
129746d to 765768a
Added new CUDA API backend.
11478eb to 66a8299
For CUDA, provide two implementations, one for 11.3.1+ with new generic API, and one for older csrsm API (deprecated as of 11.3.1). For SYCL, complex is not yet supported by the underlying API, so it is not enabled.
consistent with the rest of gtensor and gt-*
66a8299 to e6d0931
22cc04d to 3fa423a
The old API has better performance than the generic API until CUDA 12, and in CUDA 12 the old API was removed. There is a bsrsm2 API which is very similar and is basically CSR when the block size is 1; it could be explored for a performance comparison, but the generic API seems more likely to exist for a long time.
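To illustrate why block-sparse (BSR) storage with block size 1 coincides with CSR, here is a host-only sketch. It uses a matrix-vector product rather than a triangular solve to keep it short, and the function names are hypothetical; it does not call cuSPARSE.

```cpp
#include <cassert>
#include <vector>

namespace sketch
{

// y = A * x for A stored in CSR format.
inline std::vector<double> csr_matvec(int n_rows,
                                      const std::vector<int>& row_ptr,
                                      const std::vector<int>& col_ind,
                                      const std::vector<double>& val,
                                      const std::vector<double>& x)
{
  std::vector<double> y(n_rows, 0.0);
  for (int i = 0; i < n_rows; i++) {
    for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
      y[i] += val[k] * x[col_ind[k]];
    }
  }
  return y;
}

// y = A * x for A stored in BSR format with square bs x bs blocks, each block
// stored row-major. row_ptr and col_ind index blocks rather than scalars.
inline std::vector<double> bsr_matvec(int n_block_rows, int bs,
                                      const std::vector<int>& row_ptr,
                                      const std::vector<int>& col_ind,
                                      const std::vector<double>& val,
                                      const std::vector<double>& x)
{
  assert(bs > 0);
  std::vector<double> y(n_block_rows * bs, 0.0);
  for (int ib = 0; ib < n_block_rows; ib++) {
    for (int k = row_ptr[ib]; k < row_ptr[ib + 1]; k++) {
      const double* blk = val.data() + k * bs * bs;
      int jb = col_ind[k];
      for (int r = 0; r < bs; r++) {
        for (int c = 0; c < bs; c++) {
          y[ib * bs + r] += blk[r * bs + c] * x[jb * bs + c];
        }
      }
    }
  }
  return y;
}

} // namespace sketch
```

With `bs = 1` each block holds a single value, so the same `row_ptr`, `col_ind`, and `val` arrays can be passed to both functions and the results agree.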
Initial testing with the new cuSPARSE solve API in CUDA 12 shows that it is good for performance and bad for memory usage. Unfortunately that removes its purpose, because the invert solve and even the dense solve are almost always faster, and it no longer uses less memory, at least in my tests so far.
I can't claim that I've really gone through this carefully, but since it's tested and almost all new stuff, I'm pretty sure it won't break things ;)
add_library(rocsparse INTERFACE IMPORTED)
target_link_libraries(rocsparse INTERFACE
  "${ROCM_PATH}/lib/librocsparse.so")
target_include_directories(rocsparse INTERFACE "${ROCM_PATH}/include")
Side comment: I think at some point we should try to use the proper cmake support for hip/rocm, which I'd hope should actually be usable by now.
Agreed, I created an issue for it: #233
# Library Wrapper Extensions

## gt-blas

Provides wrappers around commonly used blas routines. Requires cuBLAS, rocblas,
or oneMKL, depending on the GPU backend. Interface is mostly C style taking
raw pointers, for easy interoperability with Fortran, with a few higher level
gtensor specific helpers.
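As a rough illustration of that layering (not the actual gt-blas interface), the sketch below pairs a hypothetical C-style entry point, `sketch_daxpy`, with a hypothetical higher-level helper, `sketch::axpy`. A host loop stands in for the vendor BLAS call so the example runs without a GPU.

```cpp
#include <cassert>
#include <vector>

// Hypothetical C-style entry point: raw pointers and plain ints only, with C
// linkage, so it could be bound from Fortran through ISO_C_BINDING.
extern "C" void sketch_daxpy(int n, double a, const double* x, int incx,
                             double* y, int incy)
{
  // A real wrapper would forward to cuBLAS, rocBLAS, or oneMKL here; this
  // host loop is only a stand-in computing y = a * x + y.
  for (int i = 0; i < n; i++) {
    y[i * incy] += a * x[i * incx];
  }
}

namespace sketch
{

// Hypothetical higher-level helper that takes containers and forwards to the
// C-style routine with unit strides.
inline void axpy(double a, const std::vector<double>& x, std::vector<double>& y)
{
  assert(x.size() == y.size());
  sketch_daxpy(static_cast<int>(x.size()), a, x.data(), 1, y.data(), 1);
}

} // namespace sketch
```

Keeping the pointer-based routine as the base layer means the Fortran-facing and C++-facing interfaces share one implementation.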
👍 for adding some documentation ;)
Adds sparse implementations for all backends to gt-solver. The SYCL backend does not support complex yet, because the underlying MKL API does not (support is planned for the future).
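One way such a per-backend restriction could be expressed at compile time is a small type trait. This is only a sketch: the `sketch::backend` enum and `sketch::sparse_solver_supported` trait are hypothetical names, not part of gt-solver.

```cpp
#include <complex>
#include <type_traits>

namespace sketch
{

enum class backend
{
  cuda,
  hip,
  sycl
};

// By default, every element type is treated as supported on every backend.
template <backend B, typename T>
struct sparse_solver_supported : std::true_type
{};

// Complex element types are disabled for the SYCL backend only.
template <typename R>
struct sparse_solver_supported<backend::sycl, std::complex<R>> : std::false_type
{};

static_assert(sparse_solver_supported<backend::sycl, double>::value,
              "real types stay enabled on SYCL");
static_assert(!sparse_solver_supported<backend::sycl, std::complex<double>>::value,
              "complex is disabled on SYCL");

} // namespace sketch
```

A solver class template could check the trait in a `static_assert`, so that requesting a complex sparse solver on SYCL fails with a clear message at compile time instead of at link time.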