Pr/sparse solver2 #232

bd4 · 2022-12-24T16:52:46Z

Adds sparse implementations for all backends to gt-solver. The SYCL backend does not support complex yet, because the underlying MKL API does not (support is planned for the future).

bd4 · 2022-12-24T17:45:17Z

Note that the CUDA backend uses the older csrsm solver API. The new generic solver APIs, cusparseSpSV (single RHS) and cusparseSpSM (multiple RHS), were added in different 11.x releases, and the old csrsm API was deprecated at some point. They are still too new to require a CUDA release that includes them.
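For context, csrsm-style routines solve a triangular system whose matrix is stored in CSR format, against one or more right-hand sides. A minimal CPU reference of that operation is sketched below, assuming a zero-based lower-triangular factor with an explicit diagonal entry in every row and right-hand sides stored column-major; `csr_lower_solve` is a hypothetical name, not part of gt-solver or cuSPARSE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for a sparse lower-triangular solve L * X = B.
// L is n x n in zero-based CSR format with a nonzero diagonal in every row.
// B holds nrhs right-hand sides, column-major, n rows each; it is
// overwritten in place with the solution X.
template <typename T>
void csr_lower_solve(int n, int nrhs, const std::vector<int>& row_ptr,
                     const std::vector<int>& col_idx,
                     const std::vector<T>& values, std::vector<T>& b)
{
  assert(static_cast<int>(row_ptr.size()) == n + 1);
  assert(b.size() == static_cast<std::size_t>(n) * nrhs);
  for (int k = 0; k < nrhs; k++) {
    T* x = b.data() + static_cast<std::size_t>(k) * n; // k-th RHS column
    for (int i = 0; i < n; i++) {
      T sum = x[i];
      T diag = T(0);
      for (int p = row_ptr[i]; p < row_ptr[i + 1]; p++) {
        int j = col_idx[p];
        if (j < i) {
          sum -= values[p] * x[j]; // x[j] was already solved (forward order)
        } else if (j == i) {
          diag = values[p];
        }
        // entries with j > i belong to the upper part and are ignored
      }
      assert(diag != T(0));
      x[i] = sum / diag;
    }
  }
}
```

Each row depends on earlier solutions, so the rows are processed in order here. GPU libraries instead run a separate analysis phase (both csrsm2 and SpSM have one) to find rows that can be solved concurrently, and that phase is part of why they request extra work buffers.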

The new API is different enough that adding it as a separate, alternate CUDA backend is probably the simplest way to implement it. Then we could conditionally use the new API based on CUDART_VERSION, so that everything continues to work when the deprecated API is removed.
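Selecting a backend from CUDART_VERSION could look roughly like the sketch below. The enum and function names are hypothetical, and the 12.0 cutoff is an assumption based on this thread (csrsm is faster before 12 and removed in 12). CUDART_VERSION itself encodes the runtime version as 1000 * major + 10 * minor.

```cpp
// Hypothetical backend tags for illustration; not gt-solver's actual names.
enum class cuda_sparse_backend { csrsm2, generic_spsm };

// CUDART_VERSION encodes the runtime version as 1000 * major + 10 * minor,
// e.g. CUDA 11.3 -> 11030 and CUDA 12.0 -> 12000.
constexpr int cudart_version(int major, int minor)
{
  return 1000 * major + 10 * minor;
}

// Assumed policy: keep the deprecated csrsm API while it exists and switch
// to the generic API once it has been removed.
constexpr cuda_sparse_backend select_sparse_backend(int version)
{
  return version >= cudart_version(12, 0) ? cuda_sparse_backend::generic_spsm
                                          : cuda_sparse_backend::csrsm2;
}

#ifdef CUDART_VERSION
// Only compiled when the CUDA runtime headers define CUDART_VERSION.
constexpr cuda_sparse_backend default_sparse_backend =
  select_sparse_backend(CUDART_VERSION);
#endif
```

A constexpr choice only picks a default; code that calls removed functions still has to be excluded from compilation, so the same test is usually also written as a preprocessor condition such as `#if CUDART_VERSION >= 12000` around the old backend.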

bd4 · 2022-12-24T21:15:44Z

Added new CUDA API backend.


The old API performs better than the generic one before CUDA 12, and in 12 the old API was removed. There is also a bsrsm2 API, which is very similar and is basically CSR when the block size is 1; it could be explored for a performance comparison, but the generic API seems more likely to be around for a long time.
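The remark that bsrsm2 is basically CSR when the block size is 1 can be checked on the storage formats: with 1×1 blocks, the BSR block row pointers, block column indices and block values are exactly the CSR row pointers, column indices and values. A minimal CPU sketch, using a matrix-vector product instead of a triangular solve to stay short and assuming row-major storage inside each block (irrelevant for 1×1 blocks); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x for an n x n matrix A in zero-based CSR format.
std::vector<double> csr_matvec(int n, const std::vector<int>& row_ptr,
                               const std::vector<int>& col_idx,
                               const std::vector<double>& val,
                               const std::vector<double>& x)
{
  std::vector<double> y(n, 0.0);
  for (int i = 0; i < n; i++)
    for (int p = row_ptr[i]; p < row_ptr[i + 1]; p++)
      y[i] += val[p] * x[col_idx[p]];
  return y;
}

// y = A * x for A in BSR format: mb block rows, square bs x bs blocks,
// each block stored row-major in bval.
std::vector<double> bsr_matvec(int mb, int bs, const std::vector<int>& brow_ptr,
                               const std::vector<int>& bcol_idx,
                               const std::vector<double>& bval,
                               const std::vector<double>& x)
{
  assert(bs > 0);
  std::vector<double> y(static_cast<std::size_t>(mb) * bs, 0.0);
  for (int I = 0; I < mb; I++) {
    for (int p = brow_ptr[I]; p < brow_ptr[I + 1]; p++) {
      int J = bcol_idx[p];
      const double* blk = bval.data() + static_cast<std::size_t>(p) * bs * bs;
      for (int r = 0; r < bs; r++)
        for (int c = 0; c < bs; c++)
          y[I * bs + r] += blk[r * bs + c] * x[J * bs + c];
    }
  }
  return y;
}
```

With `bs == 1` the inner loops collapse to `y[I] += bval[p] * x[J]`, which is the CSR loop, so existing CSR arrays can be passed to a BSR routine unchanged.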

bd4 · 2022-12-28T23:28:50Z

Initial testing with the new cuSPARSE solve API in CUDA 12 shows it is good for performance but bad for memory usage. That unfortunately defeats its purpose: the inverted solve and even the dense solve are almost always faster, and the sparse solver no longer uses less memory, at least in my tests so far.
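The memory argument comes down to simple arithmetic: CSR storage scales with the number of nonzeros, dense storage with n², and any work buffer the sparse library requests is added on top. A sketch with hypothetical sizes (not measurements from this PR), assuming 32-bit indices:

```cpp
#include <cstddef>

// Bytes for an n x n matrix stored densely, elem_bytes per element.
constexpr std::size_t dense_bytes(std::size_t n, std::size_t elem_bytes)
{
  return n * n * elem_bytes;
}

// Bytes for the same matrix in CSR: nnz values, nnz column indices and
// n + 1 row pointers, plus any work buffer the solver library requests.
constexpr std::size_t csr_bytes(std::size_t n, std::size_t nnz,
                                std::size_t elem_bytes,
                                std::size_t index_bytes,
                                std::size_t work_buffer_bytes)
{
  return nnz * (elem_bytes + index_bytes) + (n + 1) * index_bytes +
         work_buffer_bytes;
}
```

For n = 1000 complex doubles (16 bytes) and 50,000 nonzeros, dense storage takes 16,000,000 bytes and CSR only 1,004,004 bytes, but a hypothetical 15 MB work buffer pushes the sparse total past the dense one.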

germasch

I can't claim that I've really gone through this carefully, but since it's tested and almost all new stuff, I'm pretty sure it won't break things ;)

germasch · 2023-01-04T14:14:59Z

CMakeLists.txt

+  add_library(rocsparse INTERFACE IMPORTED)
+  target_link_libraries(rocsparse INTERFACE
+                        "${ROCM_PATH}/lib/librocsparse.so")
+  target_include_directories(rocsparse INTERFACE "${ROCM_PATH}/include")
+


Side comment: I think at some point we should try to use the proper cmake support for hip/rocm, which I'd hope should actually be usable by now.

Agreed, I created an issue for it: #233

germasch · 2023-01-04T14:15:29Z

README.md

+# Library Wrapper Extensions
+
+## gt-blas
+
+Provides wrappers around commonly used blas routines. Requires cuBLAS, rocblas,
+or oneMKL, depending on the GPU backend. Interface is mostly C style taking
+raw pointers, for easy interoperability with Fortran, with a few higher level
+gtensor specific helpers.


👍 for adding some documentation ;)
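The README describes the gt-blas interface as mostly C style, taking raw pointers for easy Fortran interoperability. A wrapper in that style might look like the sketch below; `example_daxpy` is a hypothetical name, positive increments are assumed, and a plain CPU loop stands in for the cuBLAS, rocBLAS or oneMKL call a real wrapper would make on device pointers.

```cpp
#include <cassert>

// Hypothetical C-linkage wrapper: only plain types and raw pointers cross
// the interface, so Fortran can call it through iso_c_binding.
// Computes y = a * x + y for n elements with positive strides incx, incy.
extern "C" void example_daxpy(int n, double a, const double* x, int incx,
                              double* y, int incy)
{
  assert(n >= 0 && incx > 0 && incy > 0);
  for (int i = 0; i < n; i++)
    y[i * incy] += a * x[i * incx];
}
```

On the Fortran side such a function would be declared in an interface block with `bind(C)`, and the higher-level gtensor helpers can sit on top of the same raw-pointer entry points.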

bd4 added 2 commits December 23, 2022 11:16

readme: add brief description of extension libs

9c42e6e

sparse: add stream operator test

f983a90

bd4 requested a review from germasch December 24, 2022 16:52

bd4 added 2 commits December 24, 2022 15:10

sparse: expose value/idx data, const fixes

7a7109f

gt-blas: add missing npvt implementations

b4696f4

bd4 force-pushed the pr/sparse-solver2 branch from 129746d to 765768a December 24, 2022 21:15

bd4 requested a review from gmerlo December 24, 2022 21:16

bd4 force-pushed the pr/sparse-solver2 branch 3 times, most recently from 11478eb to 66a8299 December 24, 2022 21:47

bd4 added 3 commits December 24, 2022 16:00

solver: add sparse implementation

875d8fd

For CUDA, provide two implementations, one for 11.3.1+ with new generic API, and one for older csrsm API (deprecated as of 11.3.1). For SYCL, complex is not yet supported by the underlying API, so it is not enabled.

solver: use lowercase class names

884c5be

consistent with the rest of gtensor and gt-*

ci: enable solver build for cuda, hip

e6d0931

bd4 force-pushed the pr/sparse-solver2 branch from 66a8299 to e6d0931 December 24, 2022 22:00

solver: add benchmark

3fa423a

bd4 force-pushed the pr/sparse-solver2 branch from 22cc04d to 3fa423a December 26, 2022 15:24

bd4 added 2 commits December 26, 2022 15:04

sarray: const correct zero len specialization

648f16c

bd4 added 4 commits January 3, 2023 09:59

solver: add api for getting device mem usage

3012452

sparse: fix cuda backend sparse buf sizes

ef68413

solver: add alternate bsrsm2 backend for cuda 12+

067feeb

solver: use bsrsm2 by default for cuda 12

cff2152

germasch approved these changes Jan 4, 2023


bd4 merged commit 3e1be44 into wdmapp:main Jan 4, 2023

bd4 deleted the pr/sparse-solver2 branch January 4, 2023 14:51
