Pr/sparse solver2 #232

Merged
merged 14 commits into from
Jan 4, 2023
@bd4 bd4 commented Dec 24, 2022

Adds sparse implementations for all backends to gt-solver. The SYCL backend does not support complex yet, because the underlying MKL API does not (it is planned for the future).

@bd4 bd4 requested a review from germasch December 24, 2022 16:52
bd4 commented Dec 24, 2022

Note that the CUDA backend uses the older csrsm solver API. The new generic solver APIs, cusparseSpSV (single rhs) and cusparseSpSM (multiple rhs), were added in different 11.X releases, and the old csrsm API was deprecated at some point. The new API is still too new for us to require a CUDA release that includes it.

The new API is different enough that adding it as a separate, alternate CUDA backend is probably the simplest way to implement it. We could then conditionally use the new API based on CUDART_VERSION, so that everything continues to work when the deprecated API is removed.
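As a rough illustration of the version-based selection described above, here is a minimal C++ sketch. It is not the code from this PR: the backend struct names, the `backend_for` alias, and the 11030 threshold are all hypothetical. `CUDART_VERSION` encodes major * 1000 + minor * 10 and carries no patch level, so a cutoff like 11.3.1 cannot be expressed exactly with it.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Outside a CUDA build CUDART_VERSION is undefined; default it to 0 so this
// sketch compiles anywhere. In a real build it comes from the CUDA headers.
#ifndef CUDART_VERSION
#define CUDART_VERSION 0
#endif

// Assumed threshold: 11030 encodes CUDA 11.3 (major * 1000 + minor * 10).
constexpr int generic_api_min_cudart = 11030;

// Hypothetical backend tags. Real backends would wrap the csrsm2 calls and
// the generic cusparseSpSV / cusparseSpSM calls respectively.
struct csrsm2_backend
{
  static const char* name() { return "csrsm2"; }
};

struct generic_backend
{
  static const char* name() { return "generic"; }
};

// Pick a backend type at compile time from a CUDA runtime version number.
template <int CudartVersion>
using backend_for =
  typename std::conditional<(CudartVersion >= generic_api_min_cudart),
                            generic_backend, csrsm2_backend>::type;

// The backend the rest of the library would use in this build.
using sparse_backend = backend_for<CUDART_VERSION>;
```

With this shape, callers only refer to `sparse_backend`, so removing the deprecated API later means deleting one struct and the conditional rather than touching call sites.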

bd4 commented Dec 24, 2022

Added new CUDA API backend.

@bd4 bd4 requested a review from gmerlo December 24, 2022 21:16
@bd4 bd4 force-pushed the pr/sparse-solver2 branch 3 times, most recently from 11478eb to 66a8299 Compare December 24, 2022 21:47
For CUDA, provide two implementations, one for 11.3.1+ with the
new generic API, and one for the older csrsm API (deprecated as
of 11.3.1).

For SYCL, complex is not yet supported by the underlying API,
so it is not enabled.
consistent with the rest of gtensor and gt-*
The old API has better performance than the generic API until CUDA 12,
and in 12 the old API was removed. There is a bsrsm2 API which is very
similar and is basically csr when block size=1; it could be explored
for performance comparison, but the generic API seems more likely to
exist for a long time.
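To make the remark about block size 1 concrete, the following sketch converts a CSR matrix to BSR storage on the host and shows that the arrays coincide when the block dimension is 1. It makes no cuSPARSE calls; the struct and function names are hypothetical, and it assumes column indices are sorted within each CSR row and that blocks are stored row-major.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical host-side containers, for illustration only.
struct csr_matrix
{
  int n; // matrix is n x n
  std::vector<int> row_ptr, col_ind;
  std::vector<double> val;
};

struct bsr_matrix
{
  int nb;        // number of block rows
  int block_dim; // each block is block_dim x block_dim
  std::vector<int> row_ptr, col_ind;
  std::vector<double> val; // dense blocks, row-major within each block
};

// Convert CSR to BSR. Assumes n is a multiple of block_dim.
bsr_matrix csr_to_bsr(const csr_matrix& a, int block_dim)
{
  assert(block_dim > 0 && a.n % block_dim == 0);
  bsr_matrix b;
  b.nb = a.n / block_dim;
  b.block_dim = block_dim;
  b.row_ptr.push_back(0);
  for (int bi = 0; bi < b.nb; ++bi) {
    // Gather the blocks of this block row, keyed (and sorted) by block column.
    std::map<int, std::vector<double>> blocks;
    for (int r = 0; r < block_dim; ++r) {
      int row = bi * block_dim + r;
      for (int k = a.row_ptr[row]; k < a.row_ptr[row + 1]; ++k) {
        int bj = a.col_ind[k] / block_dim;
        int c = a.col_ind[k] % block_dim;
        std::vector<double>& blk = blocks[bj];
        if (blk.empty()) {
          blk.assign(block_dim * block_dim, 0.0);
        }
        blk[r * block_dim + c] = a.val[k];
      }
    }
    for (const auto& kv : blocks) {
      b.col_ind.push_back(kv.first);
      b.val.insert(b.val.end(), kv.second.begin(), kv.second.end());
    }
    b.row_ptr.push_back(static_cast<int>(b.col_ind.size()));
  }
  return b;
}

// 3x3 lower-triangular example: [2 0 0; 1 3 0; 0 4 5].
csr_matrix example_lower_triangular()
{
  return csr_matrix{3, {0, 1, 3, 5}, {0, 0, 1, 1, 2}, {2, 1, 3, 4, 5}};
}
```

Calling `csr_to_bsr(example_lower_triangular(), 1)` yields row pointer, column index, and value arrays equal to the CSR input, which is why a BSR solver with block size 1 could act on CSR data without conversion.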
bd4 commented Dec 28, 2022

Initial testing with the new cuSPARSE solve API in CUDA 12 shows that it is good for performance but bad for memory usage. That unfortunately defeats its purpose, because the invert solver and even the dense solver are almost always faster, and the sparse solver no longer uses less memory, at least in my tests so far.

@germasch germasch left a comment

I can't claim that I've really gone through this carefully, but since it's tested and almost all new stuff, I'm pretty sure it won't break things ;)

Comment on lines +188 to +192
add_library(rocsparse INTERFACE IMPORTED)
target_link_libraries(rocsparse INTERFACE
"${ROCM_PATH}/lib/librocsparse.so")
target_include_directories(rocsparse INTERFACE "${ROCM_PATH}/include")

Contributor

Side comment: I think at some point we should try to use the proper cmake support for hip/rocm, which I'd hope should actually be usable by now.

Contributor Author

Agreed, I created an issue for it: #233

Comment on lines +459 to +466
# Library Wrapper Extensions

## gt-blas

Provides wrappers around commonly used blas routines. Requires cuBLAS, rocblas,
or oneMKL, depending on the GPU backend. Interface is mostly C style taking
raw pointers, for easy interoperability with Fortran, with a few higher level
gtensor specific helpers.
Contributor

👍 for adding some documentation ;)

@bd4 bd4 merged commit 3e1be44 into wdmapp:main Jan 4, 2023
@bd4 bd4 deleted the pr/sparse-solver2 branch January 4, 2023 14:51