Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARSE] Add support for rocSPARSE backend #544

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

Rbiessy
Copy link
Contributor

@Rbiessy Rbiessy commented Jul 24, 2024

Description

Add support for the rocSPARSE backend.

Depends on #527 and #532.

Rendered docs: docs.zip

Checklist

All Submissions

@Rbiessy Rbiessy requested a review from a team July 24, 2024 13:57
@Rbiessy Rbiessy changed the title [SPARSE] Add support for rocsparse backend [SPARSE] Add support for rocSPARSE backend Jul 24, 2024
@Rbiessy
Copy link
Contributor Author

Rbiessy commented Jul 30, 2024

Tests log on W6800: amd_w6800_log.txt

@gajanan-choudhary gajanan-choudhary added backend A request to enable new implementation behind API Sparse BLAS domain Sparse BLAS domain issue/request feature A request to add a new feature labels Sep 12, 2024
@gajanan-choudhary
Copy link
Contributor

gajanan-choudhary commented Oct 29, 2024

@Rbiessy, now that #527 is merged in, could you please rebase this branch against main, resolve conflicts, and force-push when you can so that I can start reviewing this PR?

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Oct 29, 2024

Yes, this is in progress! I expect I will need a few days at least. I'm aiming to merge the PR by the end of the year.

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 1, 2024

@gajanan-choudhary I have updated the PR with recent changes from cuSPARSE. Note that it moved almost all the content from cusparse_task.hpp to common_launch_task.hpp with almost no changes.
New test logs:
log_mi210.txt
log_a100.txt
log_pvc.txt

@Rbiessy Rbiessy mentioned this pull request Nov 5, 2024
2 tasks
Comment on lines +90 to +93
- The same sparse matrix handle cannot be reused for multiple operations
``spmm``, ``spmv``, or ``spsv``. Doing so will throw a
``oneapi::mkl::unimplemented`` exception. See `#332
<https://github.com/ROCm/rocSPARSE/issues/332>`_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is quite severe, but seems to be a legitimate issue on rocSPARSE side right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If and when they fix this issue, though, will it be easy for us to make changes (with a version check of course) that correctly performs the operations rather than throwing an unimplemented exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's easy to fix on oneMath side. The issue is also referenced in this comment: https://github.com/oneapi-src/oneMKL/pull/544/files#diff-3b8c1c2c71abd54f8f90f43415c2f17b2a7fdb81c2b882c210f3cba56b4679adR63
One would just need to remove the used member, its 2 usages below as well as the mark_used method.

Comment on lines +90 to +91
if (this->format == detail::sparse_format::COO &&
!this->has_matrix_property(matrix_property::sorted)) {
Copy link
Contributor

@gajanan-choudhary gajanan-choudhary Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the documentation, docs/domains/sparse_linear_algebra.rst, you've written:

  • The CSR format requires the column indices to be sorted within each row.

So shouldn't we be handling both CSR and COO formats here instead of just COO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I added this check in b6518c9
I've also made sure we also have tests on AMD and Nvidia which are skipped when a property is not set. The runtime example also need to set this property now.

handle_helper.rocsparse_handle_container_mapper_.insert(
std::make_pair(piPlacedContext_, atomic_ptr));

sycl::detail::pi::contextSetExtendedDeleter(*placedContext_, ContextCallback, atomic_ptr);
Copy link
Contributor

@gajanan-choudhary gajanan-choudhary Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it unchanged between the removed PI versus new UR APIs? Isn't there a sycl::detail::ur::contextSetExtendedDeleter somewhere that you need to place under #ifdef ONEAPI_ONEMKL_PI_INTERFACE_REMOVED #else #endif... ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contextSetExtendedDeleter is still in the pi namespace see https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/detail/ur.hpp#L100. As the comment says there are plans to deprecate it. The oneAPI core team at Codeplay started to change the scopeContextHandle mechanism for BLAS in #609. They will make sure that oneMKL Interface will keep working if/when contextSetExtendedDeleter is deprecated.
If we ever need to change the scopeContextHandle mechanism, I would rather that we do it for both cuSPARSE and rocSPARSE backends in a separate PR.

auto event = queue.submit([&](sycl::handler& cgh) {
auto acc = val.template get_access<sycl::access::mode::read_write>(cgh);
detail::submit_host_task(cgh, queue, [=](sycl::interop_handle ih) {
if (dvhandle->size != size) {
Copy link
Contributor

@gajanan-choudhary gajanan-choudhary Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should throw an exception if dvhandle->size < size instead of !=. E.g., users may want to pad thesycl::buffer for some reason with zeros in the end, which wouldn't work with the current if condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, this is set_dense_vector_data used for replacing/switching out arrays in an existing handle. The condition is fine then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although maybe it doesn't hurt to add the check both here and in init_xxx_data

Copy link
Contributor Author

@Rbiessy Rbiessy Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check to verify that the buffer size is compatible with the handle size is done elsewhere, see https://github.com/oneapi-src/oneMKL/blob/develop/src/sparse_blas/generic_container.hpp#L164
This is a common place that the other backends also use. That check will run for init_*_data functions.

template <typename fpType>
void init_dense_vector(sycl::queue& queue, dense_vector_handle_t* p_dvhandle, std::int64_t size,
sycl::buffer<fpType, 1> val) {
auto event = queue.submit([&](sycl::handler& cgh) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a check here for sycl::buffer case that checks buffer->size() >= size and throws an exception otherwise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already done, see my comment above: #544 (comment)

CHECK_DESCR_MATCH(spmv_descr, alg, "spmv_optimize");

A_handle->mark_used();
auto& buffer_size = spmv_descr->temp_buffer_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why this variable specifically is kept and captured by reference outside compute_functor while other members of spmv_descr are created inside the functor? If it is because buffer_size is used later in an if condition, I'd prefer it if we change this approach and capture everything by [=] in compute_functor, even though it would mean replacing the last buffer_size > 0 with spmv_descr->temp_buffer_size > 0. I know that what you have right now is expected to work in this particular case (because of the condition that spmv_descr must live as long as spmv is being called), it is normally a bad idea to capture variables by reference in SYCL functors that are going to be running asynchronously unless you have an immediate event.wait() (e.g., like what we are doing in spmv_buffer_size() function).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that's fine, done in c53ef8d

Copy link
Contributor

@gajanan-choudhary gajanan-choudhary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than some minor change requests. This PR was a lot easier to review having reviewed #527. Thanks for the fantastic work!

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Dec 2, 2024

Thanks for the review Gajanan!

For completeness, logs on MI210: log_mi210.txt
and A100: log_a100.txt

FYI we will get a second approval and I aim to merge this after the oneMKL Interface renaming PR.

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Dec 4, 2024

Our AMD HW is unavailable for a while unfortunately. I will work on fixing the conflicts as soon as I can test the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend A request to enable new implementation behind API feature A request to add a new feature Sparse BLAS domain Sparse BLAS domain issue/request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants