[BLAS] Simplify CublasScopedContextHandler #609

konradkusiak97 · 2024-10-30T18:07:09Z

Nowadays we only use the CUDA primary context in DPC++, so we can refactor CublasScopedContextHandler.

Since we now only use the primary context, which is unique to each device, we can change the unordered_map<context, handle> to an unordered_map<device, handle>.

I believe that the call to sycl::detail::contextSetExtendedDeleter is not necessary. It provides an additional feature that ties the lifetime of the cublasHandle to the corresponding sycl queue, but it makes the code very complicated (with atomics etc.) and relies on the detail namespace, which is not ideal.

cublas_handle.hpp was modified so that it is no longer templated. Both the DPC++ and AdaptiveCpp versions use the same mapping, so this could be simplified. Also, we need to set the correct context in the custom destructor in order to destroy the cublas handles, so the native type was necessary there.
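A minimal C++17 sketch of the device-keyed map and the custom destructor described above, written under explicit assumptions: DeviceId, FakeHandle, set_current_context, create_handle and destroy_handle are hypothetical stand-ins (not the oneMKL code and not the CUDA API) so that it compiles without a GPU. In a real CUDA build they would presumably correspond to CUdevice, cublasHandle_t, making the device's primary context current, cublasCreate and cublasDestroy.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: DeviceId plays the role of CUdevice,
// FakeHandle the role of cublasHandle_t.
using DeviceId = int;
struct FakeHandle {
    DeviceId owner;
};

// Simulated "current context" plus a log of whether each handle was
// destroyed while its own device's context was current.
DeviceId current_context = -1;
std::vector<bool> destroyed_in_own_context;

// Hypothetical: where a real build would make the device's primary context current.
void set_current_context(DeviceId dev) { current_context = dev; }

// Hypothetical: stands in for cublasCreate.
FakeHandle* create_handle(DeviceId dev) { return new FakeHandle{dev}; }

// Hypothetical: stands in for cublasDestroy.
void destroy_handle(FakeHandle* h) {
    destroyed_in_own_context.push_back(h->owner == current_context);
    delete h;
}

// One handle per device: the key is the device, not a sycl::context.
class HandleMap {
public:
    HandleMap() = default;
    HandleMap(const HandleMap&) = delete;             // owns raw handles
    HandleMap& operator=(const HandleMap&) = delete;

    FakeHandle* get(DeviceId dev) {
        auto it = handles_.find(dev);
        if (it != handles_.end()) return it->second;  // reuse existing handle
        set_current_context(dev);
        FakeHandle* h = create_handle(dev);
        handles_.emplace(dev, h);
        return h;
    }

    std::size_t size() const { return handles_.size(); }

    // Custom destructor: make each device's context current before destroying
    // its handle, which is why the native device type is needed as the key.
    ~HandleMap() {
        for (auto& [dev, h] : handles_) {
            set_current_context(dev);
            destroy_handle(h);
        }
    }

private:
    std::unordered_map<DeviceId, FakeHandle*> handles_;
};
```

Keying by device instead of by context is only valid because, as the description notes, DPC++ now always uses the single primary context of each device.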

Side note:
We can definitely remove the dependency on UR headers just by changing the templated types to the native types.

test_main_blas_rt.txt
test_main_blas_ct.txt

konradkusiak97 · 2024-11-04T10:54:13Z

@oneapi-src/onemkl-blas-write any thoughts on this?

andrewtbarker · 2024-11-04T16:35:07Z

I very much like the simplification here. If a user has two CUDA devices available, how do they specify which one to use?

konradkusiak97 · 2024-11-04T17:41:14Z

> I very much like the simplification here. If a user has two CUDA devices available, how do they specify which one to use?

They would specify the device when creating the sycl::queue. Each queue is associated with a single device, and the CublasScopedContextHandler constructor takes a queue parameter. My understanding is that we want one cublasHandle_t per CUDA device.
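To illustrate the answer, here is a small hypothetical sketch of the lookup it implies: every queue is bound to one device, and the handle is looked up by that device, so queues on the same GPU share a handle while a queue on the second GPU gets its own. Queue, Handle, DeviceId and handle_for are illustrative stand-ins for sycl::queue, cublasHandle_t, CUdevice and the real helper, not oneMKL code.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins: DeviceId for CUdevice, Handle for cublasHandle_t,
// Queue for a sycl::queue, which is bound to exactly one device.
using DeviceId = int;
struct Handle { DeviceId device; };
struct Queue { DeviceId device; };

// Hypothetical helper: one handle per device, shared by every queue that
// targets that device. Pointers to unordered_map elements stay valid
// across rehashing, so returning a pointer is safe.
Handle* handle_for(const Queue& q) {
    static std::unordered_map<DeviceId, Handle> handles;
    auto result = handles.try_emplace(q.device, Handle{q.device});
    return &result.first->second;
}
```

In actual SYCL code, choosing the second GPU simply means constructing the sycl::queue from that sycl::device (for example, one taken from sycl::device::get_devices(sycl::info::device_type::gpu)). This is standard SYCL 2020 and is not specific to oneMKL.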

konradkusiak97 · 2024-11-04T17:54:58Z

Thanks for the review @andrewtbarker! In that case I will apply similar changes to other backends as well.

Rbiessy

Sorry for the late review. I agree that the CublasScopedContextHandler can be improved, but I have two big concerns with the suggested changes.
Ideally we should have a closer look at the impact of this change on an application using oneMKL+cuBLAS or create an example that calls a few oneMKL functions.

src/blas/backends/cublas/cublas_scope_handle.cpp

src/blas/backends/cublas/cublas_scope_handle.hpp

…xtendedDeleter. It seems the static thread_local unordered map needs to stay because of all the thread shenanigans, but we're removing the use of the sycl detail namespace since it's not necessary for correctness.

konradkusiak97 · 2024-11-08T17:30:28Z

I modified this PR so that the cublasHandles are destroyed in only one place: at the end of the program. Because of all the threading shenanigans, I believe the static thread_local unordered_map needs to stay.

By removing the use of sycl::detail::contextSetExtendedDeleter, we no longer allow the cublasHandle to be destroyed when the sycl::queue is destroyed. This is a useful feature, but it makes this code very complicated, and I'm not sure how much it is actually used in practice. If it is really necessary, I think we should provide a public API for it in DPC++ rather than use the detail namespace.

The good news is that those changes make AdaptiveCpp and DPC++ implementations identical (if that's of any benefit).
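To make the lifetime discussed in this comment more concrete, here is a minimal C++17 sketch of a function-local static thread_local map from device to handle. Handle, DeviceId and thread_handle are hypothetical stand-ins, not the oneMKL code. A live-object counter replaces cublasCreate/cublasDestroy so the sketch runs without a GPU. Each thread gets its own map, and a handle is released only when its thread's map is destroyed. For the main thread that happens at program termination. In this sketch, worker threads release their handles when they exit.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <unordered_map>

// Counts live handles so that destruction can be observed.
std::atomic<int> live_handles{0};

// Hypothetical stand-in for cublasHandle_t: create on construction,
// destroy on destruction.
struct Handle {
    Handle() { ++live_handles; }
    ~Handle() { --live_handles; }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
};

using DeviceId = int;

// Each thread lazily builds its own device -> handle map on first use.
// The map, and every handle in it, is destroyed when that thread ends
// (for the main thread, after main returns).
Handle& thread_handle(DeviceId dev) {
    static thread_local std::unordered_map<DeviceId, Handle> handles;
    return handles.try_emplace(dev).first->second;  // constructs Handle in place
}
```

With this design, no cleanup hook on a sycl::context or sycl::queue is needed. The trade-off is the one described above: handles outlive the queues that used them.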

Rbiessy · 2024-11-15T14:19:14Z

Alright, that looks fine to me then.

> The good news is that those changes make AdaptiveCpp and DPC++ implementations identical (if that's of any benefit).

Do you think it would be possible to remove cublas_scope_handle_hipsycl.hpp and cublas_scope_handle_hipsycl.cpp as part of this PR, then? I'm not sure whether the blas domain still compiles with AdaptiveCpp, so it may be too difficult to test right now.

konradkusiak97 · 2024-11-15T14:39:07Z

Good point. I think those files can be removed, but I don't have much experience with AdaptiveCpp, so I wouldn't be able to test the build (at least not right now).

Rbiessy · 2024-11-15T15:11:47Z

Looking at the internal CI, it does not seem that the blas domain compiles with AdaptiveCpp anyway; one of the issues is #567. I also see that there are still some differences, such as ih.get_native_device<sycl::backend::ext_oneapi_cuda>() vs interop_h.get_native_device<sycl::backend::cuda>(), so let's not bother trying to fix that today. I think we can merge this next week. Thanks for the work!

konradkusiak97 added 5 commits October 30, 2024 17:05

Replaced type of cublas_handle map to CUcontext to remove UR dependency (08dedbf)

Removed checking if current Ctx is not Primary. It's always primary Ctx (8472bae)

Removed sycl::context* member and call to ContextSetExtendedDeleter. We don't need to do any early cleanup upon sycl context destruction. (000ebf8)

Remove unnecessary includes (7b43b95)

Changed handle_helper to have unordered_map of CUdevice(s) -> cublasHandle_t (1961e14)

konradkusiak97 requested a review from a team as a code owner October 30, 2024 18:07

konradkusiak97 force-pushed the RemoveURandSYCLDetailDependency branch from 9c6f8e7 to 8f547cd Compare October 30, 2024 18:21

andrewtbarker approved these changes Nov 4, 2024

Rbiessy reviewed Nov 5, 2024

src/blas/backends/cublas/cublas_scope_handle.cpp (resolved)

src/blas/backends/cublas/cublas_scope_handle.cpp (outdated, resolved)

Rbiessy reviewed Nov 6, 2024

src/blas/backends/cublas/cublas_scope_handle.hpp (resolved)

src/blas/backends/cublas/cublas_scope_handle.hpp (resolved)

konradkusiak97 force-pushed the RemoveURandSYCLDetailDependency branch from b527920 to ee669b8 Compare November 7, 2024 15:43

konradkusiak97 force-pushed the RemoveURandSYCLDetailDependency branch from ee669b8 to 46a2661 Compare November 8, 2024 17:14

We need to set the context properly before destroying cublasHandles (5996320)

Rbiessy approved these changes Nov 15, 2024

Rbiessy merged commit c0cef0c into uxlfoundation:develop Nov 18, 2024
7 checks passed

Rbiessy mentioned this pull request Nov 29, 2024

[SPARSE] Add support for rocSPARSE backend #544 (open)
