Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable user-provided lock table for atomic_ref<T> #990

Open
jrhemstad opened this issue Nov 18, 2021 · 2 comments
Open

Enable user-provided lock table for atomic_ref<T> #990

jrhemstad opened this issue Nov 18, 2021 · 2 comments
Labels
libcu++ For all items related to libcu++

Comments

@jrhemstad
Copy link
Collaborator

jrhemstad commented Nov 18, 2021

I would like to be able to use atomic_ref<T> for sizeof(T) > 8B, i.e., atomic_ref<T>::is_always_lock_free == false.

Obviously this can't rely on built-in atomic operations.Typical implementations will make use of a lock table to support this usage where you use the address of the referenced object as a lookup into a table of mutexes.

Generic support for a lock table underneath a cuda::std::atomic_ref would be extremely non-trivial to support on all platforms and for it to work heterogeneously.

However, a less generic solution that would still be very useful would be to allow a user to provide their own lock table.

@ogiroux's idea is to partially specialize cuda::atomic_ref for is_always_lock_free == false to contain a pointer to a lock table to be supplied by the user via the atomic_ref constructor at each construction. It would likely be useful to supply an opaque cuda::atomic_lock_table<N, Scope> type that a user could instantiate and manage however they like.

A rough sketch of what this could look like:

__managed__ cuda::atomic_lock_table<1024, thread_scope_device> table;

__global__ void kernel(int4 * i){
   cuda::atomic_ref<int4, thread_scope_device> ref{i[0], table};
   ref.atomic_exchange( int4{1, 2, 3, 4} ); // internally locks a mutex in `table`
}

This would depend on #949 for implementing the atomic_lock_table.

@jrhemstad
Copy link
Collaborator Author

For reference, this functionality would be extremely useful in cuCollections where we want to use atomic_ref but we need support Key/Value types larger than 8B.

@jrhemstad
Copy link
Collaborator Author

jrhemstad commented Feb 3, 2022

@ogiroux

I've been thinking about what the lock_table object would look like.

I was thinking something along the lines of this:

// Opaque type for lock table to pass to atomic_ref
template <size_t N, thread_scope Scope, typename AccessProperty>
struct lock_table{
   static void init(lock_table* t){...}
private:
   cuda::std::array<cuda::mutex<Scope>, N> arr_;

   // Need to let atomic_ref access the storage
   template <typename T, thread_scope Scope>
   friend class atomic_ref; 
};

I want to put the access property in there so I can expose control to how the accesses to the mutexes are cached. My guess is that it'll be pretty important to perf to ensure the locks aren't getting thrashed out of L2.

That said, I'm not sure if this will work with the current limitations of the compiler unless mutex exposed an access_property as well.

@jrhemstad jrhemstad added thrust For all items related to Thrust. libcu++ For all items related to libcu++ and removed thrust For all items related to Thrust. labels Feb 22, 2023
@miscco miscco self-assigned this Feb 23, 2023
@github-project-automation github-project-automation bot moved this to Todo in CCCL Nov 8, 2023
@jarmak-nv jarmak-nv transferred this issue from NVIDIA/libcudacxx Nov 8, 2023
@miscco miscco removed their assignment Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libcu++ For all items related to libcu++
Projects
Status: Todo
Development

No branches or pull requests

2 participants