Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Add documentation for atomic_ref #257

Merged
merged 10 commits into from
Nov 29, 2022
4 changes: 3 additions & 1 deletion docs/extended_api/synchronization_primitives.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
### Atomics

| [`cuda::atomic`] | System-wide [`cuda::std::atomic`] objects and operations. `(class template)` <br/><br/> 1.0.0 / CUDA 10.2 |
| [`cuda::atomic_ref`] | System-wide [`cuda::std::atomic_ref`] objects and operations. `(class template)` <br/><br/> 1.7.0 / CUDA 11.6 |

### Latches

Expand Down Expand Up @@ -31,6 +32,7 @@ The pipeline library is included in the CUDA Toolkit, but is not part of the


[`cuda::std::atomic`]: https://en.cppreference.com/w/cpp/atomic/atomic
[`cuda::std::atomic_ref`]: https://en.cppreference.com/w/cpp/atomic/atomic_ref
[`cuda::std::barrier`]: https://en.cppreference.com/w/cpp/thread/barrier
[`cuda::std::latch`]: https://en.cppreference.com/w/cpp/thread/latch
[`cuda::std::counting_semaphore`]: https://en.cppreference.com/w/cpp/thread/counting_semaphore
Expand All @@ -40,6 +42,7 @@ The pipeline library is included in the CUDA Toolkit, but is not part of the
[`cuda::memcpy_async`]: {{ "extended_api/asynchronous_operations/memcpy_async.html" | relative_url }}

[`cuda::atomic`]: {{ "extended_api/synchronization_primitives/atomic.html" | relative_url }}
[`cuda::atomic_ref`]: {{ "extended_api/synchronization_primitives/atomic_ref.html" | relative_url }}
[`cuda::barrier`]: {{ "extended_api/synchronization_primitives/barrier.html" | relative_url }}
[`cuda::latch`]: {{ "extended_api/synchronization_primitives/latch.html" | relative_url }}
[`cuda::counting_semaphore`]: {{ "extended_api/synchronization_primitives/counting_semaphore.html" | relative_url }}
Expand All @@ -51,4 +54,3 @@ The pipeline library is included in the CUDA Toolkit, but is not part of the
[`cuda::make_pipeline`]: {{ "extended_api/synchronization_primitives/make_pipeline.html" | relative_url }}
[`cuda::pipeline_consumer_wait_prior`]: {{ "extended_api/synchronization_primitives/pipeline_consumer_wait_prior.html" | relative_url }}
[`cuda::pipeline_producer_commit`]: {{ "extended_api/synchronization_primitives/pipeline_producer_commit.html" | relative_url }}

101 changes: 101 additions & 0 deletions docs/extended_api/synchronization_primitives/atomic_ref.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 1
---

# `cuda::atomic_ref`

Defined in header `<cuda/atomic>`:

```cuda
template <typename T, cuda::thread_scope Scope = cuda::thread_scope_system>
class cuda::atomic_ref;
```

The class template `cuda::atomic_ref` is an extended form of [`cuda::std::atomic_ref`]
that takes an additional [`cuda::thread_scope`] argument, defaulted to
`cuda::std::thread_scope_system`.
It has the same interface and semantics as [`cuda::std::atomic_ref`], with the
following additional operations.
This class additionally deviates from the standard by being backported to C++11.

## Limitations

`cuda::atomic_ref<T>` and `cuda::std::atomic_ref<T>` may only be instantiated with types sized between 4 and 8 bytes.
wmaxey marked this conversation as resolved.
Show resolved Hide resolved
wmaxey marked this conversation as resolved.
Show resolved Hide resolved

Constructing multiple `cuda::std::atomic_ref<T, Sco>` with the same reference, but different values for `Sco` is illegal.
wmaxey marked this conversation as resolved.
Show resolved Hide resolved

`cuda::std::atomic_ref<T>` may be used with structures, but the behavior of performing atomic operations through it on objects with padding bytes is undefined.
wmaxey marked this conversation as resolved.
Show resolved Hide resolved
wmaxey marked this conversation as resolved.
Show resolved Hide resolved

## Atomic Extrema Operations

| [`cuda::atomic_ref::fetch_min`] | Atomically find the minimum of the stored value and a provided value. `(member function)` |
| [`cuda::atomic_ref::fetch_max`] | Atomically find the maximum of the stored value and a provided value. `(member function)` |

## Concurrency Restrictions

An object of type `cuda::atomic_ref` or [`cuda::std::atomic_ref`] shall not be accessed
concurrently by CPU and GPU threads unless:
- it is in unified memory and the [`concurrentManagedAccess` property] is 1, or
- it is in CPU memory and the [`hostNativeAtomicSupported` property] is 1.
wmaxey marked this conversation as resolved.
Show resolved Hide resolved

Note, for objects of scopes other than `cuda::thread_scope_system` this is a
data-race, and thefore also prohibited regardless of memory characteristics.

Under CUDA Compute Capability 6 (Pascal), an object of type `atomic_ref` may not be
used:
- with automatic storage duration, or
- if `is_always_lock_free()` is `false`.

Under CUDA Compute Capability prior to 6 (Pascal), objects of type
`cuda::atomic_ref` or [`cuda::std::atomic_ref`] may not be used.

## Implementation-Defined Behavior

For each type `T` and [`cuda::thread_scope`] `S`, the value of
`cuda::atomic_ref<T, S>::is_always_lock_free()` is as follows:

| Type `T` | [`cuda::thread_scope`] `S` | `cuda::atomic_ref<T, S>::is_always_lock_free()` |
|----------|----------------------------|---------------------------------------------|
| Any | Any | `sizeof(T) <= 8` |

## Example

```cuda
#include <cuda/atomic>

__global__ void example_kernel(int *gmem, int *pinned_mem) {
// This atomic is suitable for all threads in the system.
cuda::atomic_ref<int, cuda::thread_scope_system> a(pinned_mem);

// This atomic has the same type as the previous one (`a`).
cuda::atomic_ref<int> b(pinned_mem);

// This atomic is suitable for all threads on the current processor (e.g. GPU).
cuda::atomic_ref<int, cuda::thread_scope_device> c(gmem);

__shared__ int shared_v;
// This atomic is suitable for threads in the same thread block.
cuda::atomic_ref<int, cuda::thread_scope_block> d(&shared);
}
```

[See it on Godbolt](https://godbolt.org/z/fr4K7ErEh){: .btn }


[`cuda::thread_scope`]: ../thread_scopes.md

[`cuda::atomic_thread_fence`]: ./atomic/atomic_thread_fence.md

[`cuda::atomic_ref::fetch_min`]: ./atomic/fetch_min.md
[`cuda::atomic_ref::fetch_max`]: ./atomic/fetch_max.md

[`cuda::std::atomic_ref`]: https://en.cppreference.com/w/cpp/atomic/atomic_ref

[atomics.types.int]: https://eel.is/c++draft/atomics.types.int
[atomics.types.pointer]: https://eel.is/c++draft/atomics.types.pointer

[`concurrentManagedAccess` property]: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_116f9619ccc85e93bc456b8c69c80e78b
[`hostNativeAtomicSupported` property]: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_1ef82fd7d1d0413c7d6f33287e5b6306f

2 changes: 1 addition & 1 deletion docs/extended_api/synchronization_primitives/barrier.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 2
nav_order: 3
---

# `cuda::barrier`
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 4
nav_order: 5
---

# `cuda::binary_semaphore`
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 3
nav_order: 4
---

# `cuda::counting_semaphore`
Expand Down
2 changes: 1 addition & 1 deletion docs/extended_api/synchronization_primitives/latch.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 1
nav_order: 2
---

# `cuda::latch`
Expand Down