Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Add documentation for atomic_ref #257

Merged
merged 10 commits into from
Nov 29, 2022
4 changes: 3 additions & 1 deletion docs/extended_api/synchronization_primitives.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
### Atomics

| [`cuda::atomic`] | System-wide [`cuda::std::atomic`] objects and operations. `(class template)` <br/><br/> 1.0.0 / CUDA 10.2 |
| [`cuda::atomic_ref`] | System-wide [`cuda::std::atomic_ref`] objects and operations. `(class template)` <br/><br/> 1.7.0 / CUDA 11.6 |

### Latches

Expand Down Expand Up @@ -31,6 +32,7 @@ The pipeline library is included in the CUDA Toolkit, but is not part of the


[`cuda::std::atomic`]: https://en.cppreference.com/w/cpp/atomic/atomic
[`cuda::std::atomic_ref`]: https://en.cppreference.com/w/cpp/atomic/atomic_ref
[`cuda::std::barrier`]: https://en.cppreference.com/w/cpp/thread/barrier
[`cuda::std::latch`]: https://en.cppreference.com/w/cpp/thread/latch
[`cuda::std::counting_semaphore`]: https://en.cppreference.com/w/cpp/thread/counting_semaphore
Expand All @@ -40,6 +42,7 @@ The pipeline library is included in the CUDA Toolkit, but is not part of the
[`cuda::memcpy_async`]: {{ "extended_api/asynchronous_operations/memcpy_async.html" | relative_url }}

[`cuda::atomic`]: {{ "extended_api/synchronization_primitives/atomic.html" | relative_url }}
[`cuda::atomic_ref`]: {{ "extended_api/synchronization_primitives/atomic_ref.html" | relative_url }}
[`cuda::barrier`]: {{ "extended_api/synchronization_primitives/barrier.html" | relative_url }}
[`cuda::latch`]: {{ "extended_api/synchronization_primitives/latch.html" | relative_url }}
[`cuda::counting_semaphore`]: {{ "extended_api/synchronization_primitives/counting_semaphore.html" | relative_url }}
Expand All @@ -51,4 +54,3 @@ The pipeline library is included in the CUDA Toolkit, but is not part of the
[`cuda::make_pipeline`]: {{ "extended_api/synchronization_primitives/make_pipeline.html" | relative_url }}
[`cuda::pipeline_consumer_wait_prior`]: {{ "extended_api/synchronization_primitives/pipeline_consumer_wait_prior.html" | relative_url }}
[`cuda::pipeline_producer_commit`]: {{ "extended_api/synchronization_primitives/pipeline_producer_commit.html" | relative_url }}

103 changes: 103 additions & 0 deletions docs/extended_api/synchronization_primitives/atomic_ref.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 1
---

# `cuda::atomic_ref`

Defined in header `<cuda/atomic>`:

```cuda
template <typename T, cuda::thread_scope Scope = cuda::thread_scope_system>
class cuda::atomic_ref;
```

The class template `cuda::atomic_ref` is an extended form of [`cuda::std::atomic_ref`]
that takes an additional [`cuda::thread_scope`] argument, defaulted to
`cuda::std::thread_scope_system`.
It has the same interface and semantics as [`cuda::std::atomic_ref`], with the
following additional operations.
This class additionally deviates from the standard by being backported to C++11.

## Limitations

`cuda::atomic_ref<T>` and `cuda::std::atomic_ref<T>` may only be instantiated with a T that are either 4 or 8 bytes.

No object or subobject of an object referenced by an `atomic_­ref` shall be concurrently referenced by any other `atomic_­ref` that has a different `Scope`.

For `cuda::atomic_ref<T>` and `cuda::std::atomic_ref<T>` the type `T` must satisfy the following:
- `4 <= sizeof(T) <= 8`
wmaxey marked this conversation as resolved.
Show resolved Hide resolved
- `T` must not have "padding bits", i.e., 'T`'s [object representation](https://en.cppreference.com/w/cpp/language/object#Object_representation_and_value_representation) must not have bits that do not participate in it's value representation
wmaxey marked this conversation as resolved.
Show resolved Hide resolved

## Atomic Extrema Operations

| [`cuda::atomic_ref::fetch_min`] | Atomically find the minimum of the stored value and a provided value. `(member function)` |
| [`cuda::atomic_ref::fetch_max`] | Atomically find the maximum of the stored value and a provided value. `(member function)` |

## Concurrency Restrictions

An object of type `cuda::atomic_ref` or [`cuda::std::atomic_ref`] shall not be accessed
concurrently by CPU and GPU threads unless:
- it is in unified memory and the [`concurrentManagedAccess` property] is 1, or
- it is in CPU memory and the [`hostNativeAtomicSupported` property] is 1.
wmaxey marked this conversation as resolved.
Show resolved Hide resolved

Note, for objects of scopes other than `cuda::thread_scope_system` this is a
data-race, and thefore also prohibited regardless of memory characteristics.

Under CUDA Compute Capability 6 (Pascal), an object of type `atomic_ref` may not be
used:
- with automatic storage duration, or
- if `is_always_lock_free()` is `false`.

Under CUDA Compute Capability prior to 6 (Pascal), objects of type
`cuda::atomic_ref` or [`cuda::std::atomic_ref`] may not be used.

## Implementation-Defined Behavior

For each type `T` and [`cuda::thread_scope`] `S`, the value of
`cuda::atomic_ref<T, S>::is_always_lock_free()` is as follows:

| Type `T` | [`cuda::thread_scope`] `S` | `cuda::atomic_ref<T, S>::is_always_lock_free()` |
|----------|----------------------------|---------------------------------------------|
| Any | Any | `sizeof(T) <= 8` |

## Example

```cuda
#include <cuda/atomic>

__global__ void example_kernel(int *gmem, int *pinned_mem) {
// This atomic is suitable for all threads in the system.
cuda::atomic_ref<int, cuda::thread_scope_system> a(pinned_mem);

// This atomic has the same type as the previous one (`a`).
cuda::atomic_ref<int> b(pinned_mem);

// This atomic is suitable for all threads on the current processor (e.g. GPU).
cuda::atomic_ref<int, cuda::thread_scope_device> c(gmem);

__shared__ int shared_v;
// This atomic is suitable for threads in the same thread block.
cuda::atomic_ref<int, cuda::thread_scope_block> d(&shared);
}
```

[See it on Godbolt](https://godbolt.org/z/fr4K7ErEh){: .btn }


[`cuda::thread_scope`]: ../thread_scopes.md

[`cuda::atomic_thread_fence`]: ./atomic/atomic_thread_fence.md

[`cuda::atomic_ref::fetch_min`]: ./atomic/fetch_min.md
[`cuda::atomic_ref::fetch_max`]: ./atomic/fetch_max.md

[`cuda::std::atomic_ref`]: https://en.cppreference.com/w/cpp/atomic/atomic_ref

[atomics.types.int]: https://eel.is/c++draft/atomics.types.int
[atomics.types.pointer]: https://eel.is/c++draft/atomics.types.pointer

[`concurrentManagedAccess` property]: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_116f9619ccc85e93bc456b8c69c80e78b
[`hostNativeAtomicSupported` property]: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_1ef82fd7d1d0413c7d6f33287e5b6306f

2 changes: 1 addition & 1 deletion docs/extended_api/synchronization_primitives/barrier.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 2
nav_order: 3
---

# `cuda::barrier`
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 4
nav_order: 5
---

# `cuda::binary_semaphore`
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 3
nav_order: 4
---

# `cuda::counting_semaphore`
Expand Down
2 changes: 1 addition & 1 deletion docs/extended_api/synchronization_primitives/latch.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
grand_parent: Extended API
parent: Synchronization Primitives
nav_order: 1
nav_order: 2
---

# `cuda::latch`
Expand Down