This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 187
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
29 changed files
with
986 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
--- | ||
has_children: true | ||
has_toc: true | ||
nav_order: 4 | ||
nav_order: 5 | ||
--- | ||
|
||
# Contributing | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
--- | ||
has_children: true | ||
has_toc: false | ||
nav_order: 3 | ||
--- | ||
|
||
# Extended API | ||
|
||
## [Headers](./extended_api/headers.md) | ||
|
||
### [\<cuda/pipeline>](./extended_api/headers/pipeline.md) | ||
|
||
## [Concepts](./extended_api/concepts.md) | ||
|
||
### [Group](./extended_api/concepts/group.md) | ||
|
||
### [Shape](./extended_api/concepts/shape.md) | ||
|
||
## [Synchronization library](./extended_api/synchronization_library.md) | ||
|
||
### [pipeline](./extended_api/synchronization_library/pipeline.md) | ||
|
||
### [pipeline_shared_state](./extended_api/synchronization_library/pipeline_shared_state.md) | ||
|
||
## [Asynchronous operations library](./extended_api/asynchronous_operations_library.md) | ||
|
||
### [aligned_size_t](./extended_api/asynchronous_operations_library/aligned_size_t.md) | ||
|
||
### [memcpy_async](./extended_api/asynchronous_operations_library/memcpy_async.md) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
--- | ||
parent: Extended API | ||
has_children: true | ||
has_toc: false | ||
--- | ||
|
||
# Asynchronous operations library | ||
|
||
The asynchronous operations library provides components for asynchronous data movement. | ||
|
||
## Shapes | ||
|
||
| [aligned_size_t](./asynchronous_operations_library/aligned_size_t.md) | defines an extent of bytes with a statically defined alignment `(class template)` | | ||
|
||
## Asynchronous operations | ||
|
||
| [memcpy_async](./asynchronous_operations_library/memcpy_async.md) | asynchronously copies one buffer to another `(function template)` | |
61 changes: 61 additions & 0 deletions
61
docs/extended_api/asynchronous_operations_library/aligned_size_t.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
--- | ||
grand_parent: Extended API | ||
parent: Asynchronous operations library | ||
--- | ||
|
||
# cuda::**aligned_size_t** | ||
|
||
Defined in header [`<cuda/barrier>`](../headers/barrier.md) | ||
|
||
Defined in header [`<cuda/pipeline>`](../headers/pipeline.md) | ||
|
||
```c++ | ||
template<size_t Alignment> | ||
struct aligned_size_t; | ||
``` | ||
|
||
The class template `cuda::aligned_size_t` is a _shape_ representing an extent of bytes with a statically defined (address and size) alignment. | ||
|
||
## Template parameters | ||
|
||
| Alignment | the address & size alignement of the byte extent | | ||
|
||
## Data members | ||
|
||
| [align](./aligned_size_t/align.md) | the alignment of the byte extent | | ||
| [value](./aligned_size_t/value.md) | the size of the byte extent | | ||
|
||
## Member functions | ||
|
||
| [(constructor)](./aligned_size_t/constructor.md) | constructs an _aligned size_ | | ||
| (destructor) [implicitly declared] | trivial implicit destructor | | ||
| operator= [implicitly declared] | trivial implicit copy/move assignment | | ||
| operator size_t | implicit conversion to [`size_t`](https://en.cppreference.com/w/cpp/types/size_t) | | ||
|
||
## Notes | ||
|
||
If `value` is not a multiple of `align` the behavior is undefined. | ||
|
||
If `Alignment` is not a [valid alignment](https://en.cppreference.com/w/c/language/object#Alignment) the behavior is undefined. | ||
|
||
## Example | ||
|
||
```c++ | ||
#include <cuda/barrier> | ||
|
||
__global__ void example_kernel(void * dst, void * src, size_t size) | ||
{ | ||
cuda::barrier<cuda::thread_scope_system> barrier; | ||
init(&barrier, 1); | ||
|
||
// Implementation cannot make assumptions about alignment | ||
cuda::memcpy_async(dst, src, size, barrier); | ||
|
||
// Implementation can assume that dst, src and size are 16-bytes aligned and may optimize accordingly | ||
cuda::memcpy_async(dst, src, cuda::aligned_size_t<16>(size), barrier); | ||
|
||
barrier.arrive_and_wait(); | ||
} | ||
``` | ||
[See it on Godbolt](https://godbolt.org/z/v7Ev9E){: .btn } |
11 changes: 11 additions & 0 deletions
11
docs/extended_api/asynchronous_operations_library/aligned_size_t/align.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
nav_exclude: true | ||
--- | ||
|
||
# cuda::aligned_size_t\<Alignment>::**align** | ||
|
||
```c++ | ||
static constexpr size_t align = Alignment; | ||
``` | ||
|
||
Represents the alignment (address and size) of the byte extent. |
15 changes: 15 additions & 0 deletions
15
docs/extended_api/asynchronous_operations_library/aligned_size_t/constructor.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
--- | ||
nav_exclude: true | ||
--- | ||
|
||
# cuda::aligned_size_t\<Alignment>::**aligned_size_t** | ||
|
||
```c++ | ||
explicit aligned_size_t(size_t size); | ||
``` | ||
Constructs an `aligned_size_t` _shape_. | ||
## Notes | ||
If `size` is not a multiple of `Alignment` the behavior is undefined. |
11 changes: 11 additions & 0 deletions
11
docs/extended_api/asynchronous_operations_library/aligned_size_t/value.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
nav_exclude: true | ||
--- | ||
|
||
# cuda::aligned_size_t\<Alignment>::**value** | ||
|
||
```c++ | ||
size_t value; | ||
``` | ||
|
||
Represents the size of the byte extent. |
64 changes: 64 additions & 0 deletions
64
docs/extended_api/asynchronous_operations_library/memcpy_async.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
--- | ||
grand_parent: Extended API | ||
parent: Asynchronous operations library | ||
--- | ||
|
||
# cuda::**memcpy_async** | ||
|
||
Defined in header [`<cuda/barrier>`](../../api/synchronization_library/barrier.md) | ||
|
||
```c++ | ||
template<typename Shape, thread_scope Scope> | ||
void memcpy_async(void * destination, void const * source, Shape size, barrier<Scope> & barrier); // (1) | ||
|
||
template<typename Group, typename Shape, thread_scope Scope> | ||
void memcpy_async(Group const & group, void * destination, void const * source, Shape size, barrier<Scope> & barrier); // (2) | ||
``` | ||
Defined in header [`<cuda/pipeline>`](../headers/pipeline.md) | ||
```c++ | ||
template<typename Shape, thread_scope Scope> | ||
void memcpy_async(void * destination, void const * source, Shape size, pipeline<Scope> & pipeline); // (3) | ||
template<typename Group, typename Shape, thread_scope Scope> | ||
void memcpy_async(Group const & group, void * destination, void const * source, Shape size, pipeline<Scope> & pipeline); // (4) | ||
``` | ||
|
||
Asynchronously copies `size` bytes from the memory location pointed to by `source` to the memory location pointed to by `destination`. | ||
Both objects are reinterpreted as arrays of `unsigned char`. | ||
|
||
`cuda::memcpy_async` have similar constraints to [`std::memcpy`](https://en.cppreference.com/w/cpp/string/byte/memcpy), namely: | ||
* If the objects overlap, the behavior is undefined. | ||
* If either `destination` or `source` is an invalid or null pointer, the behavior is undefined (even if `count` is zero). | ||
* If the objects are potentially-overlapping or not [`TriviallyCopyable`](https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable), | ||
the behavior is undefined. | ||
|
||
1. Binds the asynchronous copy completion to `barrier` and issues the copy in the current thread. | ||
2. Binds the asynchronous copy completion to `barrier` and cooperatively issues the copy across all threads in `group`. | ||
3. Binds the asynchronous copy completion to `pipeline` and issues the copy in the current thread | ||
4. Binds the asynchronous copy completion to `pipeline` and cooperatively issues the copy across all threads in `group`. | ||
|
||
## Template parameters | ||
|
||
| Group | a type satisfying the [_group concept_](../concepts/group.md) | | ||
| Shape | a type satisfying the [_shape concept_](../concepts/shape.md) (see [`size_t`](https://en.cppreference.com/w/c/types/size_t) and [`cuda::aligned_size_t`](./aligned_size_t.md)) | | ||
|
||
## Parameters | ||
|
||
| group | the group of threads | | ||
| destination | pointer to the memory location to copy to | | ||
| source | pointer to the memory location to copy from | | ||
| size | the number of bytes to copy | | ||
| barrier | the barrier object used to wait on the copy completion | | ||
| pipeline | the pipeline object used to wait on the copy completion | | ||
|
||
## Notes | ||
|
||
The implementation may perform the copy synchronously in some implementation defined cases. | ||
|
||
## Example | ||
|
||
```c++ | ||
TODO | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
parent: Extended API | ||
has_children: true | ||
has_toc: false | ||
nav_order: 1 | ||
--- | ||
|
||
# Concepts | ||
|
||
| [Group](./concepts/group.md) | defines the requirements of a type that represents a group of cooperating threads | | ||
| [Shape](./concepts/shape.md) | defines the requirements of a type that represents a byte extent with a particular memory layout | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
--- | ||
grand_parent: Extended API | ||
parent: Concepts | ||
--- | ||
|
||
# Group | ||
|
||
```c++ | ||
struct Group { | ||
static constexpr cuda::thread_scope thread_scope = /*implementation-defined*/; | ||
integral size() const; | ||
integral thread_rank() const; | ||
void sync() const; | ||
}; | ||
``` | ||
The _Group concept_ defines the requirements of a type that represents a group of cooperating threads. | ||
## Data members | ||
| thread_scope | the scope at which `Group::sync()` synchronizes memory operations | | ||
## Member functions | ||
| size | returns the number of participating threads | | ||
| thread_rank | returns a unique value for each participating thread (`0 <= Group::thread_rank() < Group::size()`) | | ||
| sync | synchronizes the participating threads | | ||
## Notes | ||
This concept is defined for documentation purposes but is not materialized in the library. | ||
## Example | ||
```c++ | ||
#include <cuda/atomic> | ||
struct single_thread_group { | ||
static constexpr cuda::thread_scope thread_scope = cuda::thread_scope::thread_scope_thread; | ||
size_t size() const { return 1; } | ||
size_t thread_rank() const { return 1; } | ||
void sync() const { } | ||
}; | ||
``` | ||
|
||
[See it on Godbolt](https://godbolt.org/z/453r3s){: .btn } |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
--- | ||
grand_parent: Extended API | ||
parent: Concepts | ||
--- | ||
|
||
# Shape | ||
|
||
```c++ | ||
struct Shape { | ||
operator size_t const; | ||
}; | ||
``` | ||
The _Shape concept_ defines the requirements of a type that represents a byte extent with a particular memory layout. | ||
## Member functions | ||
| operator size_t | implicit conversion to [`size_t`](https://en.cppreference.com/w/cpp/types/size_t) | | ||
## Notes | ||
This concept is defined for documentation purposes but is not materialized in the library. | ||
## Example | ||
```c++ | ||
// A size that carries an alignment hint | ||
template <size_t Align> | ||
struct aligned_size { | ||
static constexpr size_t align = Align; | ||
size_t size; | ||
aligned_size(size_t s) : size(s) {} | ||
operator size_t() const { return size; } | ||
}; | ||
``` | ||
|
||
[See it on Godbolt](https://godbolt.org/z/hbajKo){: .btn } |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
--- | ||
parent: Extended API | ||
has_children: true | ||
has_toc: false | ||
nav_order: 0 | ||
--- | ||
|
||
# Headers | ||
|
||
## Synchronization library | ||
|
||
| [\<pipeline\>](./headers/pipeline.md) | [Pipelines](./synchronization_library/pipeline.md) and corresponding [memcpy_async](./asynchronous_operations_library/memcpy_async.md) overloads | |
Oops, something went wrong.