Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Commit

Permalink
Add extended API documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
c0riolis committed Nov 9, 2020
1 parent 7ff2a83 commit 4ebc664
Show file tree
Hide file tree
Showing 29 changed files with 986 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/contributing.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
has_children: true
has_toc: true
nav_order: 4
nav_order: 5
---

# Contributing
Expand Down
29 changes: 29 additions & 0 deletions docs/extended_api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
---
has_children: true
has_toc: false
nav_order: 3
---

# Extended API

## [Headers](./extended_api/headers.md)

### [\<cuda/pipeline>](./extended_api/headers/pipeline.md)

## [Concepts](./extended_api/concepts.md)

### [Group](./extended_api/concepts/group.md)

### [Shape](./extended_api/concepts/shape.md)

## [Synchronization library](./extended_api/synchronization_library.md)

### [pipeline](./extended_api/synchronization_library/pipeline.md)

### [pipeline_shared_state](./extended_api/synchronization_library/pipeline_shared_state.md)

## [Asynchronous operations library](./extended_api/asynchronous_operations_library.md)

### [aligned_size_t](./extended_api/asynchronous_operations_library/aligned_size_t.md)

### [memcpy_async](./extended_api/asynchronous_operations_library/memcpy_async.md)
17 changes: 17 additions & 0 deletions docs/extended_api/asynchronous_operations_library.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
---
parent: Extended API
has_children: true
has_toc: false
---

# Asynchronous operations library

The asynchronous operations library provides components for asynchronous data movement.

## Shapes

| [aligned_size_t](./asynchronous_operations_library/aligned_size_t.md) | defines an extent of bytes with a statically defined alignment `(class template)` |

## Asynchronous operations

| [memcpy_async](./asynchronous_operations_library/memcpy_async.md) | asynchronously copies one buffer to another `(function template)` |
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
---
grand_parent: Extended API
parent: Asynchronous operations library
---

# cuda::**aligned_size_t**

Defined in header [`<cuda/barrier>`](../headers/barrier.md)

Defined in header [`<cuda/pipeline>`](../headers/pipeline.md)

```c++
template<size_t Alignment>
struct aligned_size_t;
```

The class template `cuda::aligned_size_t` is a _shape_ representing an extent of bytes with a statically defined (address and size) alignment.

## Template parameters

| Alignment | the address & size alignement of the byte extent |

## Data members

| [align](./aligned_size_t/align.md) | the alignment of the byte extent |
| [value](./aligned_size_t/value.md) | the size of the byte extent |

## Member functions

| [(constructor)](./aligned_size_t/constructor.md) | constructs an _aligned size_ |
| (destructor) [implicitly declared] | trivial implicit destructor |
| operator= [implicitly declared] | trivial implicit copy/move assignment |
| operator size_t | implicit conversion to [`size_t`](https://en.cppreference.com/w/cpp/types/size_t) |

## Notes

If `value` is not a multiple of `align` the behavior is undefined.

If `Alignment` is not a [valid alignment](https://en.cppreference.com/w/c/language/object#Alignment) the behavior is undefined.

## Example

```c++
#include <cuda/barrier>

__global__ void example_kernel(void * dst, void * src, size_t size)
{
cuda::barrier<cuda::thread_scope_system> barrier;
init(&barrier, 1);

// Implementation cannot make assumptions about alignment
cuda::memcpy_async(dst, src, size, barrier);

// Implementation can assume that dst, src and size are 16-bytes aligned and may optimize accordingly
cuda::memcpy_async(dst, src, cuda::aligned_size_t<16>(size), barrier);

barrier.arrive_and_wait();
}
```
[See it on Godbolt](https://godbolt.org/z/v7Ev9E){: .btn }
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
nav_exclude: true
---

# cuda::aligned_size_t\<Alignment>::**align**

```c++
static constexpr size_t align = Alignment;
```

Represents the alignment (address and size) of the byte extent.
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
---
nav_exclude: true
---

# cuda::aligned_size_t\<Alignment>::**aligned_size_t**

```c++
explicit aligned_size_t(size_t size);
```
Constructs an `aligned_size_t` _shape_.
## Notes
If `size` is not a multiple of `Alignment` the behavior is undefined.
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
nav_exclude: true
---

# cuda::aligned_size_t\<Alignment>::**value**

```c++
size_t value;
```

Represents the size of the byte extent.
64 changes: 64 additions & 0 deletions docs/extended_api/asynchronous_operations_library/memcpy_async.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
---
grand_parent: Extended API
parent: Asynchronous operations library
---

# cuda::**memcpy_async**

Defined in header [`<cuda/barrier>`](../../api/synchronization_library/barrier.md)

```c++
template<typename Shape, thread_scope Scope>
void memcpy_async(void * destination, void const * source, Shape size, barrier<Scope> & barrier); // (1)

template<typename Group, typename Shape, thread_scope Scope>
void memcpy_async(Group const & group, void * destination, void const * source, Shape size, barrier<Scope> & barrier); // (2)
```
Defined in header [`<cuda/pipeline>`](../headers/pipeline.md)
```c++
template<typename Shape, thread_scope Scope>
void memcpy_async(void * destination, void const * source, Shape size, pipeline<Scope> & pipeline); // (3)
template<typename Group, typename Shape, thread_scope Scope>
void memcpy_async(Group const & group, void * destination, void const * source, Shape size, pipeline<Scope> & pipeline); // (4)
```

Asynchronously copies `size` bytes from the memory location pointed to by `source` to the memory location pointed to by `destination`.
Both objects are reinterpreted as arrays of `unsigned char`.

`cuda::memcpy_async` have similar constraints to [`std::memcpy`](https://en.cppreference.com/w/cpp/string/byte/memcpy), namely:
* If the objects overlap, the behavior is undefined.
* If either `destination` or `source` is an invalid or null pointer, the behavior is undefined (even if `count` is zero).
* If the objects are potentially-overlapping or not [`TriviallyCopyable`](https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable),
the behavior is undefined.

1. Binds the asynchronous copy completion to `barrier` and issues the copy in the current thread.
2. Binds the asynchronous copy completion to `barrier` and cooperatively issues the copy across all threads in `group`.
3. Binds the asynchronous copy completion to `pipeline` and issues the copy in the current thread
4. Binds the asynchronous copy completion to `pipeline` and cooperatively issues the copy across all threads in `group`.

## Template parameters

| Group | a type satisfying the [_group concept_](../concepts/group.md) |
| Shape | a type satisfying the [_shape concept_](../concepts/shape.md) (see [`size_t`](https://en.cppreference.com/w/c/types/size_t) and [`cuda::aligned_size_t`](./aligned_size_t.md)) |

## Parameters

| group | the group of threads |
| destination | pointer to the memory location to copy to |
| source | pointer to the memory location to copy from |
| size | the number of bytes to copy |
| barrier | the barrier object used to wait on the copy completion |
| pipeline | the pipeline object used to wait on the copy completion |

## Notes

The implementation may perform the copy synchronously in some implementation defined cases.

## Example

```c++
TODO
```
11 changes: 11 additions & 0 deletions docs/extended_api/concepts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
parent: Extended API
has_children: true
has_toc: false
nav_order: 1
---

# Concepts

| [Group](./concepts/group.md) | defines the requirements of a type that represents a group of cooperating threads |
| [Shape](./concepts/shape.md) | defines the requirements of a type that represents a byte extent with a particular memory layout |
46 changes: 46 additions & 0 deletions docs/extended_api/concepts/group.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
---
grand_parent: Extended API
parent: Concepts
---

# Group

```c++
struct Group {
static constexpr cuda::thread_scope thread_scope = /*implementation-defined*/;
integral size() const;
integral thread_rank() const;
void sync() const;
};
```
The _Group concept_ defines the requirements of a type that represents a group of cooperating threads.
## Data members
| thread_scope | the scope at which `Group::sync()` synchronizes memory operations |
## Member functions
| size | returns the number of participating threads |
| thread_rank | returns a unique value for each participating thread (`0 <= Group::thread_rank() < Group::size()`) |
| sync | synchronizes the participating threads |
## Notes
This concept is defined for documentation purposes but is not materialized in the library.
## Example
```c++
#include <cuda/atomic>
struct single_thread_group {
static constexpr cuda::thread_scope thread_scope = cuda::thread_scope::thread_scope_thread;
size_t size() const { return 1; }
size_t thread_rank() const { return 1; }
void sync() const { }
};
```

[See it on Godbolt](https://godbolt.org/z/453r3s){: .btn }
37 changes: 37 additions & 0 deletions docs/extended_api/concepts/shape.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
---
grand_parent: Extended API
parent: Concepts
---

# Shape

```c++
struct Shape {
operator size_t const;
};
```
The _Shape concept_ defines the requirements of a type that represents a byte extent with a particular memory layout.
## Member functions
| operator size_t | implicit conversion to [`size_t`](https://en.cppreference.com/w/cpp/types/size_t) |
## Notes
This concept is defined for documentation purposes but is not materialized in the library.
## Example
```c++
// A size that carries an alignment hint
template <size_t Align>
struct aligned_size {
static constexpr size_t align = Align;
size_t size;
aligned_size(size_t s) : size(s) {}
operator size_t() const { return size; }
};
```

[See it on Godbolt](https://godbolt.org/z/hbajKo){: .btn }
12 changes: 12 additions & 0 deletions docs/extended_api/headers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
---
parent: Extended API
has_children: true
has_toc: false
nav_order: 0
---

# Headers

## Synchronization library

| [\<pipeline\>](./headers/pipeline.md) | [Pipelines](./synchronization_library/pipeline.md) and corresponding [memcpy_async](./asynchronous_operations_library/memcpy_async.md) overloads |
Loading

0 comments on commit 4ebc664

Please sign in to comment.