Skip to content

Commit

Permalink
[SYCL][Graph] Add testing for sycl_ext_oneapi_local_memory (intel#16379)
Browse files Browse the repository at this point in the history
- Adds testing that verifies the interaction between
sycl_ext_oneapi_local_memory and sycl_ext_oneapi_graph.
- Reorder the extensions in the sycl graph spec to be listed in
alphabetical order.
- Explicitly state in the sycl graph spec that using
sycl_ext_oneapi_local_memory is supported.
  • Loading branch information
fabiomestre authored Dec 17, 2024
1 parent 839f0af commit 2a97b05
Show file tree
Hide file tree
Showing 4 changed files with 168 additions and 69 deletions.
144 changes: 75 additions & 69 deletions sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -2075,6 +2075,45 @@ recording mode, as opposed to throwing.
This section defines the interaction of `sycl_ext_oneapi_graph` with other
extensions.

==== sycl_ext_codeplay_enqueue_native_command

`ext_codeplay_enqueue_native_command`, defined in
link:../experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc[sycl_ext_codeplay_enqueue_native_command]
cannot be used in graph nodes. A synchronous exception will be thrown with error
code `invalid` if a user tries to add them to a graph.

Removing this restriction is something we may look at for future revisions of
`sycl_ext_oneapi_graph`.

==== sycl_ext_intel_queue_index

The compute index queue property defined by
link:../supported/sycl_ext_intel_queue_index.asciidoc[sycl_ext_intel_queue_index]
is ignored during queue recording.

Using this information is something we may look at for future revisions of
`sycl_ext_oneapi_graph`.

==== sycl_ext_oneapi_bindless_images

The new handler methods, and queue shortcuts, defined by
link:../experimental/sycl_ext_oneapi_bindless_images.asciidoc[sycl_ext_oneapi_bindless_images]
cannot be used in graph nodes. A synchronous exception will be thrown with error
code `invalid` if a user tries to add them to a graph.

Removing this restriction is something we may look at for future revisions of
`sycl_ext_oneapi_graph`.

==== sycl_ext_oneapi_device_global

The new handler methods, and queue shortcuts, defined by
link:../experimental/sycl_ext_oneapi_device_global.asciidoc[sycl_ext_oneapi_device_global].
cannot be used in graph nodes. A synchronous exception will be thrown with error
code `invalid` if a user tries to add them to a graph.

Removing this restriction is something we may look at for future revisions of
`sycl_ext_oneapi_graph`.

==== sycl_ext_oneapi_discard_queue_events

When recording a `sycl::queue` which has been created with the
Expand Down Expand Up @@ -2108,37 +2147,25 @@ nodes that are recorded from multiple queues and/or added by the explicit API:
* The only commands which have an implicit dependency on the barrier command
are those recorded from the same queue the barrier command was submitted to.

==== sycl_ext_oneapi_memcpy2d

The new handler methods, and queue shortcuts, defined by
link:../supported/sycl_ext_oneapi_memcpy2d.asciidoc[sycl_ext_oneapi_memcpy2d]
cannot be used in graph nodes. A synchronous exception will be thrown with
error code `invalid` if a user tries to add them to a graph.

Removing this restriction is something we may look at for future revisions of
`sycl_ext_oneapi_graph`.

==== sycl_ext_oneapi_queue_priority

The queue priority property defined by
link:../supported/sycl_ext_oneapi_queue_priority.asciidoc[sycl_ext_oneapi_queue_priority]
is ignored during queue recording.
==== sycl_ext_oneapi_enqueue_functions

==== sycl_ext_oneapi_queue_empty
The command submission functions defined in
link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[sycl_ext_oneapi_enqueue_functions]
can be used adding nodes to a graph when creating a graph from queue recording.
New methods are also defined that enable submitting an executable graph,
e.g. directly to a queue without returning an event.

The `queue::ext_oneapi_empty()` query defined by the
link:../supported/sycl_ext_oneapi_queue_empty.asciidoc[sycl_ext_oneapi_queue_empty]
extension behaves as normal during queue recording and is not captured to the graph.
Recorded commands are not counted as submitted for the purposes of this query.
==== sycl_ext_oneapi_free_function_kernels

==== sycl_ext_intel_queue_index
`sycl_ext_oneapi_free_function_kernels`, defined in
link:../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc[sycl_ext_oneapi_free_function_kernels]
can be used with SYCL Graphs.

The compute index queue property defined by
link:../supported/sycl_ext_intel_queue_index.asciidoc[sycl_ext_intel_queue_index]
is ignored during queue recording.
==== sycl_ext_oneapi_kernel_compiler_spirv

Using this information is something we may look at for future revisions of
`sycl_ext_oneapi_graph`.
The kernels loaded using
link:../experimental/sycl_ext_oneapi_kernel_compiler_spirv.asciidoc[sycl_ext_oneapi_kernel_compiler_spirv]
behave as normal when used in graph nodes.

==== sycl_ext_oneapi_kernel_properties

Expand All @@ -2147,62 +2174,41 @@ link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[sycl_ext_oneapi_
can be used in graph nodes in the same way as they are used in normal queue
submission.

==== sycl_ext_oneapi_prod
==== sycl_ext_oneapi_local_memory

The new `sycl::queue::ext_oneapi_prod()` method added by
link:../proposed/sycl_ext_oneapi_prod.asciidoc[sycl_ext_oneapi_prod]
behaves as normal during queue recording and is not captured to the graph.
Recorded commands are not counted as submitted for the purposes of its operation.
Allocating local memory inside a graph kernel node with `sycl::ext::oneapi::group_local_memory()` or
`sycl::ext::oneapi::group_local_memory_for_overwrite()` is supported. These methods are defined by
link:../supported/sycl_ext_oneapi_local_memory.asciidoc[sycl_ext_oneapi_local_memory.]

==== sycl_ext_oneapi_device_global

The new handler methods, and queue shortcuts, defined by
link:../experimental/sycl_ext_oneapi_device_global.asciidoc[sycl_ext_oneapi_device_global].
cannot be used in graph nodes. A synchronous exception will be thrown with error
code `invalid` if a user tries to add them to a graph.

Removing this restriction is something we may look at for future revisions of
`sycl_ext_oneapi_graph`.

==== sycl_ext_oneapi_bindless_images
==== sycl_ext_oneapi_memcpy2d

The new handler methods, and queue shortcuts, defined by
link:../experimental/sycl_ext_oneapi_bindless_images.asciidoc[sycl_ext_oneapi_bindless_images]
cannot be used in graph nodes. A synchronous exception will be thrown with error
code `invalid` if a user tries to add them to a graph.
link:../supported/sycl_ext_oneapi_memcpy2d.asciidoc[sycl_ext_oneapi_memcpy2d]
cannot be used in graph nodes. A synchronous exception will be thrown with
error code `invalid` if a user tries to add them to a graph.

Removing this restriction is something we may look at for future revisions of
`sycl_ext_oneapi_graph`.

==== sycl_ext_oneapi_kernel_compiler_spirv

The kernels loaded using
link:../experimental/sycl_ext_oneapi_kernel_compiler_spirv.asciidoc[sycl_ext_oneapi_kernel_compiler_spirv]
behave as normal when used in graph nodes.

==== sycl_ext_codeplay_enqueue_native_command

`ext_codeplay_enqueue_native_command`, defined in
link:../experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc[sycl_ext_codeplay_enqueue_native_command]
cannot be used in graph nodes. A synchronous exception will be thrown with error
code `invalid` if a user tries to add them to a graph.
==== sycl_ext_oneapi_prod

Removing this restriction is something we may look at for future revisions of
`sycl_ext_oneapi_graph`.
The new `sycl::queue::ext_oneapi_prod()` method added by
link:../proposed/sycl_ext_oneapi_prod.asciidoc[sycl_ext_oneapi_prod]
behaves as normal during queue recording and is not captured to the graph.
Recorded commands are not counted as submitted for the purposes of its operation.

==== sycl_ext_oneapi_enqueue_functions
==== sycl_ext_oneapi_queue_empty

The command submission functions defined in
link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[sycl_ext_oneapi_enqueue_functions]
can be used adding nodes to a graph when creating a graph from queue recording.
New methods are also defined that enable submitting an executable graph,
e.g. directly to a queue without returning an event.
The `queue::ext_oneapi_empty()` query defined by the
link:../supported/sycl_ext_oneapi_queue_empty.asciidoc[sycl_ext_oneapi_queue_empty]
extension behaves as normal during queue recording and is not captured to the graph.
Recorded commands are not counted as submitted for the purposes of this query.

==== sycl_ext_oneapi_free_function_kernels
==== sycl_ext_oneapi_queue_priority

`sycl_ext_oneapi_free_function_kernels`, defined in
link:../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc[sycl_ext_oneapi_free_function_kernels]
can be used with SYCL Graphs.
The queue priority property defined by
link:../supported/sycl_ext_oneapi_queue_priority.asciidoc[sycl_ext_oneapi_queue_priority]
is ignored during queue recording.

==== sycl_ext_oneapi_work_group_memory

Expand Down
10 changes: 10 additions & 0 deletions sycl/test-e2e/Graph/Explicit/compile_time_local_memory.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}
// Extra run to check for immediate-command-list in Level Zero
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/compile_time_local_memory.cpp"
73 changes: 73 additions & 0 deletions sycl/test-e2e/Graph/Inputs/compile_time_local_memory.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
// Tests adding kernel nodes with local memory that is allocated using
// the sycl_ext_oneapi_local_memory extension.

#include "../graph_common.hpp"
#include <sycl/ext/oneapi/group_local_memory.hpp>

int main() {
queue Queue{};

using T = int;
constexpr size_t LocalSize = 128;

std::vector<T> HostData(Size);
std::iota(HostData.begin(), HostData.end(), 10);

exp_ext::command_graph Graph{Queue.get_context(), Queue.get_device()};

T *PtrA = malloc_device<T>(Size, Queue);

Queue.copy(HostData.data(), PtrA, Size);
Queue.wait_and_throw();

auto NodeA = add_node(Graph, Queue, [&](handler &CGH) {
CGH.parallel_for(nd_range({Size}, {LocalSize}), [=](nd_item<1> Item) {
multi_ptr<size_t[LocalSize], access::address_space::local_space>
LocalMem = sycl::ext::oneapi::group_local_memory<size_t[LocalSize]>(
Item.get_group());
*LocalMem[Item.get_local_linear_id()] = Item.get_global_linear_id() * 2;
PtrA[Item.get_global_linear_id()] +=
*LocalMem[Item.get_local_linear_id()];
});
});

add_node(
Graph, Queue,
[&](handler &CGH) {
depends_on_helper(CGH, NodeA);
CGH.parallel_for(nd_range({Size}, {LocalSize}), [=](nd_item<1> Item) {
multi_ptr<size_t[LocalSize], access::address_space::local_space>
LocalMem = sycl::ext::oneapi::group_local_memory_for_overwrite<
size_t[LocalSize]>(Item.get_group());
*LocalMem[Item.get_local_linear_id()] =
Item.get_global_linear_id() + 4;
PtrA[Item.get_global_linear_id()] *=
*LocalMem[Item.get_local_linear_id()];
});
},
NodeA);

auto GraphExec = Graph.finalize();

for (unsigned n = 0; n < Iterations; n++) {
Queue.submit([&](handler &CGH) { CGH.ext_oneapi_graph(GraphExec); });
}

Queue.wait_and_throw();

Queue.copy(PtrA, HostData.data(), Size);
Queue.wait_and_throw();

free(PtrA, Queue);

for (size_t i = 0; i < Size; i++) {
T Ref = 10 + i;
for (size_t iter = 0; iter < Iterations; ++iter) {
Ref += (i * 2);
Ref *= (i + 4);
}
assert(check_value(i, Ref, HostData[i], "PtrA"));
}

return 0;
}
10 changes: 10 additions & 0 deletions sycl/test-e2e/Graph/RecordReplay/compile_time_local_memory.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}
// Extra run to check for immediate-command-list in Level Zero
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/compile_time_local_memory.cpp"

0 comments on commit 2a97b05

Please sign in to comment.