Copyright © 2024 Intel Corporation. All rights reserved.
Khronos® is a registered trademark and SYCL™ and SPIR™ are trademarks of The Khronos Group Inc. OpenCL™ is a trademark of Apple Inc. used by permission by Khronos.
To report problems with this extension, please open a new issue at:
This extension is written against the SYCL 2020 revision 8 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision.
This is an experimental extension specification, intended to provide early access to features and gather community feedback. Interfaces defined in this specification are implemented in DPC++, but they are not finalized and may change incompatibly in future versions of DPC++ without prior notice. Shipping software products should not rely on APIs defined in this specification.
This extension is currently implemented in DPC++ only for GPU devices and only when using the CUDA or HIP backends. Attempting to use this extension in kernels that run on other devices or backends may result in undefined behavior. Be aware that the compiler is not able to issue a diagnostic to warn you if this happens.
This extension is derived from the experimental AdaptiveCpp extension,
enqueue_custom_operation
which is documented
here.
The goal of ext_codeplay_enqueue_native_command
is to integrate interop
work within the SYCL runtime’s creation of the asynchronous SYCL DAG. As such,
the user defined lambda must only enqueue asynchronous, as opposed to
synchronous, backend work within the user lambda. Asynchronous work must only
be submitted to the native queue obtained from
interop_handle::get_native_queue
.
A callable submitted to ext_codeplay_enqueue_native_command
won’t wait
on its dependent events to execute. The dependencies passed to an
ext_codeplay_enqueue_native_command
submission will result in dependencies being
implicitly handled in the backend API, using the native queue object associated
with the SYCL queue that the sycl_ext_codeplay_enqueue_native_command
is
submitted to. This gives different synchronization guarantees from normal SYCL
host_task
s, which guarantee that the host_task
callable will only begin
execution once all of its dependent events have completed.
In this example:
q.submit([&](sycl::handler &cgh) {
cgh.depends_on(dep_event);
cgh.ext_codeplay_enqueue_native_command([=](sycl::interop_handle &h) {
printf("This will print before dep_event has completed.\n");
// This stream has been synchronized with dep_event's underlying
// hipEvent_t
hipStream_t stream = h.get_native_queue<sycl::backend::hip>();
hipMemcpyAsync(target_ptr, native_mem, test_size * sizeof(int),
hipMemcpyDeviceToHost, stream);
});
});
q.wait()
The print statement may print before dep_event
has completed. However, the
asynchronous memcpy submitted to the native queue obtained by
interop_handle::get_native_queue
is guaranteed to have the correct
dependencies, and therefore will only start once its dependent events have
completed.
By contrast, when using a host_task
, it is guaranteed that the print statement
will only happen once the host task’s dependent events are observed to be
complete on the host.
A SYCL event returned by a submission of a
ext_codeplay_enqueue_native_command
command is only complete once the
asynchronous work enqueued to the native queue obtained through
interop_handle::get_native_queue()
has completed.
This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND
to one of the values defined
in the table below. Applications can test for the existence of this macro to
determine if the implementation supports this feature, or applications can test
the macro’s value to determine which of the extension’s features the
implementation supports.
Value | Description |
---|---|
1 |
The APIs of this experimental extension are not versioned, so the feature-test macro always has this value. |
This extension adds the following new member function to the SYCL handler
class:
class handler {
template <typename Func>
void ext_codeplay_enqueue_native_command(Func&& interopCallable);
};
Constraints: The Func
must a C++ callable object which takes a single
parameter of type interop_handle
.
Effects: The interopCallable
object is called exactly once, and this call
may be made asynchronously even after the calling thread returns from
ext_codeplay_enqueue_native_command
.
The call to interopCallable
may submit one or more asynchronous tasks to the
native backend object obtained from interop_handle::get_native_queue
, and
these tasks become encapsulated in a SYCL command that is added to the queue.
If the enclosing command group has any dependencies, these dependencies are
propagated to the native asynchronous tasks. This happens, for example, if the
command group calls handler::depends_on
or if it constructs an accessor. As a
result, there is typically no need to specify these dependencies through native
APIs. Note, however, that these dependencies are associated with the
asynchronous tasks submitted by interopCallable
, not the call to
interopCallable
. The call to interopCallable
may happen even before the
dependencies are satisfied.
The SYCL command described above completes once all of the native asynchronous tasks it contains have completed.
The call to interopCallable
must not submit any synchronous tasks to the
native backend object, and it must not block waiting for any tasks to complete.
The call also must not add tasks to backend objects that underly any other
queue, aside from the queue that is associated with this handler. If it does
any of these things, the behavior is undefined.
This example demonstrates how to use this extension to enqueue asynchronous native tasks on the HIP backend.
sycl::queue q;
q.submit([&](sycl::handler &cgh) {
sycl::accessor acc{buf, cgh};
cgh.ext_codeplay_enqueue_native_command([=](sycl::interop_handle &h) {
// Can extract device pointers from accessors
void *native_mem = h.get_native_mem<sycl::backend::hip>(acc);
// Can extract stream
hipStream_t stream = h.get_native_queue<sycl::backend::hip>();
// Can enqueue arbitrary backend operations. This could also be a kernel
// launch or call to a library that enqueues operations on the stream etc
//
// Important: Enqueuing a *synchronous* backend operation results in
// undefined behavior.
hipMemcpyAsync(target_ptr, native_mem, test_size * sizeof(int),
hipMemcpyDeviceToHost, stream);
});
});
q.wait();