I also had to add an extra flag while building sycl_gemv:
-DDpctl_DIR=<DPCTL_DIR>/cmake
Sample reproducer:
SYCL_PI_TRACE=1 python3 -c 'import dpctl; import dpctl.tensor as dpt; import numpy as np; from sycl_gemm import gemv; q = dpctl.SyclQueue(); Mnp, vnp = np.random.randn(5, 3), np.random.randn(3); M = dpt.asarray(Mnp, sycl_queue=q); v = dpt.asarray(vnp, sycl_queue=q); r = dpt.empty((5,), dtype=v.dtype, sycl_queue=q); hev, ev = gemv(q, M, v, r, []); hev.wait(); rnp = dpt.asnumpy(r);'
While executing this, it failed with:
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_level_zero.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 15.49.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]: platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]: device: NVIDIA A100 80GB PCIe
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]: platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]: device: NVIDIA A100 80GB PCIe
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]: platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]: device: NVIDIA A100 80GB PCIe
Traceback (most recent call last):
File "<string>", line 1, in <module>
RuntimeError: Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
Coming back to the source that is invoked, the failure happens when executing the following code (github):
if (v_typenum == api.UAR_DOUBLE_) {
    using T = double;
    sycl::event gemv_ev = oneapi::mkl::blas::row_major::gemv(
        q, oneapi::mkl::transpose::nontrans, n, m, T(1),
        reinterpret_cast<T *>(mat_typeless_ptr), m,
        reinterpret_cast<T *>(v_typeless_ptr), 1, T(0),
        reinterpret_cast<T *>(r_typeless_ptr), 1, depends);
    res_ev = gemv_ev;
}
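To make the arguments of the call above easier to read, here is a minimal pure-Python sketch of what a row-major, non-transposed gemv computes. It assumes that `n` is the number of rows and `m` the number of columns, which is what passing `m` as the leading dimension suggests. The function name `gemv_row_major` and its flat-list layout are illustrative only and are not part of dpctl or oneMKL.

```python
def gemv_row_major(n, m, alpha, a, lda, x, incx, beta, y, incy):
    """Reference for y = alpha * A @ x + beta * y, updated in place.

    A is an n x m matrix stored row by row in the flat list `a`,
    with `lda` elements between the starts of consecutive rows.
    """
    for i in range(n):
        acc = 0.0
        for j in range(m):
            acc += a[i * lda + j] * x[j * incx]
        if beta == 0:
            # As in BLAS, beta == 0 means the old contents of y are ignored,
            # so an uninitialized output buffer is acceptable.
            y[i * incy] = alpha * acc
        else:
            y[i * incy] = alpha * acc + beta * y[i * incy]
    return y


# Same argument pattern as the C++ call: alpha = 1, lda = m, unit strides, beta = 0.
A = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]          # 2 x 3 matrix, row-major
v = [1.0, 0.0, -1.0]
r = [float("nan")] * 2        # stands in for an uninitialized result buffer
print(gemv_row_major(2, 3, 1.0, A, 3, v, 1, 0.0, r, 1))  # [-2.0, -2.0]
```

With `T(1)` and `T(0)` as the scalars, the call in the issue reduces to `r = M @ v`, so the result of the reproducer could be checked against `Mnp @ vnp` on a device where the call succeeds.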
> python -m dpctl --full-list
Platform 0 ::
    Name Intel(R) OpenCL
    Version OpenCL 3.0 LINUX
    Vendor Intel(R) Corporation
    Backend opencl
    Num Devices 1
    # 0
        Name Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
        Version 2024.18.7.0.11_160000
        Filter string opencl:cpu:0
Platform 1 ::
    Name NVIDIA CUDA BACKEND
    Version CUDA 12.5
    Vendor NVIDIA Corporation
    Backend ext_oneapi_cuda
    Num Devices 1
    # 0
        Name NVIDIA A100 80GB PCIe
        Version CUDA 12.5
        Filter string cuda:gpu:0
This example in DPCTL is written to be built with the oneAPI MKL library (https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html). The BLAS portion of this library provides implementations for x86-64 CPUs and for SPIR-capable devices. In particular, the library contains no offload sections for NVIDIA or AMD GPUs.
The oneMKL interface library, https://github.com/oneapi-src/oneMKL, is a C++ library that uses the oneAPI MKL library for CPU and SPIR devices, cuBLAS/cuSOLVER for NVIDIA GPUs, and rocBLAS/rocSOLVER for AMD GPUs. It needs to be built, and I'd refer to the poster material and documentation for more details.
It is a good idea to provide references to said material in the README of this dpctl example, though! Thanks for the suggestion.
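One way to fail early with a readable message, instead of hitting PI_ERROR_INVALID_BINARY inside the extension, might be to check the device's backend before calling gemv. The sketch below works on filter strings such as those printed by `python -m dpctl --full-list`. Treating "SPIR-capable" as the `opencl` and `level_zero` backends is an assumption made for illustration, and the helper names are hypothetical rather than dpctl API.

```python
# Assumption: the backends served by the oneAPI MKL BLAS binaries are the
# SPIR-based ones; cuda and hip devices need the oneMKL interface library.
MKL_BLAS_BACKENDS = {"opencl", "level_zero"}


def backend_of(filter_string):
    """Return the backend part of a filter string such as 'cuda:gpu:0'."""
    backend = filter_string.split(":", 1)[0].strip().lower()
    if not backend:
        raise ValueError(f"no backend in filter string: {filter_string!r}")
    return backend


def mkl_blas_can_offload(filter_string):
    """True if the device's backend is in the assumed supported set."""
    return backend_of(filter_string) in MKL_BLAS_BACKENDS


for fs in ("opencl:cpu:0", "cuda:gpu:0"):
    print(fs, mkl_blas_can_offload(fs))
# opencl:cpu:0 True
# cuda:gpu:0 False
```

In a dpctl session the string could be taken from `q.sycl_device.filter_string`. Alternatively, constructing the queue as `dpctl.SyclQueue("opencl:cpu:0")` would select the CPU device from the listing in the issue rather than the default-selected A100.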
Hi, I'm trying to build the pybind11 extension from the onemkl_gemv example with a DPCTL build that targets CUDA:
https://github.com/IntelPython/dpctl/tree/master/examples/pybind11/onemkl_gemv
Example mentioned fails to run all test cases:
The build works with the following changes, but some tests are still failing:
... and SYCL_PI_TRACE=-1 reported the trace included above; python -m dpctl --full-list reports the platforms listed above.