-
Notifications
You must be signed in to change notification settings - Fork 747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error) #15203
Comments
Hi @MajidAbdelilah! What about other devices? Like cpu or opencl:gpu (I guess you're running it on level_zero:gpu, right?). Could you also attach the output of |
|
when running on the cpu using cpu_selector_v it is working without runtime errors
for SYCL_PI_TRACE=-1 ./a.out i have attached the stdout, stderr, and allstd. |
Yes, but it looks like @MajidAbdelilah is using 2024.1.0, so I guess there should still be the PI. I tried this with latest intel/llvm. It passes on opencl:cpu, but fails on both level-zero and opencl gpu with:
Adding "confirmed" label for runtime team to take a look. FYI - this code looks very complicated, although it can be made much simpler like: #include <sycl/sycl.hpp>
#include <iostream>
#include <cmath>
int main() {
float m1[16] = {2, 2, 9, 1, 8, 3, 2, 4, 2, 5, 6, 7, 3, 5, 6, 4};
float m2[16] = {2, 2, 91, 1, 8, 3, 21, 4, 2, 5, 6, 0, 31, 5, 6, 4};
sycl::queue q;
float *device_arr = sycl::malloc_device<float>(4 * 4, q);
float *device_m1 = sycl::malloc_device<float>(4 * 4, q);
float *device_m2 = sycl::malloc_device<float>(4 * 4, q);
q.memcpy(device_m1, m1, sizeof(float) * 4 * 4);
q.memcpy(device_m2, m2, sizeof(float) * 4 * 4);
q.wait();
const unsigned long work_size = 4 * 4;
q.parallel_for(sycl::range<1>(work_size), [=](sycl::id<1> idx) {
device_arr[idx] = 0;
int i = idx / 4;
int j = idx % 4;
for (int k = 0; k < 4; k++) {
device_arr[idx] += device_m1[i * 4 + k] * device_m2[k * 4 + j];
}
}).wait();
clock_t begin_sycl = clock();
clock_t end_sycl = clock();
double time_spent_sycl = (double)(end_sycl - begin_sycl) / CLOCKS_PER_SEC / 12;
std::cout << "Time spent on SYCL: " << time_spent_sycl << std::endl;
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++)
std::cout << device_arr[i * 4 + j] << " ";
std::cout << std::endl;
}
sycl::free(device_arr, q);
sycl::free(device_m1, q);
sycl::free(device_m2, q);
} |
@KornevNikita , your reproducer has a bug when printing the result on the host side - @MajidAbdelilah your code has undefined behavior (UB) in it. |
Oops, forgot to delete this block. Thanks! |
Describe the bug
the program failed with this output
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
i have an intel core I7 1255U and no dedicated graphics card the bug happen when running on the intel integrated gpu
To reproduce
#include <sycl/sycl.hpp>
#include <math.h>
#include
int main()
{
}
dpcpp -xhost c++\ vs\ sycl/sycl.cpp -O3
Environment
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.1/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.1/bin/compiler/../icpx.cfg
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-1255U OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [24.26.30049]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.30049]
Platforms: 4
Platform [#1]:
Version : OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
Name : Intel(R) FPGA Emulation Platform for OpenCL(TM)
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : acc
Version : OpenCL 1.2
Name : Intel(R) FPGA Emulation Device
Vendor : Intel(R) Corporation
Driver : 2024.17.3.0.08_160000
Aspects : accelerator fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_atomic_host_allocations usm_atomic_shared_allocations ext_oneapi_srgb ext_oneapi_non_uniform_groups
info::device::sub_group_sizes: 4 8 16 32 64
Platform [#2]:
Version : OpenCL 3.0 LINUX
Name : Intel(R) OpenCL
Vendor : Intel(R) Corporation
Devices : 1
Device [#1]:
Type : cpu
Version : OpenCL 3.0 (Build 0)
Name : 12th Gen Intel(R) Core(TM) i7-1255U
Vendor : Intel(R) Corporation
Driver : 2024.17.3.0.08_160000
Aspects : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_oneapi_srgb ext_oneapi_native_assert ext_intel_legacy_image ext_oneapi_non_uniform_groups
info::device::sub_group_sizes: 4 8 16 32 64
Platform [#3]:
Version : OpenCL 3.0
Name : Intel(R) OpenCL Graphics
Vendor : Intel(R) Corporation
Devices : 1
Device [#2]:
Type : gpu
Version : OpenCL 3.0 NEO
Name : Intel(R) Iris(R) Xe Graphics
Vendor : Intel(R) Corporation
Driver : 24.26.30049
Aspects : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations atomic64 ext_oneapi_srgb ext_intel_device_id ext_intel_legacy_image ext_intel_esimd ext_oneapi_non_uniform_groups
info::device::sub_group_sizes: 8 16 32
Platform [#4]:
Version : 1.3
Name : Intel(R) Level-Zero
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 1.3
Name : Intel(R) Iris(R) Xe Graphics
Vendor : Intel(R) Corporation
Driver : 1.3.30049
Aspects : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_intel_esimd ext_oneapi_non_uniform_groups
info::device::sub_group_sizes: 8 16 32
default_selector() : gpu, Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.30049]
accelerator_selector() : acc, Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
cpu_selector() : cpu, Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-1255U OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
gpu_selector() : gpu, Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.30049]
custom_selector(gpu) : gpu, Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.30049]
custom_selector(cpu) : cpu, Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-1255U OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
custom_selector(acc) : acc, Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
[abdelilah@archlinux learn SYCL]$```
Additional context
No response
The text was updated successfully, but these errors were encountered: