Fail to import #399

Closed
csantosb opened this issue Mar 19, 2024 · 11 comments

Comments

@csantosb

csantosb commented Mar 19, 2024

When I run `using oneAPI` (oneAPI v1.4.0), I get the following message:

┌ Error: Failed to initialize oneAPI
│   exception =
│    ZeError: driver is not initialized (code 2013265921, ZE_RESULT_ERROR_UNINITIALIZED)
│    Stacktrace:
│      [1] throw_api_error(res::oneAPI.oneL0._ze_result_t)
│        @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/libze.jl:8
│      [2] check
│        @ ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/libze.jl:19 [inlined]
│      [3] zeInit
│        @ ~/.julia/packages/oneAPI/2gxUb/lib/utils/call.jl:24 [inlined]
│      [4] __init__()
│        @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/oneL0.jl:100
│      [5] run_module_init(mod::Module, i::Int64)
│        @ Base ./loading.jl:1134
│      [6] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
│        @ Base ./loading.jl:1122
│      [7] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
│        @ Base ./loading.jl:1067
│      [8] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128)
│        @ Base ./loading.jl:1581
│      [9] _require(pkg::Base.PkgId, env::String)
│        @ Base ./loading.jl:1938
│     [10] __require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:1812
│     [11] #invoke_in_world#3
│        @ ./essentials.jl:926 [inlined]
│     [12] invoke_in_world
│        @ ./essentials.jl:923 [inlined]
│     [13] _require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:1803
│     [14] macro expansion
│        @ ./loading.jl:1790 [inlined]
│     [15] macro expansion
│        @ ./lock.jl:267 [inlined]
│     [16] __require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:1753
│     [17] #invoke_in_world#3
│        @ ./essentials.jl:926 [inlined]
│     [18] invoke_in_world
│        @ ./essentials.jl:923 [inlined]
│     [19] require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:1746
│     [20] eval
│        @ ./boot.jl:385 [inlined]
│     [21] eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
│        @ REPL /tmp/julia-1.10.2/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
│     [22] repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
│        @ REPL /tmp/julia-1.10.2/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
│     [23] start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
│        @ REPL /tmp/julia-1.10.2/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
│     [24] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::Any)
│        @ REPL /tmp/julia-1.10.2/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
│     [25] run_repl(repl::REPL.AbstractREPL, consumer::Any)
│        @ REPL /tmp/julia-1.10.2/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
│     [26] (::Base.var"#1013#1015"{Bool, Bool, Bool})(REPL::Module)
│        @ Base ./client.jl:432
│     [27] #invokelatest#2
│        @ ./essentials.jl:892 [inlined]
│     [28] invokelatest
│        @ ./essentials.jl:889 [inlined]
│     [29] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
│        @ Base ./client.jl:416
│     [30] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:333
│     [31] _start()
│        @ Base ./client.jl:552
└ @ oneAPI.oneL0 ~/.julia/packages/oneAPI/2gxUb/lib/level-zero/oneL0.jl:103

My `versioninfo()` output is:

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8

The output of `hwinfo --display` is:

28: PCI 02.0: 0300 VGA compatible controller (VGA)              
  [Created at pci.386]
  Unique ID: _Znp.lIyCdeT3soB
  SysFS ID: /devices/pci0000:00/0000:00:02.0
  SysFS BusID: 0000:00:02.0
  Hardware Class: graphics card
  Model: "Intel TigerLake-LP GT2 [Iris Xe Graphics]"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x9a49 "TigerLake-LP GT2 [Iris Xe Graphics]"
  SubVendor: pci 0x1028 "Dell"
  SubDevice: pci 0x0a5c 
  Revision: 0x01
  Driver: "i915"
  Driver Modules: "i915"
  Memory Range: 0x6076000000-0x6076ffffff (rw,non-prefetchable)
  Memory Range: 0x4000000000-0x400fffffff (ro,non-prefetchable)
  I/O Ports: 0x3000-0x303f (rw)
  Memory Range: 0x000c0000-0x000dffff (rw,non-prefetchable,disabled)
  IRQ: 183 (499382 events)
  Module Alias: "pci:v00008086d00009A49sv00001028sd00000A5Cbc03sc00i00"
  Driver Info #0:
    Driver Status: i915 is active
    Driver Activation Cmd: "modprobe i915"
  Driver Info #1:
    Driver Status: xe is active
    Driver Activation Cmd: "modprobe xe"
  Config Status: cfg=new, avail=yes, need=no, active=unknown

Additionally, `inxi -Fzm` gives:

Graphics:
  Device-1: Intel TigerLake-LP GT2 [Iris Xe Graphics] driver: i915 v: kernel
  Device-2: Microdia Integrated_Webcam_HD driver: uvcvideo type: USB
  Display: server: X.Org v: 21.1.11 driver: X: loaded: modesetting dri: iris
    gpu: i915 resolution: 1920x1080~60Hz
  API: EGL v: 1.5 drivers: iris,swrast platforms: x11,surfaceless,device
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 24.0.3-arch1.1
    renderer: Mesa Intel Xe Graphics (TGL GT2)
  API: Vulkan v: 1.3.279 drivers: intel,llvmpipe surfaces: xcb,xlib

Any idea?

Thanks

@maleadt
Member

maleadt commented Mar 20, 2024

Make sure you have the necessary permissions to access the GPU hardware in /dev/dri*. You didn't mention which kernel version you are using; is it sufficiently recent (see the README; I think you need at least 6.2)? You can also try running the process under strace to see if anything goes wrong.
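A minimal POSIX shell sketch of the first two checks suggested above, assuming a Linux system: it compares the running kernel against the 6.2 minimum the comment recalls from the README, and reports whether the current user can read and write the DRM render nodes. The functions `kernel_at_least` and `check_render_nodes` are hypothetical helpers, not part of oneAPI.jl, and the group names in the hint are typical defaults rather than something guaranteed on every distribution.

```shell
#!/bin/sh
# Hypothetical helpers (not part of oneAPI.jl) for two prerequisite checks.

# Succeeds if kernel release $1 (e.g. "6.8.1-arch1-1") is at least
# major.minor $2 (e.g. "6.2"). Only the first two components are compared.
kernel_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%[!0-9]*}
    want_major=${2%%.*}
    want_minor=${2#*.}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Reports, for each render node under /dev/dri, whether the current user
# can open it for reading and writing.
check_render_nodes() {
    found=0
    for node in /dev/dri/renderD*; do
        [ -e "$node" ] || continue   # the glob matched nothing
        found=1
        if [ -r "$node" ] && [ -w "$node" ]; then
            echo "ok: $node is readable and writable"
        else
            echo "no access: $node (check group membership, typically 'render' or 'video')"
        fi
    done
    [ "$found" -eq 1 ] || echo "no render nodes found under /dev/dri"
}

release=$(uname -r)
if kernel_at_least "$release" 6.2; then
    echo "kernel $release: at least 6.2"
else
    echo "kernel $release: older than 6.2"
fi
check_render_nodes

# To trace the failing import, something like:
#   strace -f -o oneapi.strace julia -e 'using oneAPI'
```

Only readability and writability of the device files are checked here; a successful check does not prove the driver stack will initialize.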

Apart from those suggestions, though, there's not much we can do, the API being as opaque as it is. If you don't figure it out, it may be best to open an issue on https://github.com/intel/compute-runtime.

@csantosb
Author

csantosb commented Mar 20, 2024

I'm using an up-to-date Arch Linux with kernel 6.8.1 and the official Julia binaries.

I'll try your suggestions, thanks. Not sure how to explain the issue upstream, though.

@maleadt
Member

maleadt commented Mar 20, 2024

After fixing the /dev permissions, please post an strace log. Maybe we can see what's going on in there.

And maybe also the output of a run with LD_DEBUG=libs.

@csantosb
Author

csantosb commented Mar 20, 2024 via email

@maleadt
Member

maleadt commented Mar 20, 2024

It looks like you have some Level Zero things installed globally:

     42930:	find library=libze_tracing_layer.so.1 [0]; searching
     42930:	 search cache=/etc/ld.so.cache
     42930:	  trying file=/usr/lib/libze_tracing_layer.so.1
     42930:	
     42930:	
     42930:	calling init: /usr/lib/libze_tracing_layer.so.1
openat(AT_FDCWD, "/usr/lib/libze_tracing_layer.so.1", O_RDONLY|O_CLOEXEC) = 20

On my system, /dev/dri is scanned right after that (normally failing) lookup:

openat(AT_FDCWD, "/usr/lib/libze_tracing_layer.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
munmap(0x7ff195395000, 34618)           = 0
openat(AT_FDCWD, "/dev/dri/by-path", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 17

That doesn't happen in your strace, so I'd guess that mixing our libze with your system libze_validation makes the whole thing bail out early.

Could you try removing those system libraries, if only temporarily? Generally, the LD_DEBUG=libs output shouldn't show any system libraries being loaded (except for core ones like libc, libm, libpthread, etc.).
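The "no system libraries except core ones" rule above can be checked mechanically. A minimal sketch, assuming the LD_DEBUG=libs log has been captured to a file; `list_system_libs` is a hypothetical helper, and both the system-directory pattern and the list of "core" libraries are assumptions to adjust per distribution.

```shell
#!/bin/sh
# Hypothetical helper: reads LD_DEBUG=libs output on stdin and prints the
# libraries that were initialized from system library directories, skipping
# core runtime libraries. The directory pattern and the "core" list are
# assumptions; adjust them for your distribution.
#
# Capture the log first, for example:
#   LD_DEBUG=libs julia -e 'using oneAPI' 2> ld_debug.log
#   list_system_libs < ld_debug.log
list_system_libs() {
    sed -n 's|.*calling init: \(/.*\)$|\1|p' |
        grep -E '^/(usr/)?lib(64)?/' |
        grep -vE '/(ld-linux[^/]*|libc|libm|libpthread|libdl|librt|libgcc_s|libstdc\+\+)\.so' |
        sort -u
}
```

Applied to the excerpt quoted above, this would presumably flag /usr/lib/libze_tracing_layer.so.1, while libraries loaded from Julia's own artifact directories are ignored.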

@csantosb
Author

csantosb commented Mar 21, 2024 via email

@maleadt
Member

maleadt commented Mar 21, 2024

OK, great: even though the error is the same, we now do see libze scanning /dev, which indicates that the tracing layer mismatch was a problem in the first place.

openat(AT_FDCWD, "/dev/dri/by-path", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 20
fstat(20, {st_mode=S_IFDIR|0755, st_size=80, ...}) = 0
getdents64(20, 0x365f5f0 /* 4 entries */, 32768) = 144
getdents64(20, 0x365f5f0 /* 0 entries */, 32768) = 0
close(20)                               = 0
openat(AT_FDCWD, "/dev/dri/by-path/pci-0000:00:02.0-render", O_RDWR) = 20
ioctl(20, DRM_IOCTL_VERSION, 0x7fff8e242e80) = 0
ioctl(20, DRM_IOCTL_I915_GETPARAM, 0x7fff8e242ff0) = 0
ioctl(20, DRM_IOCTL_I915_GETPARAM, 0x7fff8e242ff0) = 0
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:02.0/drm", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 21
fstat(21, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(21, 0x365f5f0 /* 5 entries */, 32768) = 144
getdents64(21, 0x365f5f0 /* 0 entries */, 32768) = 0
close(21)                               = 0
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:02.0/drm/card1/prelim_uapi_version", O_RDONLY) = -1 ENOENT (No such file or directory)
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242eb0) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242eb0) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f80) = 0
ioctl(20, DRM_IOCTL_I915_GETPARAM, 0x7fff8e243020) = 0
ioctl(20, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, 0x7fff8e243060) = -1 EINVAL (Invalid argument)
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f90) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f90) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f30) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242f30) = 0
futex(0x366a858, FUTEX_WAKE_PRIVATE, 2147483647) = 0
ioctl(20, DRM_IOCTL_I915_GEM_VM_CREATE, 0x7fff8e242ff0) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242710) = 0
ioctl(20, DRM_IOCTL_I915_QUERY, 0x7fff8e242710) = 0
ioctl(20, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, 0x7fff8e242840) = 0
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:02.0/drm", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 21
fstat(21, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(21, 0x365f5f0 /* 5 entries */, 32768) = 144
getdents64(21, 0x365f5f0 /* 0 entries */, 32768) = 0
close(21)                               = 0
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:00:02.0/drm/card1/gt_max_freq_mhz", O_RDONLY) = 21
read(21, "1300\n", 8191)                = 5
close(21)                               = 0
ioctl(20, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, 0x7fff8e242830) = 0
ioctl(20, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, 0x7fff8e242850) = 0
ioctl(20, DRM_IOCTL_I915_GETPARAM, 0x7fff8e2427e0) = 0
readlink("/proc/self/exe", "/tmp/julia-1.10.2/bin/julia", 511) = 27
futex(0x36a3840, FUTEX_WAKE_PRIVATE, 2147483647) = 0
ioctl(20, DRM_IOCTL_I915_GEM_VM_DESTROY, 0x7fff8e2431e0) = 0
close(20)                               = 0

I don't see anything that stands out here. There's a DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM call returning EINVAL, but more queries are made after it, so it doesn't seem to be fatal.

Maybe also try running with ZE_ENABLE_LOADER_DEBUG_TRACE=1, as described in https://github.com/oneapi-src/level-zero?tab=readme-ov-file#debug-trace. For comparison, this is the output on my system:
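Picking error returns such as that EINVAL out of a long trace can be automated. A small sketch, assuming the log was written with strace's `-o` option; `failed_syscalls` is a hypothetical helper name. As the comment notes, a failing call is not necessarily fatal, so the output is only a starting point for comparison with a working system.

```shell
#!/bin/sh
# Hypothetical helper: reads an strace log on stdin and prints each call that
# returned -1, as "ERRNO: call(args)". An optional leading PID (as written by
# `strace -f -o FILE`) is dropped.
failed_syscalls() {
    sed -nE 's/^([0-9]+ +)?(.*) = -1 (E[A-Z0-9]+).*$/\3: \2/p'
}

# Example:
#   failed_syscalls < oneapi.strace
```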

❯ ZE_ENABLE_LOADER_DEBUG_TRACE=1 jl --project examples/vadd.jl
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu.so.1
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_vpu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_vpu.so.1 failed with libze_intel_vpu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Load Library of libze_tracing_layer.so.1 failed with libze_tracing_layer.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:check_drivers(flags=0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED))
ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED)) returning ZE_RESULT_SUCCESS
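The interesting parts of the loader trace are which libraries failed to load and what zeInit returned for each driver. A minimal sketch that condenses the trace to one line per library, assuming the message formats shown above; `zeinit_results` is a hypothetical helper, and `LOAD_FAILED` is a label chosen here, not a Level Zero result code.

```shell
#!/bin/sh
# Hypothetical helper: condenses ZE_ENABLE_LOADER_DEBUG_TRACE=1 output (read on
# stdin) to one line per driver or layer. Libraries the loader could not open
# are reported as LOAD_FAILED (a label chosen here, not a Level Zero result
# code); otherwise the zeInit result for each driver is printed.
zeinit_results() {
    sed -nE \
        -e 's/^ZE_LOADER_DEBUG_TRACE:Load Library of ([^ ]+) failed.*$/\1 LOAD_FAILED/p' \
        -e 's/^ZE_LOADER_DEBUG_TRACE:init driver ([^ ]+) zeInit\(.*\) returning ([A-Z_]+).*$/\1 \2/p'
}

# Example (the trace may go to stdout or stderr, so merge both):
#   ZE_ENABLE_LOADER_DEBUG_TRACE=1 julia -e 'using oneAPI' 2>&1 | zeinit_results
```

On a working system, the GPU driver line should end in ZE_RESULT_SUCCESS, as in the output above; missing optional libraries such as the VPU driver are expected to show up as LOAD_FAILED.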

@csantosb
Author

csantosb commented Mar 21, 2024 via email

@maleadt
Member

maleadt commented Mar 21, 2024

Maybe the two lines:

ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(0(ZE_INIT_ALL_DRIVER_TYPES_ENABLED)) returning ZE_RESULT_ERROR_UNINITIALIZED
ZE_LOADER_DEBUG_TRACE:Check Drivers Failed on libze_intel_gpu.so.1 , driver will be removed. zeInit failed with ZE_RESULT_ERROR_UNINITIALIZED

It does look like the issue is with the compute-runtime, which provides libze_intel_gpu. Could you try running with NEOReadDebugKeys=1 PrintDebugMessages=1 PrintXeLogs=1? I'm not too familiar with compute-runtime's inner workings, though; maybe @kballeda could suggest what else to try here. If not, I think we'll have to consider filing an issue upstream.
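Since all of these diagnostics are plain environment variables, a small wrapper keeps them together for repeated runs. A sketch, assuming the variable names quoted in this thread are the ones the installed loader and compute-runtime actually read; `with_gpu_debug_env` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical wrapper: runs a command with the Level Zero loader trace and the
# compute-runtime debug variables mentioned in this thread set to 1. The
# variables apply only to the wrapped command, not to the calling shell.
with_gpu_debug_env() {
    ZE_ENABLE_LOADER_DEBUG_TRACE=1 \
    NEOReadDebugKeys=1 \
    PrintDebugMessages=1 \
    PrintXeLogs=1 \
        "$@"
}

# Example (the output can be long, so keep a copy):
#   with_gpu_debug_env julia -e 'using oneAPI' 2>&1 | tee gpu_debug.log
```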

@csantosb
Author

csantosb commented Mar 22, 2024 via email

@csantosb
Author

csantosb commented Apr 1, 2024

The problem was fixed for me by a system update.

Thanks a lot for your help!

@csantosb csantosb closed this as completed Apr 1, 2024
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants