Finding the cause for ZE_RESULT_ERROR_UNINITIALIZED #140
Comments
Is this with the very latest kernel? I.e. does this help:
Having debugged several of these issues, I think this is a rather important bug...
Or there being some mismatch between user-space and kernel driver:
Or frontend implementing
Looking at current Level-Zero frontend sources, it returns "uninitialized" error for
Using |
I'm working on oneAPI.jl, which provides Julia support for Intel GPUs through Level Zero. Occasionally, users report running into an opaque `ZE_RESULT_ERROR_UNINITIALIZED` when we call `zeInit` during loading of oneAPI.jl. This is an unhelpful error, and it makes it impossible to use the Level Zero APIs to figure out what's actually happening. For example, I've run into:
- `/dev/dri`
- (`libze_loader` vs system `libze_tracing_layer`)
Apart from the last one, I would expect the loader not to fail to initialize, but to still allow iterating drivers (why else this abstraction?) and ideally to make it possible to determine why there are no devices. Currently, we typically find this out after a painstaking remote debugging session using `strace` or `LD_DEBUG`.
Am I missing something in the API here? CUDA, for example, has error codes that indicate at least a little better what may be happening (`CUDA_ERROR_NO_DEVICE`, `CUDA_ERROR_DEVICE_UNAVAILABLE`, `CUDA_ERROR_DEVICE_NOT_LICENSED`, etc.).
Apart from the above API issue, I also have a concrete case where a user's system keeps throwing `ZE_RESULT_ERROR_UNINITIALIZED`: JuliaGPU/oneAPI.jl#399. `LD_DEBUG` reveals that the correct libraries are found, and `strace` shows that `/dev/dri` nodes are successfully discovered and opened. I've found out about some environment variables to increase logging, but the output isn't very helpful:
Any other suggestions on how to debug this would be much appreciated.
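For anyone triaging a similar report, the two checks mentioned above (access to the `/dev/dri` nodes, and whether the loader can be loaded and `zeInit` called at all) can be roughly sketched as a standalone Python script. This is only a sketch under assumptions: the helper names `check_dri_nodes` and `try_ze_init` and the status strings are made up for illustration, the loader is assumed to be discoverable as `ze_loader`, and the numeric value of `ZE_RESULT_ERROR_UNINITIALIZED` is taken from `ze_api.h`. It does not explain why `zeInit` fails; it only separates "loader not loadable" from "loader loaded but reported uninitialized".

```python
import ctypes
import ctypes.util
import glob
import os

ZE_RESULT_SUCCESS = 0x0
ZE_RESULT_ERROR_UNINITIALIZED = 0x78000001  # value as defined in ze_api.h


def check_dri_nodes(dri_dir="/dev/dri"):
    """Return (path, readable, writable) for every entry under dri_dir."""
    nodes = sorted(glob.glob(os.path.join(dri_dir, "*")))
    return [(p, os.access(p, os.R_OK), os.access(p, os.W_OK)) for p in nodes]


def try_ze_init(lib_name=None):
    """Load the Level Zero loader and call zeInit(0).

    Returns a (status, detail) tuple; the status strings are hypothetical
    labels used only by this sketch.
    """
    name = lib_name or ctypes.util.find_library("ze_loader")
    if name is None:
        return ("loader-not-found", None)
    try:
        lib = ctypes.CDLL(name)
        ze_init = lib.zeInit
    except (OSError, AttributeError) as exc:
        # Library missing, not loadable, or lacking the zeInit symbol.
        return ("loader-load-failed", str(exc))
    ze_init.argtypes = [ctypes.c_uint32]  # ze_init_flags_t
    ze_init.restype = ctypes.c_uint32     # ze_result_t
    result = ze_init(0)
    if result == ZE_RESULT_SUCCESS:
        return ("ok", hex(result))
    if result == ZE_RESULT_ERROR_UNINITIALIZED:
        return ("uninitialized", hex(result))
    return ("other-error", hex(result))


if __name__ == "__main__":
    for path, readable, writable in check_dri_nodes():
        print(f"{path}: readable={readable} writable={writable}")
    print("zeInit:", try_ze_init())
```

Running it as the affected user (not as root, which would hide permission problems) gives a quick first pass before reaching for `strace` or `LD_DEBUG`.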