zesDeviceGetProperties(): Don't crash miserably when Sysman wasn't initialized #389
…itialized Refs oneapi-src/level-zero#36 Signed-off-by: Brice Goglin <[email protected]>
Thanks @bgoglin for the patch. So without the patch, L0 segfaults?
zesDeviceGetProperties() segfaults unless Sysman was initialized during zeInit(). Here's the code for reproducing this. It could be simpler but I basically just extracted what I was going to put in hwloc for listing L0 devices. We need a way to find out whether Sysman was enabled when zeInit() was first called in the process (hwloc may not always be the first one initializing L0). Returning an error from Sysman functions is an easy way to detect that case.
@bgoglin Thanks. Taking a closer look internally and will get back to you.
@jandres742 Do you have a CI building deb packages? I tried to build from git manually in the past but couldn't get it to work. Otherwise, a variant of my code above would be to do zeInit(); putenv("ZES_ENABLE_SYSMAN=1"); zeInit(); and then the rest of the code. That's what is going to happen if some library initializes L0 without Sysman before hwloc tries initializing with Sysman. Your commit seems to read ZES_ENABLE_SYSMAN multiple times from the environment. Do you know if all these reads happen during the first zeInit()? If not, the application or some library may change the environment in between. Reading the environment only once and remembering isSysManEnabled might be safer.
You are correct. However, the patch I did was meant to just fix the segfault problem, so the application fails gracefully when SysMan hasn't been initialized. A complete solution for oneapi-src/level-zero#36 is still under discussion.
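To illustrate what "fails gracefully" means here, the following is a minimal sketch of the general guard pattern: check that the Sysman state exists before touching it, and return an error code otherwise. It is not the actual driver patch. Every name in it (`demo_result_t`, `demo_device_t`, `demo_get_properties`, the error codes) is a hypothetical stand-in and not part of the Level Zero API.

```c
#include <stddef.h>

/* Hypothetical result codes standing in for real ze_result_t values. */
typedef enum {
    DEMO_SUCCESS = 0,
    DEMO_ERROR_UNINITIALIZED,
    DEMO_ERROR_INVALID_NULL_POINTER
} demo_result_t;

/* Hypothetical Sysman properties; the real structure has many more fields. */
typedef struct {
    int num_subdevices;
} demo_sysman_props_t;

/* Hypothetical device: `sysman` stays NULL when Sysman was not set up
 * at initialization time. */
typedef struct {
    const demo_sysman_props_t *sysman;
} demo_device_t;

/* Returns an error instead of dereferencing a NULL Sysman pointer. */
demo_result_t demo_get_properties(const demo_device_t *dev,
                                  demo_sysman_props_t *out)
{
    if (dev == NULL || out == NULL)
        return DEMO_ERROR_INVALID_NULL_POINTER;
    if (dev->sysman == NULL)
        return DEMO_ERROR_UNINITIALIZED; /* Sysman was never initialized */
    *out = *dev->sysman;
    return DEMO_SUCCESS;
}
```

With a guard of this kind, a caller such as hwloc can treat the error return as the signal that Sysman was not enabled when L0 was first initialized, rather than crashing.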
I understand. What I am saying is that it would be better to look at ZES_ENABLE_SYSMAN only once and store the result in a variable so that all initialization everywhere is done consistently with respect to sysman being enabled or not.
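A minimal sketch of the "read once and remember" idea, in plain C. The helper names `parse_sysman_flag` and `sysman_enabled_once` are hypothetical, and treating only the value "1" as enabled is an assumption; the real driver may parse the variable differently. The sketch is also not thread-safe, so a real implementation would need a once-style guard.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser. Assumption: only the exact string "1" enables Sysman. */
int parse_sysman_flag(const char *value)
{
    return (value != NULL && strcmp(value, "1") == 0) ? 1 : 0;
}

/* -1 means "not read yet"; 0 or 1 is the remembered answer. */
static int sysman_enabled_cached = -1;

/* Reads ZES_ENABLE_SYSMAN on the first call only. Later changes to the
 * environment (for example a putenv() between two zeInit() calls) do not
 * change the answer, so every caller sees the same value. */
int sysman_enabled_once(void)
{
    if (sysman_enabled_cached < 0)
        sysman_enabled_cached = parse_sysman_flag(getenv("ZES_ENABLE_SYSMAN"));
    return sysman_enabled_cached;
}
```

All initialization paths would then call `sysman_enabled_once()` instead of calling `getenv()` themselves, which keeps them consistent even if a library modifies the environment partway through.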
That's a good idea. As we move toward a more complete solution for oneapi-src/level-zero#36, we could add that. The patch above checks for SYSMAN when the driver is being loaded by the loader, and at that moment there are no L0 objects created, so I preferred to avoid creating new objects for this check, mainly considering that the final solution might change.
@jandres742 Just tested the Debian packages released a couple days ago with your patch (20.51.18762), no crash anymore, thanks.
@bgoglin Glad to hear that. For now, that's the error we are returning, but agreed, it's not the most intuitive one. We should actually be returning "UNINITIALIZED". So, in this first patch we took care of the segfault, but we need a second patch in the loader to return
Closing this issue since the crash doesn't occur anymore. Further discussion may go to oneapi-src/level-zero#36
This is an untested patch (don't know how to compile oneAPI) to provide a short-term workaround for segfaults reported in oneapi-src/level-zero#36
Signed-off-by: Brice Goglin [email protected]