-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: extensions: Add EGL_EXT_explicit_device #23
Conversation
Signed-off-by: Adam Jackson <[email protected]>
I'd say that this is more orthogonal to EGL_EXT_platform_device. When you create a display using EGL_EXT_platform_device, the EGL implementation doesn't pick a default windowing system, it runs without any windowing system at all. That means you can't use windows or pixmaps as drawables, because there's nowhere for a native window or pixmap to come from. But, with EGL_EXT_platform_device, you can use EGL to render to a pbuffer or an EGLStream, even on a headless system without X11 or Wayland. |
The surfaceless 'platform' also provides that in conjunction with this extension. I'd be much happier to push the combination of the two extensions as the canonical way to use device selection with a NULL platform; it splits the two up quite nicely. |
@nwnk do you want this accepted as is, or will you respond to the comments first? The usual guideline is that we only define things as EXT after multiple vendors agree to implement them, otherwise everything could be an EXT. |
@kbrenneman That isn't quite how I'd read EXT_platform_device - issue 1 therein makes it sound like the implementation may choose a surfaceful platform - but I suppose that's just my reading of it. Regardless, I still have a use for specifying both platform and device, and I agree with @fooishbar that this plus an appropriate platform extension is a more natural way to express intent than EXT_platform_device's "we'll give you a platform but you have to ask what it can do afterwards, better hope you like it" semantics. @oddhack I was sort of hoping that this PR would be the place where the multiple vendors get to that agreement. Is there a better venue for that kind of discussion, or a better way to mark PRs for EXT extensions as having reached consensus? I'd be happy to update the repo README to reflect the best practice here. (Similarly, though I have an implementation of this under the MESA_platform_device name, I'd prefer not to have to hang onto stale extension names forever. I suppose that's what MESAX is for though.) At any rate I'd prefer we not merge this if I can't get at least one other vendor to say they could at least support it in principle. |
@nwnk AFAICT, people who aren't members of KhronosGroup (e.g. most people) proposing PRs can't set Labels on them. So I think the best generally-useful advice may be to use "WIP:" on the title of the PR as a fairly obvious indicator that it is not yet resolved, like we do on internal Gitlab MRs (presuming the person who submits the PR can edit the title once that changes, which I hope they can). |
@nwnk - In addition to issue 1, this is the important part of the spec:
I guess in theory, an implementation could go through a windowing system under the hood, but if it did, that would be an internal detail, not visible to the application. |
The EGL_EXT_platform_device extension enables the client to create a | ||
display from a device, but does so by overloading the native_display | ||
argument to eglGetPlatformDisplay; the user passes the device as the | ||
native display, and presumably the EGL picks a sensible default for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an accurate description of EGL_EXT_platform_device. To start, the terms "platform" and "window system" are meant to be synonymous in EGL. As such, platform device explicitly replaces the need for a window system because it is itself a EGL "platform" or window system option for EGL, and it's not overloading the native_display argument any more than any other EGL platform would.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think the comparison with EGL_EXT_platform_device is probably unwarranted then.
|
||
EGL_EXT_explicit_device differs by passing the device as an attribute | ||
to eglGetPlatformDisplay. This allows the client to name both the device | ||
and the platform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is useful to solve a separate use case from that EGL_EXT_platform_device solves: PRIME/Optimus-style workloads where you want one GPU to power your compositor and another GPU to power a client app, with explicit application choice over the latter. However, there's no mechanism exposed here to query which devices are compatible with a given native display. It presumes support. I think you need an additional function or something to get a list of devices given a native display.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, exactly. Setting 'DRI_PRIME=1' in the environment is beyond lame, and given it's not tied to device queries, it precludes clients providing a sensible 'pick a GPU to run this on' dropdown box. And doesn't work with glvnd. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cubanismo Perhaps:
EGLBoolean eglQueryDisplayDevicesEXT(
EGLenum platform,
void *native_display,
EGLint max_devices,
EGLDeviceEXT *devices,
EGLint *num_devices
);
Where if platform
is EGL_PLATFORM_DEVICE_EXT
, native_display
must be null, and the returned device list is every device on which eglGetPlatformDisplay(EGL_PLATFORM_DEVICE_EXT)
might succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean, every device that it would accept as an EGL_DEVICE_EXT attribute?
Note that this would require some additional support from GLVND, but it shouldn't be any harder than the way GLVND handles eglQueryDevicesEXT now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nwnk Yes, your eglQueryDisplayDevicesEXT() API sounds right. Might want to make it eglQueryNativeDisplayDevicesEXT() to avoid clashing with the various existing EGL display queries that operate on an EGLDisplay object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want some way for an application to ask, "What devices could I use with this display?" then you'd need an attribute list as well. With X11 for example, you might get different results on different screens, which are specified using an attribute. So, something like this could work:
EGLBoolean eglQueryNativeDisplayDevicesEXT(
EGLenum platform, void *native_display, const EGLAttrib *attrib_list,
EGLint max_devices, EGLDeviceEXT *devices, EGLint *num_devices);
Another option would be to use eglQueryDevicesEXT to get the device handles, and then add a function to check if a specific device can work with a native display:
EGLBoolean eglQueryDisplayDeviceCompatibleEXT(
EGLenum platform, void *native_display, const EGLAttrib *attrib_list,
EGLDeviceEXT device);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for an application to offer something like a drop-down menu for device selection, you'd also need to provide some sort of a persistent identifier for each device that the app could store in a config file. But, that would probably be outside the scope of this extension.
I had proposed a simple way to replace surfaceless with EGLDevice actually when surfaceless was first proposed, and Chad had agreed it was a viable path forward to a common solution, and said he would mark surfaceless deprecated once I got around to making such a change. I much prefer that direction over replacing EGL_EXT_platform_device with surfaceless. See further comments inline. I'm a little concerned that there doesn't appear to be a common understanding of how the basic EGLDevice extensions function after all the debate there's been over them. |
@oddhack FWIW, I'm a member of KhronosGroup, and I still can't set labels/assign myself to issues/etc. I'm not really sure what's up with the permissions of the Khronos repositories. |
@cubanismo (edit previous comment because I was looking at the wrong repo): I just added the Vulkan team with "Write" access to EGL-Registry. That may help. |
@cubanismo I assume you meant In this case, I don't think there's any misunderstanding of EGLDevice. It seems to work perfectly well for its primary usecases - headless rendering, or rendering to an EGLStream. But it doesn't work for using multiple devices with a winsys, because it's specified not to. So I'd like to see this landed, because it enables more usecases, allows us to jettison stupid hacks in Mesa (setting The reason I said that I'd prefer to advocate this extension in combination with surfaceless over |
@oddhack Yeah, looks right now. @fooishbar My comment about misunderstanding was in regards to the inaccurate claim in the extension that drivers need to guess a platform when using EGL_EXT_platform_device. I'd like this extension to go in too. I just have some concerns I noted inline that I think should be addressed first. |
@nwnk this will need conflicts resolved (probably just updating the extension # to 123, since another one went in first, and regenerate headers). Otherwise it's OK from my perspective but I don't think you've reached closure with @cubanismo yet? |
@oddhack indeed I haven't. trying to actually implement this in Mesa has revealed that this particular bit of Mesa is... not good. I'll circle back to this once I have the proof of concept actually working, which hopefully won't be too much longer. |
@cubanismo why do we need a From the proposed spec:
Thus the client can trivially iterate over the device/platform combination it consider acceptable. For example: On the other hand, the extra query API has the following downsides:
Can you please shed some light? |
Guess-and-check is not a valid form of enumeration. Properly written EGL applications should never expect to receive an EGL error. This design principal is implied by the existing EGL specification. There's always a better way to query some capability other than error codes. Responding to your downsides,
|
It seems like we're more on the theoretical/ideological side instead of the practical one. As much as I would love to agree, I believe the ship has sailed. Robust applications do check for EGL errors and others crash. In some cases error checking is even used (but not solely) to establish if extension X exists. See the Conformance tests: 1 note in Let's ignore all that for a moment: even with On the technical side: having a sane
So even if we fake it, app still needs to fallback to another device. Which effectively makes the API misleading to say at the very least. |
What about a single-device query? Basically a way to check whether a specific EGLDeviceEXT handle is usable with a particular display. If you can implement an error code in eglGetPlatformDisplay, then you can also implement such a query function, even if all it does is call eglGetPlatfromDisplay internally:
That's enough to get the correct semantics, but it also gives a driver the option to do something more efficient. Also, a single-device query would work a lot better with libglvnd. A vendor library can provide a dispatch function for eglQueryDisplayDeviceSupportedEXT, but eglQueryDisplayDevicesEXT would require an updated version of libglvnd. |
Also, a device becoming unusable is a problem regardless of whether you have a separate query function or not. Yes, a device might disappear between the query and the eglGetPlatformDisplay calls, but it could just as easily disappear after calling eglGetPlatformDisplay, and the vendor will have to deal with that somehow. |
How applications use the spec is their business. The spec should be designed in a way that experienced programmers can infer details of its behavior without reading every letter of the spec. Dismissing spec design principles as ideological or theoretical factors isn't reasonable. Yes, reality of underlying HW and systems behavior trumps all, but I don't see that being the case here. Yes, single device query may be reasonable compromise if we can't do better, but I noted above it probably doesn't match what applications actually want to do, and it still requires enumerating devices somehow first. No way around an enumeration function somewhere to get an EGLDevice handle. However, I don't follow why an "all supported devices" query isn't feasible. Vulkan has to load all drivers to enumerate its devices, OpenCL is the same. There are open source Vulkan and OpenCL implementations, so why is it suddenly a problem with EGL? I would hope its possible to check a device's capabilities without actually turning it on, but I don't know what types of queries in general are needed on the various drivers to determine support. |
An all-devices query is feasible, but I think a single-device query would be cleaner. If you can implement a single-device query, then you could implement an all-devices query by calling eglQueryDevicesEXT and then passing each EGLDeviceEXT handle to the single-device query. But, I think single-device is cleaner because it keeps enumerating devices and checking for support as separate functions instead of effectively duplicating the enumeration step. It also gives better flexibility, because it gives an app the option to filter out some devices before checking support if it wants. But, if all an app wants is an all-devices query, then that's still easy to do. And, as I noted, a single-device query doesn't require any changes to libglvnd, so you could add the extension with just an updated driver. |
Today, in Mesa, OpenCL is partially supported on 1 vendor on very small HW range, with limited users and apps that do silly things. Vulkan has 2 fairly mature drivers. But in either case - most such applications request one device and use that. AFAICT they rarely iterate over all devices, let alone eglInitialize/eglTerminate dozen or so times before actually drawing something on the screen. As you said I see your goal here and calling it ideological was inappropriate, yet I'm looking for a solution that is more practical and one that will not cause similar problems we had in the past. Even if that means it's not the perfect solution. |
Isn't the whole point of this extension to let an app enumerate available devices and pick one to use for rendering? If an app doesn't want to go through the device list, then it's just going to call eglGetPlatformDisplay without a device and let the EGL implementation deal with the rest. |
Yes. And you don't need new API ( |
That's my point. If an app is just going to call If the app does want to enumerate and pick a device, then, and only then, it will need to use this extension. In that case, it will need to know which devices will work with whatever display it needs to connect to. |
I think you're getting confused here. The extension adds support for passing the device handle into We seems to be going in circles. Perhaps we should have a chat about this at the next EGL WG call? |
That is not an assumption we can make. Let's consider the following possible implementations:
So if we avoid the Query API this gets us out of the tricky situation, were we have to consider between a) keep people's GPU on all the time or b) imposing a significant time delay. If Mesa was a single vendor driver, we could expose some magic (via sysfs or otherwise) to indicate the capability of working with X11/Wayland/etc, and open(wake up) the GPU only at eglInitialize... yet that's not the case. |
I'm not suggesting that the app should be required to call the query function -- calling eglGetPlatformDisplay or eglInitialize still has to be able to indicate success or failure. So, if an application is just going to pick a device to use based on its own logic (instead of giving the user a list to choose from), then it would just go directly to eglGetPlatformDisplay/eglInitialize, and then be ready to deal with a failure. The case I'm thinking about for a query is if the app wants to get a list of available devices and present that list to the user to choose one. In that case, the app has to make a list of devices that will work. Without a query function, the only way to do that is to call eglGetPlatformDisplay, eglInitialize, and eglTerminate for each device. With a query function, at worst the app would have the same performance cost to what it would have to do without the query. |
@kbrenneman the use-case of the API was rather clear since the original posting. W/o the Query API the user can freely, Even if we add the
The eglQuery call itself will take the 2x. So we will result in 4x overall
|
If an app wants to give the user a list to choose from, then the app doesn't know which device is "of interest." Given a known native display, and multiple possible devices to choose from, how should an app create a list of devices so that it can present that list and let a user choose one? |
Based on your earlier example, here is a snippet. EGLDeviceEXT devices[max_count];
EGLBoolean eglQueryDevicesEXT(max_count, devices, &num);
for (i=0; i<num; i++) {
if (device_is_acceptable(devices[i])) {
EGLAttrib attribs[] = {
EGL_DEVICE_EXT, devices[i],
/* Other attributes */
EGL_NONE
};
EGLDisplay display = eglGetPlatformDisplay(platform, native_dpy, attribs);
if (display && eglInitialize(display, NULL, NULL)) {
/* Get extra information */
add_to_ui(devices[i]);
}
eglTerminate(display);
}
}
// Let the user pick a device, then try to use it There is a hidden extra advantage here. As hinted in the extra comment (and mentioned at XDC) I have a WIP EGL extension based on GLX_MESA_query_renderer. Obviously any other similar extension could be used or introduced. |
Given the response in #23 (comment), part of the disconnect for me was that I didn't understand why eglQueryDevicesEXT() would presumably be so much faster than the hypothetical eglQueryDisplayDevicesEXT() since both require loading all drivers. I see now that you're specifically talking about initializing something in the kernel drivers (open() some device node) being the slow part. Our implementation initializes the relevant driver state as part of eglQueryDevicesEXT() and holds that reference open for the lifetime of the app, so there's essentially zero init cost to further queries against those devices. Even if we assume drivers can't improve their implementations, the overhead in your math above is exaggerated as it is a worst-case scenario where you only consider one device, which isn't meaningful since if there's only one device there's little point in querying compatibility. If you look at the more meaningful case you suggested, where there are 6 different devices, you'd end up with an overall throughput of 12/14 = 85% using the measurements you described above. Worst case, 2 devices, you'd get 4/6 = ~67% with no optimization. And remember this is all optional. If the app needs max performance, it can rely on error codes instead. That's a design choice applications should make, not driver APIs. |
eglQueryDevicesEXT() does not load any drivers, it lists drm (and software) devices. As per the EGL API, the implementation could assume, yet is not guaranteed that any device, platform, etc will work until eglInitialize() returns EGL_TRUE.
It sounds like the GPU will be powered on, burning power, even if nobody uses it. I can see the reason behind it, but it's not something open-source kernel drivers are keen on doing.
There are plenty of cases of 2+ GPUs (from a single vendor or not), using solely Mesa. Even then, it's the developer decision whether to query the one, two or however devices they have around - little point or not.
In light of the eglQueryDevicesEXT() clarification, I believe these numbers are a bit off. Let me put this question: What would your reaction would be, if we propose an API that makes the Nvidia implementation X times worse, regardless which numbers we consider? Now let's remind ourselves this change has an impact on all of Mesa, which covers hardware from 6+ vendors. Obviously the severity of impact depends on the vendor HW, kernel driver, etc. |
If we assume the worst-case scenario:
Given that, if an app queries each device and then picks and initializes one, then it will take time for three (eglGetPlatformDisplay, eglInitialize, eglTerminate) sequences. If the app doesn't use the query, and instead calls eglInitialize directly on each device, then it goes through two (eglGetPlatformDisplay, eglInitialize, eglTerminate) sequences. Based on that, the absolute worst-case performance of using the query is 2/3 speed. And remember, an application doesn't have to use the query function -- it can use either of those methods, at the app developer's discretion. Now conversely, if you have a query function, then an implementation might not need to take as long as the full (eglGetPlatformDisplay, eglInitialize, eglTerminate) sequence. For instance, a driver might do offscreen rendering (like it would with EGL_EXT_platform_device) and use some common presentation path to copy the image to the screen, in which case it wouldn't need any device-specific checking at all. In that case, the query function could be as fast as calling XQueryExtension, but an eglInitialize call could still be the slow operation you've described. Not having a query function means that you are asserting, at the API level, that a driver cannot do anything more efficient to check for device or platform support. |
Resolve conflicts and address requested changes from Cubanismo and we can proceed. |
No, initializing a device enough to query its capabilities in our internal driver API doesn't necessarily mean increasing the power draw of our GPUs.
I've been pretty consistent about pushing for designs that are right, rather than designs that are right for NVIDIA's current SW stack. Regardless, I'm not trying to make this an NV Vs. not-NV drivers debate. When I share observations about how our driver behaves, it's just to note internal driver interfaces are not immutable, and there exist other implementation options than what NVIDIA, DRM, DRI, or Mesa do right now. GLX has seen 3 major iterations of the DRI interfaces, probably as many or more significant changes to the DRM interfaces, and I can't even speculate how many major changes to Mesa internals without any major changes to its application-facing API. Point being, application-facing interfaces tend to outlive internal driver interfaces, and hence their design should be driven by feasibility, maximal use of HW across the relevant vendors, and meeting the needs of application developers, not current limitations of driver internals. |
@nwnk checking in on this extension - did it ever ship? Do you still intend to resolve remaining issues and get it registered? |
The question of how an application figures out which devices can support a display is still open. Now that I think about it, returning an error code for an unsupported (EGLDeviceEXT, native_display) pair isn't far from what can happen even now. An driver can support a platform extension, but still not support a specific native display. For instance, a driver that supports EGL_KHR_platform_x11 might still fail if you handed it a remote X server. If we go with an error code, then there are a couple details we'd need to nail down: First, should eglGetPlatformDisplay always check if the (device, display) pair is supported, or can an implementation defer that check to eglInitialize? Allowing a check an eglGetPlatformDisplay is probably unavoidable. Even if a driver would defer the check to eglInitialize, with libglvnd, eglGetPlatformDisplay could still fail if it gets an EGLDevice from a vendor doesn't support EXT_EXT_explicit_device. But, if checking for support requires expensive initialization, then we'd probably want to wait until eglInitialize, since that's when the driver would need to do that initialization anyway. Second, what error code should it hand back if the device and display are otherwise valid, but the driver doesn't support the combination? Would there be any use for an application to distinguish this case from other failures? |
Talked offline with @evelikov about the practicalities of this. Aside from the ambiguities @kbrenneman is attempting to resolve in his last comment, I'm fine with a guess-and-check approach at this point. To the last comment, it is kind of unfortunate to have two different failure paths for this scenario. I be naively apps would want to fail GetDisplay if the device + native display were incompatible, but I agree that might run into the same issues @evelikov is trying to avoid in the first place. It probably just depends on what drivers actually do as far as initialization in GetDisplay(). I haven't yet looked into how this is currently divided up in our driver. |
A driver might be able to figure out in advance whether a device can work, depending on how it actually gets the image data to a drawable. If you know that you'll always use DRI3+Present to display an image, for example, then it might be enough to know that those extensions are supported. |
If we want to have an incompatible device/native_display only fail in one place, and eglGetPlatformDisplay isn't an option, then I think it might actually be possible for libglvnd to provide a "successful" call to eglGetPlatformDisplay and a failure in eglInitialize, even for a vendor that doesn't support EGL_EXT_explicit_device. If we added an internal dummy vendor that just fails eglInitialize (and similar stubs for everything else), then libglvnd could create an EGLDisplay handle using that. I'm not sure if that would be cleaner than just allowing either eglGetPlatformDisplay or eglInitialize to fail, though. |
One downside I thought of for deferring failure to eglInitialize() is that at this point the app has an EGLDisplay object. Currently, there's no way to destroy an EGLDisplay object, so any resources associated with it will essentially be leaked for the life of the application. If we could fail upfront in eglGetPlatformDisplay(), we could avoid that. Of course, there are extensions proposed to fully destroy display handles as well. |
That's true. It would also be make things more complicated if we wanted to use this extension in libglvnd for setting up automatic GPU offloading. It would be a pretty clean design to let libglvnd pick an EGLDeviceEXT handle based on some user configuration and add it to the attribute list in eglGetPlatformDisplay. If a failure due to a mismatch always happens in the vendor's eglGetPlatformDisplay call, then it's easy to fall back to the default (or a different device) if the chosen device doesn't work. If the failure could be deferred until eglInitialize, then that's harder, because libglvnd would have to call eglInitialize/eglTerminate to check if the display actually works. |
@evelikov, any further thoughts here? It sounds like from a design perspective, GetPlatformDisplay is the preferable failure point, but since the design revolves around what's feasible in Mesa, we need to know if that's possible without too much work. I suppose this boils down to whether gallium/DRI caps can be queried at this point in Mesa's init process? |
Friendly poke to see if you want to resolve or close this. |
I think this extension is still useful. We just need to nail down the specific error behavior between eglGetPlatformDisplay and eglInitialize. I think it would be cleaner to require eglGetPlatformDisplay to check for compatibility between the EGLDeviceEXT and the native display, if that's something that Mesa can reasonably support. |
@kbrenneman We can handle that, yeah. From the Mesa side we haven't really pushed it forward, because it turned out to be easier (as well as a lot better) to work on deleting the giant midlayer that made it difficult to pass through device + display explicitly, than to type up support for it. |
I wrote up a new version of the spec based on the discussion here, and posted it in #157. |
Now that #157 is checked in, I think we can close this one. |
If you want to abandon, I think you can close it. If not I can. |
I can't close it, because nwnk opened it originally. |
Abandon. #157 merged. |
Make GL_EXT_direct_state_access available in core profile
This adds EGL_DEVICE_EXT as a valid attribute to eglGetPlatformDisplay, to allow the implementation to name both the platform and device explicitly. I'm submitting this as EXT as I think it's strictly superior to EGL_EXT_platform_device and hope non-Mesa vendors adopt it as well.