
WIP: extensions: Add EGL_EXT_explicit_device #23

Closed
wants to merge 1 commit

Conversation

@nwnk (Contributor) commented Aug 8, 2017

This adds EGL_DEVICE_EXT as a valid attribute to eglGetPlatformDisplay, to allow the implementation to name both the platform and device explicitly. I'm submitting this as EXT as I think it's strictly superior to EGL_EXT_platform_device and hope non-Mesa vendors adopt it as well.
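
For illustration, a minimal sketch of the proposed usage, assuming EGL 1.5 plus EGL_EXT_device_enumeration for obtaining an EGLDeviceEXT handle; the helper name and the choice of X11 as the platform are only for the example:

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <X11/Xlib.h>

/* Create an EGLDisplay on an explicitly chosen device: the platform is
 * named by the first argument, the device by the proposed EGL_DEVICE_EXT
 * attribute.  "device" would come from eglQueryDevicesEXT(). */
static EGLDisplay open_display_on_device(Display *x_dpy, EGLDeviceEXT device)
{
    const EGLAttrib attribs[] = {
        EGL_DEVICE_EXT, (EGLAttrib) device,
        EGL_NONE
    };

    return eglGetPlatformDisplay(EGL_PLATFORM_X11_KHR, x_dpy, attribs);
}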

@kbrenneman (Contributor)

I'd say that this is more orthogonal to EGL_EXT_platform_device.

When you create a display using EGL_EXT_platform_device, the EGL implementation doesn't pick a default windowing system, it runs without any windowing system at all. That means you can't use windows or pixmaps as drawables, because there's nowhere for a native window or pixmap to come from.

But, with EGL_EXT_platform_device, you can use EGL to render to a pbuffer or an EGLStream, even on a headless system without X11 or Wayland.

@fooishbar (Contributor)

The surfaceless 'platform' also provides that in conjunction with this extension. I'd be much happier to push the combination of the two extensions as the canonical way to use device selection with a NULL platform; it splits the two up quite nicely.

@oddhack (Contributor) commented Aug 8, 2017

@nwnk do you want this accepted as is, or will you respond to the comments first? The usual guideline is that we only define things as EXT after multiple vendors agree to implement them, otherwise everything could be an EXT.

@nwnk (Contributor, Author) commented Aug 8, 2017

@kbrenneman That isn't quite how I'd read EXT_platform_device - issue 1 therein makes it sound like the implementation may choose a surfaceful platform - but I suppose that's just my reading of it. Regardless, I still have a use for specifying both platform and device, and I agree with @fooishbar that this plus an appropriate platform extension is a more natural way to express intent than EXT_platform_device's "we'll give you a platform but you have to ask what it can do afterwards, better hope you like it" semantics.

@oddhack I was sort of hoping that this PR would be the place where the multiple vendors get to that agreement. Is there a better venue for that kind of discussion, or a better way to mark PRs for EXT extensions as having reached consensus? I'd be happy to update the repo README to reflect the best practice here. (Similarly, though I have an implementation of this under the MESA_platform_device name, I'd prefer not to have to hang onto stale extension names forever. I suppose that's what MESAX is for though.)

At any rate I'd prefer we not merge this if I can't get at least one other vendor to say they could at least support it in principle.

@oddhack (Contributor) commented Aug 8, 2017

@nwnk AFAICT, people who aren't members of KhronosGroup (e.g. most people) proposing PRs can't set Labels on them. So I think the best generally-useful advice may be to use "WIP:" on the title of the PR as a fairly obvious indicator that it is not yet resolved, like we do on internal Gitlab MRs (presuming the person who submits the PR can edit the title once that changes, which I hope they can).

@nwnk changed the title from "extensions: Add EGL_EXT_explicit_device" to "WIP: extensions: Add EGL_EXT_explicit_device" on Aug 8, 2017
@kbrenneman (Contributor)

@nwnk - In addition to issue 1, this is the important part of the spec:

Add the following sentence to the end of the second paragraph after
the eglCreatePlatformWindowSurface prototype.

    "There are no valid values of <native_window> when <dpy> belongs
    to the EGL_PLATFORM_DEVICE_EXT platform."

Add the following sentence to the end of the second paragraph after
the eglCreatePlatformPixmapSurface prototype.

    "There are no valid values of <native_pixmap> when <dpy> belongs
    to the EGL_PLATFORM_DEVICE_EXT platform."

I guess in theory, an implementation could go through a windowing system under the hood, but if it did, that would be an internal detail, not visible to the application.

The EGL_EXT_platform_device extension enables the client to create a
display from a device, but does so by overloading the native_display
argument to eglGetPlatformDisplay; the user passes the device as the
native display, and presumably the EGL picks a sensible default for the
@cubanismo (Contributor) Aug 8, 2017

This is not an accurate description of EGL_EXT_platform_device. To start, the terms "platform" and "window system" are meant to be synonymous in EGL. As such, platform device explicitly replaces the need for a window system because it is itself an EGL "platform" or window system option for EGL, and it's not overloading the native_display argument any more than any other EGL platform would.

@nwnk (Contributor, Author)

Okay, I think the comparison with EGL_EXT_platform_device is probably unwarranted then.


EGL_EXT_explicit_device differs by passing the device as an attribute
to eglGetPlatformDisplay. This allows the client to name both the device
and the platform.
Contributor

I agree this is useful to solve a separate use case from that EGL_EXT_platform_device solves: PRIME/Optimus-style workloads where you want one GPU to power your compositor and another GPU to power a client app, with explicit application choice over the latter. However, there's no mechanism exposed here to query which devices are compatible with a given native display. It presumes support. I think you need an additional function or something to get a list of devices given a native display.

Contributor

Right, exactly. Setting 'DRI_PRIME=1' in the environment is beyond lame, and given it's not tied to device queries, it precludes clients providing a sensible 'pick a GPU to run this on' dropdown box. And doesn't work with glvnd. ;)

@nwnk (Contributor, Author)

@cubanismo Perhaps:

EGLBoolean eglQueryDisplayDevicesEXT(
        EGLenum platform,
        void *native_display,
        EGLint max_devices,
        EGLDeviceEXT *devices,
        EGLint *num_devices
);

Where if platform is EGL_PLATFORM_DEVICE_EXT, native_display must be null, and the returned device list is every device on which eglGetPlatformDisplay(EGL_PLATFORM_DEVICE_EXT) might succeed.
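
For concreteness, a hypothetical call of that entry point against a windowing platform; nothing here is a registered EGL function, and x_dpy and the fixed array size are just for the example:

EGLDeviceEXT devices[16];
EGLint num_devices = 0;

/* Ask which devices could back an X11 display on this connection. */
if (eglQueryDisplayDevicesEXT(EGL_PLATFORM_X11_KHR, x_dpy,
                              16, devices, &num_devices)) {
    /* Any of devices[0..num_devices-1] should then be accepted as the
     * EGL_DEVICE_EXT attribute to eglGetPlatformDisplay(). */
}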

Contributor

You mean, every device that it would accept as an EGL_DEVICE_EXT attribute?

Note that this would require some additional support from GLVND, but it shouldn't be any harder than the way GLVND handles eglQueryDevicesEXT now.

Contributor

@nwnk Yes, your eglQueryDisplayDevicesEXT() API sounds right. Might want to make it eglQueryNativeDisplayDevicesEXT() to avoid clashing with the various existing EGL display queries that operate on an EGLDisplay object.

Contributor

If you want some way for an application to ask, "What devices could I use with this display?" then you'd need an attribute list as well. With X11 for example, you might get different results on different screens, which are specified using an attribute. So, something like this could work:

EGLBoolean eglQueryNativeDisplayDevicesEXT(
        EGLenum platform, void *native_display, const EGLAttrib *attrib_list,
        EGLint max_devices, EGLDeviceEXT *devices, EGLint *num_devices);

Another option would be to use eglQueryDevicesEXT to get the device handles, and then add a function to check if a specific device can work with a native display:

EGLBoolean eglQueryDisplayDeviceCompatibleEXT(
        EGLenum platform, void *native_display, const EGLAttrib *attrib_list,
        EGLDeviceEXT device);

Contributor

Also, for an application to offer something like a drop-down menu for device selection, you'd also need to provide some sort of a persistent identifier for each device that the app could store in a config file. But, that would probably be outside the scope of this extension.

@cubanismo (Contributor) commented Aug 8, 2017

I had actually proposed a simple way to replace surfaceless with EGLDevice back when surfaceless was first proposed; Chad agreed it was a viable path forward to a common solution, and said he would mark surfaceless deprecated once I got around to making such a change. I much prefer that direction over replacing EGL_EXT_platform_device with surfaceless. See further comments inline. I'm a little concerned that there doesn't appear to be a common understanding of how the basic EGLDevice extensions function after all the debate there's been over them.

@cubanismo (Contributor)

@oddhack FWIW, I'm a member of KhronosGroup, and I still can't set labels/assign myself to issues/etc. I'm not really sure what's up with the permissions of the Khronos repositories.

@oddhack (Contributor) commented Aug 8, 2017

@cubanismo (edit previous comment because I was looking at the wrong repo): I just added the Vulkan team with "Write" access to EGL-Registry. That may help.

@fooishbar (Contributor)

@cubanismo I assume you meant EGL_EXT_platform_device instead of platform_display above?

In this case, I don't think there's any misunderstanding of EGLDevice. It seems to work perfectly well for its primary use cases - headless rendering, or rendering to an EGLStream. But it doesn't work for using multiple devices with a winsys, because it's specified not to. So I'd like to see this landed, because it enables more use cases, allows us to jettison stupid hacks in Mesa (setting DRI_PRIME in the environment), allows cleaner client configuration, and also works cross-device with GLVND.

The reason I said that I'd prefer to advocate this extension in combination with surfaceless over EGL_EXT_platform_device is because it seems like a cleaner separation of concerns. Device enumeration and selection in one extension, and the notion of using that device to do headless rendering in another extension. Right now the two are conflated, which is a bit awkward: surfaceless is quite clear in its scope, whereas (for me; YMMV) it requires tilting your brain 90° in order to see a rendering device as a platform. But that's just an aside, and I'll stop derailing discussion of this otherwise orthogonal extension.

@cubanismo (Contributor)

@oddhack Yeah, looks right now.

@fooishbar My comment about misunderstanding was in regards to the inaccurate claim in the extension that drivers need to guess a platform when using EGL_EXT_platform_device.

I'd like this extension to go in too. I just have some concerns I noted inline that I think should be addressed first.

@oddhack (Contributor) commented Oct 5, 2017

@nwnk this will need conflicts resolved (probably just updating the extension # to 123, since another one went in first, and regenerating the headers). Otherwise it's OK from my perspective, but I don't think you've reached closure with @cubanismo yet?

@nwnk (Contributor, Author) commented Oct 5, 2017

@oddhack indeed I haven't. Trying to actually implement this in Mesa has revealed that this particular bit of Mesa is... not good. I'll circle back to this once I have the proof of concept actually working, which hopefully won't be too much longer.

@evelikov commented Oct 3, 2018

@cubanismo why do we need an eglQueryDisplayDevicesEXT()-like API for the PRIME/Optimus-style workloads?

From the proposed spec:

If the device and platform are not compatible for any reason, EGL_BAD_MATCH is generated

Thus the client can trivially iterate over the device/platform combinations it considers acceptable. For example:
a) select a platform (in order of preference) that works with, say, an Nvidia GPU
b) select a device (in order of preference) that works with, say, X - more or less your PRIME/Optimus-style case
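
For illustration, a minimal guess-and-check loop along the lines of (a)/(b) above, assuming the device handles came from eglQueryDevicesEXT(); the function name is only illustrative:

/* Try each candidate device against the preferred platform and keep the
 * first combination the implementation accepts; EGL_BAD_MATCH just means
 * "try the next one". */
static EGLDisplay pick_display(EGLenum platform, void *native_dpy,
                               const EGLDeviceEXT *devices, EGLint num_devices)
{
    for (EGLint i = 0; i < num_devices; i++) {
        const EGLAttrib attribs[] = {
            EGL_DEVICE_EXT, (EGLAttrib) devices[i],
            EGL_NONE
        };
        EGLDisplay dpy = eglGetPlatformDisplay(platform, native_dpy, attribs);

        if (dpy != EGL_NO_DISPLAY && eglInitialize(dpy, NULL, NULL))
            return dpy;
    }
    return EGL_NO_DISPLAY;
}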

On the other hand, the extra query API has the following downsides:

  • it brings up (and tears down) a connection to said platform, for every driver, across all devices
  • it does not consider attributes passed to eglGetPlatformDisplayEXT, such as EGL_PLATFORM_X11_SCREEN_EXT, and the complexity that involves on the driver and potentially the application side.

Can you please shed some light?

@cubanismo (Contributor)

Guess-and-check is not a valid form of enumeration. Properly written EGL applications should never expect to receive an EGL error. This design principle is implied by the existing EGL specification. There's always a better way to query a capability than relying on error codes.

Responding to your downsides,

  • I don't think every platform need be queried. Presumably the application either knows which platform it wants to talk to, or which GPU it wants to use. If it knows neither, this extension isn't going to gain them anything anyway. If it knows which EGLDevice it wants and no other one will do, it's free to attempt it and handle the error as you suggest. There's no requirement it use the query. However, I don't see a way around querying the drivers from the platform backend (which may be EGL itself) to determine which GPUs support a given platform. The query could be refactored to ask if a specific platform supports a specific GPU if the concern is loading all the drivers, but I suspect applications would prefer just getting a list of all devices. Note some portion of the stack must enumerate and connect to all drivers somewhere with either usage model: Either you do so when querying a list of all devices in order to pass one in to this extension (This is implemented in GLVND at the moment), or you query a list of devices from a platform, which in turn may connect to all drivers, or may have a shortcut in that it knows it only supports devices from a specific driver. Regardless, the application needs to discover an EGLDevice handle somehow to use this extension. The handles are opaque, so it can't just hard-code one and expect it to work portably.

  • That the query as proposed doesn't currently accept attributes is valid feedback, and should be addressed. Off the top of my head, I would think it should likely be defined to accept a strict superset of the attributes accepted by eglGetPlatformDisplay, with identical semantics for those which are shared.

@evelikov commented Oct 9, 2018

It seems like we're more on the theoretical/ideological side instead of the practical one.

As much as I would love to agree, I believe the ship has sailed. Robust applications do check for EGL errors, and others crash. In some cases error checking is even used (though not solely) to establish whether extension X exists. See note 1 under Conformance Tests in EGL_EXT_client_extensions and the EGL_BAD_PARAMETER behaviour of eglGetPlatformDisplayEXT() - there are many applications that depend on those.

Let's ignore all that for a moment: even with eglQueryDisplayDevicesEXT(), apps should still check eglGetPlatformDisplayEXT(), since there's no guarantee the GPU hasn't become inaccessible - GPU lockup, hot-unplug, or otherwise.

On the technical side: having a sane eglQueryDisplayDevicesEXT() implementation in an [open-source] driver that supports hardware from 6+ vendors is impossible:

  • Attempt each driver to check the requirements - huge delays as kernel dynamic PM kicks in. We had ~2s delays in the past...
  • Fake it: return "every device works on this display", even when that is clearly a lie - driver X is missing a dependency such as kernel module Y, library Z, etc.

So even if we fake it, the app still needs to fall back to another device, which makes the API misleading, to say the least.

@kbrenneman (Contributor)

What about a single-device query? Basically a way to check whether a specific EGLDeviceEXT handle is usable with a particular display.

If you can implement an error code in eglGetPlatformDisplay, then you can also implement such a query function, even if all it does is call eglGetPlatformDisplay internally:

EGLBoolean eglQueryDisplayDeviceSupportedEXT(EGLenum platform, void *native_display,
        EGLDeviceEXT dev, const EGLAttrib *attrib_list)
{
    /* full_attrib_list = attrib_list with { EGL_DEVICE_EXT, dev } appended
     * before the EGL_NONE terminator; append_device_attrib() stands in for
     * the copy/allocate step, which is elided here. */
    EGLAttrib *full_attrib_list = append_device_attrib(attrib_list, dev);
    EGLDisplay edpy = eglGetPlatformDisplay(platform, native_display,
            full_attrib_list);
    return (edpy != EGL_NO_DISPLAY);
}

That's enough to get the correct semantics, but it also gives a driver the option to do something more efficient.

Also, a single-device query would work a lot better with libglvnd. A vendor library can provide a dispatch function for eglQueryDisplayDeviceSupportedEXT, but eglQueryDisplayDevicesEXT would require an updated version of libglvnd.
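
A sketch of how an application might use such a single-device query to build a chooser list; the query itself is still hypothetical at this point, devices[]/num are assumed to come from eglQueryDevicesEXT(), and add_to_ui() is an illustrative helper:

for (EGLint i = 0; i < num; i++) {
    /* Cheap per-device check; no eglInitialize/eglTerminate needed just
     * to populate the list. */
    if (eglQueryDisplayDeviceSupportedEXT(EGL_PLATFORM_X11_KHR, x_dpy,
                                          devices[i], NULL))
        add_to_ui(devices[i]);
}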

@kbrenneman (Contributor)

Also, a device becoming unusable is a problem regardless of whether you have a separate query function or not. Yes, a device might disappear between the query and the eglGetPlatformDisplay calls, but it could just as easily disappear after calling eglGetPlatformDisplay, and the vendor will have to deal with that somehow.

@cubanismo (Contributor)

How applications use the spec is their business. The spec should be designed in a way that experienced programmers can infer details of its behavior without reading every letter of the spec. Dismissing spec design principles as ideological or theoretical factors isn't reasonable. Yes, reality of underlying HW and systems behavior trumps all, but I don't see that being the case here.

Yes, a single-device query may be a reasonable compromise if we can't do better, but as I noted above it probably doesn't match what applications actually want to do, and it still requires enumerating devices somehow first. There's no way around an enumeration function somewhere to get an EGLDevice handle.

However, I don't follow why an "all supported devices" query isn't feasible. Vulkan has to load all drivers to enumerate its devices, and OpenCL is the same. There are open source Vulkan and OpenCL implementations, so why is it suddenly a problem with EGL? I would hope it's possible to check a device's capabilities without actually turning it on, but I don't know what types of queries in general are needed on the various drivers to determine support.

@kbrenneman (Contributor)

An all-devices query is feasible, but I think a single-device query would be cleaner.

If you can implement a single-device query, then you could implement an all-devices query by calling eglQueryDevicesEXT and then passing each EGLDeviceEXT handle to the single-device query.

But, I think single-device is cleaner because it keeps enumerating devices and checking for support as separate functions instead of effectively duplicating the enumeration step.

It also gives better flexibility, because it gives an app the option to filter out some devices before checking support if it wants. But, if all an app wants is an all-devices query, then that's still easy to do.

And, as I noted, a single-device query doesn't require any changes to libglvnd, so you could add the extension with just an updated driver.

@evelikov commented Oct 9, 2018

Today, in Mesa, OpenCL is partially supported for a single vendor on a very small hardware range, with limited users and apps that do silly things. Vulkan has 2 fairly mature drivers. But in either case, most such applications request one device and use that.

AFAICT they rarely iterate over all devices, let alone eglInitialize/eglTerminate a dozen or so times before actually drawing something on the screen. As you said, "How applications use the spec is their business" - there's nothing forbidding such behaviour, so they do it. Unluckily for us, there was/is just enough delay in between for the dynamic PM to kick in, a number of times.

I see your goal here, and calling it ideological was inappropriate, yet I'm looking for a solution that is more practical and will not cause problems similar to the ones we had in the past, even if that means it's not the perfect solution.

@kbrenneman (Contributor)

Isn't the whole point of this extension to let an app enumerate available devices and pick one to use for rendering?

If an app doesn't want to go through the device list, then it's just going to call eglGetPlatformDisplay without a device and let the EGL implementation deal with the rest.

@evelikov

Isn't the whole point of this extension to let an app enumerate available devices and pick one to use for rendering?

Yes. And you don't need new API (eglQueryDisplayDeviceSupportedEXT() or otherwise) to tell you exactly the same thing that you'll get from eglGetPlatformDisplay().

@kbrenneman (Contributor)

Yes. And you don't need new API (eglQueryDisplayDeviceSupportedEXT() or otherwise) to tell you exactly the same thing that you'll get from eglGetPlatformDisplay().

That's my point. If an app is just going to call eglGetPlatformDisplay(), then this extension is irrelevant, because the app isn't going to use it.

If the app does want to enumerate and pick a device, then, and only then, it will need to use this extension. In that case, it will need to know which devices will work with whatever display it needs to connect to.

@evelikov commented Oct 10, 2018

Yes. And you don't need new API (eglQueryDisplayDeviceSupportedEXT() or otherwise) to tell you exactly the same thing that you'll get from eglGetPlatformDisplay().

That's my point. If an app is just going to call eglGetPlatformDisplay(), then this extension is irrelevant, because the app isn't going to use it.

I think you're getting confused here. The extension adds support for passing the device handle into eglGetPlatformDisplay().

We seem to be going in circles. Perhaps we should have a chat about this at the next EGL WG call?

@evelikov

Presumably, once you've created an EGLDisplay handle, a subsequent eglGetPlatformDisplay call shouldn't take that full amount of time, right?

That is not an assumption we can make.

Let's consider the following possible implementations:

  • The query opens/wakes up the corresponding GPUs and we close() on eglTerminate. Unused devices will not be closed, always burning power.
  • The query opens/closes the device, and eglInitialize reopens it. Users will experience ~3x the original delay.

So avoiding the query API gets us out of the tricky situation where we have to choose between a) keeping people's GPUs on all the time or b) imposing a significant time delay.

If Mesa were a single-vendor driver, we could expose some magic (via sysfs or otherwise) to indicate the capability of working with X11/Wayland/etc., and open (wake up) the GPU only at eglInitialize... yet that's not the case.

@kbrenneman (Contributor) commented Oct 21, 2018

I'm not suggesting that the app should be required to call the query function -- calling eglGetPlatformDisplay or eglInitialize still has to be able to indicate success or failure. So, if an application is just going to pick a device to use based on its own logic (instead of giving the user a list to choose from), then it would just go directly to eglGetPlatformDisplay/eglInitialize, and then be ready to deal with a failure.

The case I'm thinking about for a query is if the app wants to get a list of available devices and present that list to the user to choose one. In that case, the app has to make a list of devices that will work. Without a query function, the only way to do that is to call eglGetPlatformDisplay, eglInitialize, and eglTerminate for each device. With a query function, at worst the app would pay the same performance cost as it would without the query.

@evelikov

@kbrenneman the use-case of the API has been rather clear since the original posting.
The point is that there's a serious asymmetry in the API and the state that needs to be handled.

Without the query API the user can freely call eglGetPlatformDisplay for each device, and only eglInitialize the ones of interest to the developer/user.

Even if we add the eglTerminate, from Mesa's POV we can consider that 2x:

  • eglGetPlatformDisplay (1x), eglInitialize, eglTerminate (1x)

The eglQuery call itself will take another 2x, so we end up at 4x overall:

  • eglQuery (2x), eglGetPlatformDisplay (1x), eglInitialize, eglTerminate (1x)

@kbrenneman (Contributor)

If an app wants to give the user a list to choose from, then the app doesn't know which device is "of interest."

Given a known native display, and multiple possible devices to choose from, how should an app create a list of devices so that it can present that list and let a user choose one?

@evelikov

Based on your earlier example, here is a snippet.

EGLDeviceEXT devices[max_count];
EGLint num = 0;

eglQueryDevicesEXT(max_count, devices, &num);
for (EGLint i = 0; i < num; i++) {
    if (device_is_acceptable(devices[i])) {
        const EGLAttrib attribs[] = {
            EGL_DEVICE_EXT, (EGLAttrib) devices[i],
            /* Other attributes */
            EGL_NONE
        };

        EGLDisplay display = eglGetPlatformDisplay(platform, native_dpy, attribs);
        if (display != EGL_NO_DISPLAY && eglInitialize(display, NULL, NULL)) {
            /* Get extra information */
            add_to_ui(devices[i]);
            eglTerminate(display);
        }
    }
}
// Let the user pick a device, then try to use it
// Let the user pick a device, then try to use it

There is a hidden extra advantage here. As hinted in the extra comment (and mentioned at XDC), I have a WIP EGL extension based on GLX_MESA_query_renderer.
It operates on the initialized EGLDisplay and provides the maximum supported GL (core, compat) and GLES2 versions, VRAM, vendor/device PCI ID, etc. - a collection of useful early-stage information.

Obviously any other similar extension could be used or introduced.

@cubanismo (Contributor)

Given the response in #23 (comment), part of the disconnect for me was that I didn't understand why eglQueryDevicesEXT() would presumably be so much faster than the hypothetical eglQueryDisplayDevicesEXT() since both require loading all drivers. I see now that you're specifically talking about initializing something in the kernel drivers (open() some device node) being the slow part. Our implementation initializes the relevant driver state as part of eglQueryDevicesEXT() and holds that reference open for the lifetime of the app, so there's essentially zero init cost to further queries against those devices.

Even if we assume drivers can't improve their implementations, the overhead in your math above is exaggerated as it is a worst-case scenario where you only consider one device, which isn't meaningful since if there's only one device there's little point in querying compatibility. If you look at the more meaningful case you suggested, where there are 6 different devices, you'd end up with an overall throughput of 12/14 = 85% using the measurements you described above. Worst case, 2 devices, you'd get 4/6 = ~67% with no optimization. And remember this is all optional. If the app needs max performance, it can rely on error codes instead. That's a design choice applications should make, not driver APIs.

@evelikov

Given the response in #23 (comment), part of the disconnect for me was that I didn't understand why eglQueryDevicesEXT() would presumably be so much faster than the hypothetical eglQueryDisplayDevicesEXT() since both require loading all drivers.

eglQueryDevicesEXT() does not load any drivers; it lists DRM (and software) devices. As per the EGL API, nothing guarantees that a given device, platform, etc. will actually work until eglInitialize() returns EGL_TRUE.
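
For reference, a minimal sketch of what that enumeration yields, using the existing EGL_EXT_device_enumeration/query/drm extensions (the entry points would normally be fetched via eglGetProcAddress; includes and error handling omitted):

EGLDeviceEXT devices[16];
EGLint num = 0;

if (eglQueryDevicesEXT(16, devices, &num)) {
    for (EGLint i = 0; i < num; i++) {
        /* Maps the opaque handle to a DRM node without touching a GPU. */
        const char *node =
            eglQueryDeviceStringEXT(devices[i], EGL_DRM_DEVICE_FILE_EXT);
        printf("device %d: %s\n", i, node ? node : "(no DRM node)");
    }
}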

I see now that you're specifically talking about initializing something in the kernel drivers (open() some device node) being the slow part. Our implementation initializes the relevant driver state as part of eglQueryDevicesEXT() and holds that reference open for the lifetime of the app, so there's essentially zero init cost to further queries against those devices.

It sounds like the GPU will be powered on, burning power, even if nobody uses it. I can see the reason behind it, but it's not something open-source kernel drivers are keen on doing.

Even if we assume drivers can't improve their implementations, the overhead in your math above is exaggerated as it is a worst-case scenario where you only consider one device, which isn't meaningful since if there's only one device there's little point in querying compatibility.

There are plenty of cases with 2+ GPUs (from a single vendor or not) using solely Mesa. Even then, it's the developer's decision whether to query the one, two, or however many devices they have around - little point or not.

If you look at the more meaningful case you suggested, where there are 6 different devices, you'd end up with an overall throughput of 12/14 = 85% using the measurements you described above. Worst case, 2 devices, you'd get 4/6 = ~67% with no optimization. And remember this is all optional. If the app needs max performance, it can rely on error codes instead. That's a design choice applications should make, not driver APIs.

In light of the eglQueryDevicesEXT() clarification, I believe these numbers are a bit off.

Let me put this question: what would your reaction be if we proposed an API that makes the Nvidia implementation X times worse, regardless of which numbers we consider?

Now let's remind ourselves this change has an impact on all of Mesa, which covers hardware from 6+ vendors. Obviously the severity of impact depends on the vendor HW, kernel driver, etc.

@kbrenneman (Contributor)

If we assume the worst-case scenario:

  • 2 devices (one device is uninteresting, and more than two gives a smaller performance ratio)
  • The query takes the same amount of time as calling the full (eglGetPlatformDisplay, eglInitialize, eglTerminate) sequence.

Given that, if an app queries each device and then picks and initializes one, then it will take time for three (eglGetPlatformDisplay, eglInitialize, eglTerminate) sequences.

If the app doesn't use the query, and instead calls eglInitialize directly on each device, then it goes through two (eglGetPlatformDisplay, eglInitialize, eglTerminate) sequences.

Based on that, the absolute worst-case performance of using the query is 2/3 speed. And remember, an application doesn't have to use the query function -- it can use either of those methods, at the app developer's discretion.

Now conversely, if you have a query function, then an implementation might not need to take as long as the full (eglGetPlatformDisplay, eglInitialize, eglTerminate) sequence. For instance, a driver might do offscreen rendering (like it would with EGL_EXT_platform_device) and use some common presentation path to copy the image to the screen, in which case it wouldn't need any device-specific checking at all. In that case, the query function could be as fast as calling XQueryExtension, but an eglInitialize call could still be the slow operation you've described.

Not having a query function means that you are asserting, at the API level, that a driver cannot do anything more efficient to check for device or platform support.

@stonesthrow (Contributor)

Resolve conflicts and address requested changes from Cubanismo and we can proceed.

@cubanismo (Contributor)

It sounds like the GPU will be powered on, burning power, even if nobody uses it. I can see the reason behind it, but it's not something open-source kernel drivers are keen on doing.

No, initializing a device enough to query its capabilities in our internal driver API doesn't necessarily mean increasing the power draw of our GPUs.

Let me put this question: what would your reaction be if we proposed an API that makes the Nvidia implementation X times worse, regardless of which numbers we consider?

I've been pretty consistent about pushing for designs that are right, rather than designs that are right for NVIDIA's current SW stack. Regardless, I'm not trying to make this an NV Vs. not-NV drivers debate. When I share observations about how our driver behaves, it's just to note internal driver interfaces are not immutable, and there exist other implementation options than what NVIDIA, DRM, DRI, or Mesa do right now. GLX has seen 3 major iterations of the DRI interfaces, probably as many or more significant changes to the DRM interfaces, and I can't even speculate how many major changes to Mesa internals without any major changes to its application-facing API. Point being, application-facing interfaces tend to outlive internal driver interfaces, and hence their design should be driven by feasibility, maximal use of HW across the relevant vendors, and meeting the needs of application developers, not current limitations of driver internals.

@oddhack (Contributor) commented Aug 7, 2019

@nwnk checking in on this extension - did it ever ship? Do you still intend to resolve remaining issues and get it registered?

@kbrenneman (Contributor)

The question of how an application figures out which devices can support a display is still open.

Now that I think about it, returning an error code for an unsupported (EGLDeviceEXT, native_display) pair isn't far from what can happen even now. A driver can support a platform extension, but still not support a specific native display. For instance, a driver that supports EGL_KHR_platform_x11 might still fail if you handed it a remote X server.

If we go with an error code, then there are a couple details we'd need to nail down:

First, should eglGetPlatformDisplay always check if the (device, display) pair is supported, or can an implementation defer that check to eglInitialize?

Allowing a check in eglGetPlatformDisplay is probably unavoidable. Even if a driver would defer the check to eglInitialize, with libglvnd, eglGetPlatformDisplay could still fail if it gets an EGLDevice from a vendor that doesn't support EGL_EXT_explicit_device.

But, if checking for support requires expensive initialization, then we'd probably want to wait until eglInitialize, since that's when the driver would need to do that initialization anyway.

Second, what error code should it hand back if the device and display are otherwise valid, but the driver doesn't support the combination? Would there be any use for an application to distinguish this case from other failures?

@cubanismo (Contributor)

Talked offline with @evelikov about the practicalities of this. Aside from the ambiguities @kbrenneman is attempting to resolve in his last comment, I'm fine with a guess-and-check approach at this point.

To the last comment, it is kind of unfortunate to have two different failure paths for this scenario. I'd naively expect apps would want GetDisplay to fail if the device + native display were incompatible, but I agree that might run into the same issues @evelikov is trying to avoid in the first place. It probably just depends on what drivers actually do as far as initialization in GetDisplay(). I haven't yet looked into how this is currently divided up in our driver.

@kbrenneman (Contributor)

A driver might be able to figure out in advance whether a device can work, depending on how it actually gets the image data to a drawable. If you know that you'll always use DRI3+Present to display an image, for example, then it might be enough to know that those extensions are supported.
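
As a concrete example of such an early check, assuming the DRI3 + Present path described above (illustrative only; a real driver has more to verify than this):

#include <X11/Xlib.h>
#include <stdbool.h>

/* If presentation always goes through DRI3 + Present, the server-side
 * prerequisites can be probed with plain XQueryExtension() calls, without
 * involving any GPU driver. */
static bool server_has_dri3_present(Display *dpy)
{
    int opcode, event, error;

    return XQueryExtension(dpy, "DRI3", &opcode, &event, &error) &&
           XQueryExtension(dpy, "Present", &opcode, &event, &error);
}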

@kbrenneman (Contributor)

If we want to have an incompatible device/native_display only fail in one place, and eglGetPlatformDisplay isn't an option, then I think it might actually be possible for libglvnd to provide a "successful" call to eglGetPlatformDisplay and a failure in eglInitialize, even for a vendor that doesn't support EGL_EXT_explicit_device.

If we added an internal dummy vendor that just fails eglInitialize (and similar stubs for everything else), then libglvnd could create an EGLDisplay handle using that.

I'm not sure if that would be cleaner than just allowing either eglGetPlatformDisplay or eglInitialize to fail, though.

@cubanismo (Contributor)

One downside I thought of for deferring failure to eglInitialize() is that at this point the app has an EGLDisplay object. Currently, there's no way to destroy an EGLDisplay object, so any resources associated with it will essentially be leaked for the life of the application. If we could fail upfront in eglGetPlatformDisplay(), we could avoid that. Of course, there are extensions proposed to fully destroy display handles as well.

@kbrenneman (Contributor)

That's true. It would also make things more complicated if we wanted to use this extension in libglvnd for setting up automatic GPU offloading.

It would be a pretty clean design to let libglvnd pick an EGLDeviceEXT handle based on some user configuration and add it to the attribute list in eglGetPlatformDisplay. If a failure due to a mismatch always happens in the vendor's eglGetPlatformDisplay call, then it's easy to fall back to the default (or a different device) if the chosen device doesn't work.

If the failure could be deferred until eglInitialize, then that's harder, because libglvnd would have to call eglInitialize/eglTerminate to check if the display actually works.
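
A sketch of that fallback pattern from the application's (or libglvnd's) point of view, assuming the mismatch is reported at eglGetPlatformDisplay() rather than deferred to eglInitialize(); the function name is illustrative:

static EGLDisplay get_display_with_fallback(EGLenum platform, void *native,
                                            EGLDeviceEXT preferred)
{
    const EGLAttrib attribs[] = {
        EGL_DEVICE_EXT, (EGLAttrib) preferred,
        EGL_NONE
    };
    EGLDisplay dpy = eglGetPlatformDisplay(platform, native, attribs);

    /* e.g. EGL_BAD_MATCH: drop the device attribute and use the default. */
    if (dpy == EGL_NO_DISPLAY)
        dpy = eglGetPlatformDisplay(platform, native, NULL);
    return dpy;
}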

@cubanismo (Contributor)

@evelikov, any further thoughts here? It sounds like from a design perspective, GetPlatformDisplay is the preferable failure point, but since the design revolves around what's feasible in Mesa, we need to know if that's possible without too much work. I suppose this boils down to whether gallium/DRI caps can be queried at this point in Mesa's init process?

@stonesthrow (Contributor)

Friendly poke to see if you want to resolve or close this.

@kbrenneman (Contributor)

I think this extension is still useful. We just need to nail down the specific error behavior between eglGetPlatformDisplay and eglInitialize.

I think it would be cleaner to require eglGetPlatformDisplay to check for compatibility between the EGLDeviceEXT and the native display, if that's something that Mesa can reasonably support.

@fooishbar (Contributor)

@kbrenneman We can handle that, yeah. From the Mesa side we haven't really pushed it forward, because it turned out to be easier (as well as a lot better) to work on deleting the giant midlayer that made it difficult to pass through device + display explicitly, than to type up support for it.

@kbrenneman (Contributor)

I wrote up a new version of the spec based on the discussion here, and posted it in #157.

@kbrenneman (Contributor)

Now that #157 is checked in, I think we can close this one.

@stonesthrow (Contributor)

If you want to abandon, I think you can close it. If not I can.

@kbrenneman (Contributor)

I can't close it, because nwnk opened it originally.

@stonesthrow (Contributor)

Abandon. #157 merged.

@stonesthrow closed this Jul 8, 2022
XZiar pushed a commit to XZiar/EGL-Registry that referenced this pull request Feb 7, 2023
Make GL_EXT_direct_state_access available in core profile