WebXR Device API #403

Closed · 3 of 5 tasks
AdaRoseCannon opened this issue Aug 8, 2019 · 8 comments

Hello, TAG!

I'm requesting a TAG review of:

@NellWaliczek, editor
@toji, editor
@cwilso, WG co-chair
@AdaRoseCannon, WG co-chair
@TrevorFSmith, CG chair

Further details:

The WebXR Device API has recently reached the point where it is considered a feature-complete replacement for the deprecated WebVR API. We have also switched the work mode to be based around modules: the current "VR-complete" WebXR Device API acts as a core, with other modules such as 'webxr-ar-module' and 'webxr-gamepad-module' building on it. We are not requesting a review of those modules yet.

We are also working on a polyfill for the WebXR Device API: https://github.com/immersive-web/webxr-polyfill/

In addition, multiple browser vendors are working on implementations in their browsers.

We recommend the explainer be in Markdown. On top of the usual information expected in the explainer, it is strongly recommended to add:

  • Links to major pieces of multi-stakeholder review or discussion of this specification:
  • Links to major unresolved issues or opposition with this specification:

You should also know that...

[please tell us anything you think is relevant to this review]

We'd prefer the TAG provide feedback as (please select one):

  • open issues in our GitHub repo for each point of feedback
  • open a single issue in our GitHub repo for the entire review
  • leave review feedback as a comment in this issue and @-notify [github usernames]

Please preview the issue and check that the links work before submitting. In particular, if anything links to a URL which requires authentication (e.g. Google document), please make sure anyone with the link can access the document.

¹ For background, see our explanation of how to write a good explainer.


toji commented Aug 22, 2019

Hi TAG members! We saw that one of the questions that came up while reviewing this API is what relationship it has with WebVR. That's an excellent question, and one that we felt justified answering in our explainer. We just added a new section towards the end to cover the topic, copied here for convenience. (The short version is "WebXR is a replacement for WebVR, developed by the same group.")

What's the deal with WebVR?

There's understandably some confusion between WebXR and an API that some browsers have implemented at various points in the past called WebVR. Both handle communication with virtual reality hardware, and both have very similar names. So what's the difference between these two APIs?

WebVR was an API developed in the earliest days of the current generation of virtual reality hardware/software, starting around the time that the Oculus DK2 was announced. Native VR APIs were still in their formative stages, and the capabilities of commercial devices were still being determined. As such, the WebVR API developed around some assumptions that would not hold true long term. For example, the API assumed that applications would always need to render a single left and right eye view of the scene, that the separation between eyes would only ever involve translation and not rotation, and that only one canonical tracking space needed to be supported. In addition, the API design made forward compatibility with newer device types, like mobile AR, difficult, to the point that it may have necessitated a separate API. WebVR also made some questionable decisions regarding integration with the rest of the web platform, specifically in terms of how it interacted with WebGL and the Gamepad API. Despite this, it worked well enough in the short term that some UAs, especially those shipped specifically for VR devices, decided to ship the API to their users.

In the meantime, the group that developed WebVR recognized the issues with the initial API, in part through feedback from developers and standards bodies, and worked towards resolving them. Eventually they recognized that in order to create a more scalable and more ergonomic API they would have to break backwards compatibility with WebVR. This new revision of the API was referred to as WebVR 2.0 for a while, but was eventually officially renamed WebXR in recognition of the fact that the new API would support both VR and AR content. Development of WebXR has been able to benefit not only from the group's experience with WebVR but also from a more mature landscape of immersive computing devices that now includes multiple commercial headsets, the emergence of both mobile and headset AR, and multiple mature native APIs.

WebXR is intended to completely replace WebVR in the coming years. All browsers that initially shipped WebVR have committed to shipping WebXR in its place once the API design is finished. In the meantime, developers can code against WebXR, relying on the WebXR Polyfill to ensure their code runs in browsers with only WebVR implementations.

We also wanted to ask for the TAG to weigh in on a technical issue we've encountered with how WebXR interacts with Feature Policy. The full issue is detailed in WebXR Issue 768, but that's a long read and assumes some prior contextual knowledge, so I'll simplify it here:

On some devices (such as phones) WebXR surfaces motion data that is effectively a re-packaging of the data exposed by deviceorientation events or the generic sensors APIs. (In fact, the polyfill relies on deviceorientation to function on mobile devices.) It's not exactly the same, as WebXR applies various motion prediction and skeletal modeling algorithms to the data to better serve the API's purpose, but they're close enough that a motivated developer could use WebXR as a deviceorientation alternative if needed.

(Please note that this does not apply to devices such as tethered headsets connected to a PC, as they would not have their motion data exposed through deviceorientation/generic sensors.)

The question then is: if a developer has specified through Feature Policy that WebXR is allowed but one of the sensor APIs which surface related data is blocked, should WebXR also avoid surfacing that data? This would result in WebXR reporting that it is unable to support VR content on mobile devices while allowing it on desktop devices in the same circumstances, which seems difficult for developers to predict and test. On the other hand, if we allow WebXR to surface data similar to that of blocked APIs, it may be possible for developers to use WebXR to polyfill the other sensor APIs, subverting the presumably intentional blocking of those features via Feature Policy.
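To make the tension concrete, here is a hypothetical configuration (the policy token names are illustrative only, not final):

// Hypothetical header: XR allowed, underlying sensors blocked.
// Feature-Policy: xr 'self'; accelerometer 'none'; gyroscope 'none'
navigator.xr.requestSession('immersive-vr').then((session) => {
  // On a phone, poses from this session are derived from the same
  // blocked sensors. Should this succeed (re-exposing sensor-like
  // data), or should the session be refused on mobile devices only?
});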

Given that this seems to be a novel situation for the web platform, with the potential of setting precedent for how other APIs interact with Feature Policy in the future, we wanted to get the TAG's opinion before finalizing how WebXR will handle this situation. Any insight you may have is appreciated!


alice commented Sep 3, 2019

Thanks for raising this review! I had a read through the spec (and, as discussed offline, sent a PR attempting to make some aspects of the explainer more concise and readable) and I came up with some questions/thoughts:

Accessibility

Obviously I'd like to see the question of accessibility addressed sooner rather than later.

I am looking forward to the session at TPAC dedicated to this question, but I noted that the Goals section lists only "Display imagery on the XR device at the appropriate frame rate" alongside "Poll the XR device and associated input device state".

That seems overly narrow even leaving the question of accessibility aside, given that many existing immersive experiences include sound and haptic feedback. In particular, though, for innovation in XR accessibility to be possible, authors will need the ability to control different modalities for conveying information about the virtual space which is being presented.

Could the Web Audio, Vibration and Gamepad APIs make use of XRViewerPose to provide this immersive experience? How does that work with the frame-based mechanism for updating the XRViewerPose? Could the explainer (or another document) provide some example code for linking these things together?
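For instance, could something like the following work (a rough sketch only; audioContext, session, and xrReferenceSpace are assumed to already exist)?

// Rough sketch: update the Web Audio listener from the viewer pose each frame.
const listener = audioContext.listener;
function onXRFrame(time, frame) {
  session.requestAnimationFrame(onXRFrame);
  const pose = frame.getViewerPose(xrReferenceSpace);
  if (pose) {
    const p = pose.transform.position;
    listener.positionX.value = p.x;
    listener.positionY.value = p.y;
    listener.positionZ.value = p.z;
    // Orientation could be derived from pose.transform.orientation similarly.
  }
}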

For users who require magnification, might it make sense to have an option on the viewport to perform appropriate scaling automatically?

There are also some interesting use cases around accessibility mentioned in the research document linked above, which might make good motivating examples:

  • virtual assistive devices (e.g. a "virtual cane") for navigating virtual environments
    • I could also imagine an assistive technology which used augmented reality techniques to provide assistive feedback for the real world - such as using spatial audio to warn people with visual impairments about hazards at head height, or providing subtitles to real life for Deaf/hard of hearing individuals
  • simulation of disability in virtual reality, both as a rehabilitation aid and a training aid
  • virtual environments designed to be therapeutic to individuals with cognitive differences
  • virtual exploration of architectural designs developed according to Universal Design principles

Explainer/API questions

  • Please add a table of contents!
  • The explainer lists immersive video as a use case. Why would we not design an extension to <video> for this case?
  • Why does navigator.xr.supportsSession() not return a Promise<boolean> rather than rejecting in the case that the session type is not supported? That would seem like a better semantic match to the wording of the method, as well as not requiring the author to program a simple feature detection step in a try/catch style.
    • Could you elaborate on why inline is the default?
  • Naming: it seems like the vr in immersive-vr is both redundant and inaccurate (since it doesn't encompass AR). Could it just be immersive?
  • The explainer doesn't provide example code for avoiding or handling WebGL context loss. Is the author supposed to avoid it altogether by using makeXRCompatible(), or are there other factors to consider?
  • Similarly, a code example for handling XR device changes would be useful.
  • Could you deep-link to the section in the Spatial Tracking Explainer which explains how to handle getViewerPose() failures?
  • Might it be helpful to provide a code example showing how to use the transform property of the XRViewerPose value?
  • Could you expand on the key concept of a Layer?
  • What are the key differences between an XRWebGLLayer and a <canvas>?
  • When might a session be blurred?


toji commented Sep 5, 2019

Thank you for your feedback! I'll answer what I can below, with some tasks broken out into separate issues/PRs as indicated.

Focusing on the Explainer/API questions first, since those can generally be answered more concisely:

  • Please add a table of contents!

Thank you for demonstrating an effective way to do this in your explainer PR. If we don't merge that PR directly we'll be sure to add a TOC ourselves soon.

  • The explainer lists immersive video as a use case. Why would we not design an extension to <video> for this case?

We would very much like to see immersive playback in the <video> tag in the near future, but feel that implementing WebXR is an appropriate first step to getting there, in the spirit of the extensible web manifesto. Specifically, immersive <video> support can effectively be polyfilled with WebXR, while the reverse is not true. And, of course, a more general API like WebXR can also support many other non-video use cases, which has already proven to be valuable.

Additionally, there is not yet consensus on the video/audio formats and projection techniques that are optimal for these use cases. (This is a similar problem to map projection, in that there's no "perfect" way to lay out the surface of a sphere on a flat plane.) Similarly, we've seen on the 2D web that various video players are not satisfied with the default video controls and will frequently provide their own. It's reasonable to expect that trend to continue with immersive video and it is not yet clear what the appropriate mechanism is for providing custom controls in that environment, whereas in WebXR it's implicitly the application's responsibility to render them.

By starting with an imperative API we give developers a lot more flexibility in how they store, transmit, display, and control their content which ideally will help inform future discussions around what knobs and levers are necessary to add to the <video> tag. (And even then WebXR will serve as a fallback if your content doesn't fit into one of the canonical formats.) We do expect, and already see, libraries built around the API to simplify video playback, and would anticipate that those libraries could trivially redirect their functionality to a video tag should support be added in the future.

  • Why does navigator.xr.supportsSession() not return a Promise<boolean> rather than rejecting in the case that the session type is not supported? That would seem like a better semantic match to the wording of the method, as well as not requiring the author to program a simple feature detection step in a try/catch style.

I've opened an issue for further discussion on this topic, since it's one of the few potentially breaking changes you've brought up. It seems to me, though, like our usage here is in line with other similar methods that return a Promise<void> in APIs such as WebUSB and WebAudio. Are there guidelines regarding this type of use that we could refer to?
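For reference, here is the current shape next to the suggested alternative (the second form is hypothetical):

// Current shape: resolves with no value, rejects if unsupported.
navigator.xr.supportsSession('immersive-vr')
  .then(() => { /* supported */ })
  .catch(() => { /* not supported */ });

// Suggested alternative (hypothetical):
// navigator.xr.supportsSession('immersive-vr').then((supported) => { /* ... */ });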

  • Could you elaborate on why inline is the default?

This was actually left in the explainer erroneously. There is no default mode, which is reflected in the rest of the explainer and the spec IDL. (PR to fix) Historically it was the default because it was the mode requiring the least user consent.
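With no default, every request names its mode explicitly:

// The mode must always be passed; there is no implicit 'inline' fallback.
navigator.xr.requestSession('inline').then(/*...*/);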

  • Naming: it seems like the vr in immersive-vr is both redundant and inaccurate (since it doesn't encompass AR). Could it just be immersive?

We intend to introduce an immersive-ar mode in a spec module soon after WebXR ships. In a previous iteration of the spec we specified the session mode as a dictionary, which treated "immersive" as a separate boolean and had a separate field for specifying that AR capabilities were desired like so:

// Not compatible with the current spec!
navigator.xr.requestSession({
  immersive: true,
  ar: true
}).then(/*...*/);

The primary issue this introduced was that it implied that a non-immersive AR mode was a possibility, when we had no intent of ever supporting one. Plus, every new mode added would then have had to reason about how it interacted with each of those booleans, even when they weren't applicable. The use of enums was eventually deemed the cleaner approach.
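For contrast, the enum-based approach in the current spec looks like this, with the planned AR module simply adding another value:

// Current spec: the session mode is a single enum value.
navigator.xr.requestSession('immersive-vr').then(/*...*/);
// Planned module: navigator.xr.requestSession('immersive-ar')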

  • The explainer doesn't provide example code for avoiding or handling WebGL context loss. Is the author supposed to avoid it altogether by using makeXRCompatible(), or are there other factors to consider?

Issue filed to ensure we demonstrate handling context loss.

More generally, there are two routes to ensuring context compatibility. If the context is created with the xrCompatible: true context creation argument, then the returned context will be compatible with WebXR use and no context loss will be incurred for that reason. (The system may still lose the context for other reasons, such as reinstalling the graphics driver.) This is appropriate for pages whose primary purpose is to display WebXR content. For pages where immersive content is a secondary feature, making the context compatible from the start may introduce undesired side effects (such as causing the context to run on a discrete GPU instead of a more battery-friendly integrated GPU), and so the compatibility bit can be set later using the makeXRCompatible() method. This may force a context loss on some devices if the context needs to be moved to a new adapter (while on others, such as those with only a single GPU, it can be a no-op).
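In code, the two routes look roughly like this (sketch):

// Route 1: XR-primary page, create the context compatible up front.
const gl = canvas.getContext('webgl', { xrCompatible: true });

// Route 2: XR-secondary page, upgrade later; on multi-GPU systems this
// may incur a context loss as the context moves to a new adapter.
gl.makeXRCompatible().then(() => {
  // Safe to use this context with an XRWebGLLayer now.
});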

  • Similarly, a code example for handling XR device changes would be useful.

Issue filed to add a devicechange event code sample.
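As a preview, a minimal sketch of the pattern (showEnterXRButton/hideEnterXRButton are hypothetical page helpers):

// Re-check XR availability whenever the set of connected devices changes.
navigator.xr.addEventListener('devicechange', () => {
  navigator.xr.supportsSession('immersive-vr')
    .then(() => showEnterXRButton())
    .catch(() => hideEnterXRButton());
});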

  • Could you deep-link to the section in the Spatial Tracking Explainer which explains how to handle getViewerPose() failures?

I'm not sure exactly what this is asking for? Deep link from where?

  • Might it be helpful to provide a code example showing how to use the transform property of the XRViewerPose value?

Issue filed to add more code samples for XRRigidTransform use.
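In the meantime, a minimal sketch of the most common use, deriving per-eye matrices during a frame callback:

// Each XRView's transform is an XRRigidTransform; its inverse matrix
// serves as the view matrix for that eye.
const pose = frame.getViewerPose(xrReferenceSpace);
if (pose) {
  for (const view of pose.views) {
    const viewMatrix = view.transform.inverse.matrix;
    const projectionMatrix = view.projectionMatrix;
    // Bind viewMatrix and projectionMatrix as shader uniforms here.
  }
}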

  • Could you expand on the key concept of a Layer?

A layer is simply an image that will be displayed on the XR hardware somehow. Right now it's pretty minimal, with only a WebGL layer exposed initially and only one active layer allowed at a time. But we have known features we'd like to implement in the future that would expand the types of layers that can be used and give more flexibility in how they're presented. For example, when WebGPU ships we would introduce a new layer type that allows a WebGPU context to render to the headset, and in the shorter term we'd like to add a layer type that takes better advantage of WebGL 2 features.
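Concretely, creating and attaching the one currently available layer type looks like this:

// Create a WebGL layer for the session and make it the output surface.
const glLayer = new XRWebGLLayer(session, gl);
session.updateRenderState({ baseLayer: glLayer });
// Each frame is then rendered into glLayer.framebuffer.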

Other examples of how we may use layers in the future:

  • Displaying encrypted video streams
  • Displaying DOM content
  • Higher quality 2D surfaces

  • What are the key differences between an XRWebGLLayer and a <canvas>?

Slightly oversimplifying here, but a <canvas> is for compositing on the page and an XRWebGLLayer is for compositing on the headset. Both may share a WebGL context, and in the end both are simply hosts for a framebuffer that WebGL binds and renders into. By making the XRWebGLLayer a distinct concept we have greater control over the framebuffer it exposes and can create it in a way that's optimal for XR.

It's worth noting that in WebVR we previously used the <canvas> itself as the layer, but this caused several problems that all had their root in the fact that a web page and a headset are very different media and benefit from tailor-fit approaches. A few simple examples:

  • We were requiring developers to resize the canvas element to a resolution that was appropriate for the headset, which is typically quite large. This was easy to get wrong and frequently resulted in either grainy imagery in the headset or significantly oversized canvases on the page.
  • Presenting to a headset typically required taking ownership of the framebuffer that was going to be displayed, which often required an expensive copy because we didn't know if the same buffer would be shown on the page as well.
  • The canvas may be constructed with options (such as preserveDrawingBuffer: true) that weren't appropriate for use with XR hardware and introduced even more unnecessary overhead.

  • When might a session be blurred?

Having a concrete example in the explainer of when this state might apply would be a good idea. visible-blurred indicates that the user can see the application, and that it should respond appropriately to head movement to avoid user discomfort, but that the user cannot interact with the app because input is captured by the system/UA. The most common scenario for this mode today is that many immersive computing devices have a "dashboard" that can be pulled up without quitting the immersive application by pressing a dedicated button on the controller. Similarly, if it doesn't pose a privacy/security risk, the UA may choose to display some dialogs to the user without exiting the immersive app.

A quick visual aid, showing Oculus' dashboard system:
[animated GIF: oculus-dash-gif-small]

Not all platforms support this type of interaction, especially if power is limited, and in those cases we would expect the session to toggle directly between visible and hidden. Alternatively, the UA may take steps to reduce the app's quality (such as lowering its resolution) to improve performance while a dialog is up, which is allowed by the visible-blurred state.
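Applications can observe these transitions through the session's visibility state (a sketch; pauseGameplay is a hypothetical app-side helper):

// React to the system dashboard or a UA dialog capturing input.
session.addEventListener('visibilitychange', () => {
  if (session.visibilityState === 'visible-blurred') {
    pauseGameplay(); // Keep rendering and tracking head motion, though.
  }
});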

Obviously I'd like to see the question of accessibility addressed sooner rather than later.

We definitely understand the importance of accessibility, and also want to ensure that immersive web content does not unnecessarily exclude users due to faulty assumptions on the part of developers about the user's abilities. This is a large topic, however, and one that we've been seeing more discussion on recently, and so I think it would be more productive for us to outline our current thinking about accessibility in a separate doc which we'll link here. Needless to say, it's a complicated problem made more difficult by the imperative nature of the rendering APIs we rely on, the relative newness of the VR ecosystem, and the type of content the device capabilities encourage. It seems likely that our accessibility story will span API enhancements, working with tool and content developers to take advantage of existing accessibility features when appropriate, encouraging best practices around use of audio and haptics, and detailing UA-level accessibility features that can apply to all content.

toji added a commit to immersive-web/webxr that referenced this issue Sep 5, 2019

hober commented Dec 4, 2019

Hi! @alice, @dbaron, @plinss, and I talked about this a bit today at our Cupertino F2F.

@NellWaliczek wrote, in a comment on immersive-web/webxr#818:

  • What is the appropriate way to handle enum values that are required by a specification other than the one that originally defined the enum? We talked about a few different options but didn't come to a concrete conclusion. After further discussion with the Gamepad API folks, I'm still not entirely sure what the right approach should be. We've gotten several suggestions on how to go about this, but they all have different drawbacks and there doesn't appear to be consensus on the approach. Given that this isn't a problem unique to WebXR, we'd really love to get a more definitive answer from the TAG about which approach is best for web platform consistency.

    1. When the secondary spec nears CR, move nearly all references to the enum and its purpose into the original spec
    2. When the secondary spec nears CR, move the value to the original spec and point to the secondary spec for the explanation of its purpose and use.
    3. Investigate adding partial enums to webidl
    4. Change the enum to be a DOM string

I think a variant of option (2) is best. The variation being that I don’t think “nearing CR” is the trigger; it’s “this is being implemented in a browser engine.” (This is essentially what @dbaron said in two comments on w3ctag/design-principles#99: (1, 2))


alice commented Dec 4, 2019

@toji

[Accessibility] is a large topic, however, and one that we've been seeing more discussion on recently, and so I think it would be more productive for us to outline our current thinking about accessibility in a separate doc which we'll link here.

I believe there was a workshop on XR accessibility recently. Were there any documents produced in that context which might be relevant here?

Could you deep-link to the section in the Spatial Tracking Explainer which explains how to handle getViewerPose() failures?

I'm not sure exactly what this is asking for? Deep link from where?

Apologies: from the first paragraph of the Viewer tracking section which links to the Spatial Tracking Explainer - could this instead be a deep link to the relevant section?

[...] Other examples of how we may use layers in the future: [...]

My proposed edits included an "Important concepts" section encompassing the concepts I had to draw out as I was reading the explainer and my best guesses as to how to explain them. It would be helpful to have an explanation about layers in your explainer, as well as in this issue thread.

When might a session be blurred?

Having a concrete example in the explainer of when this state might apply would be a good idea [...]

The example you gave here would work well! It doesn't seem to have been worked back into the explainer yet.


One other thing:

Re-reading the explainer, I was confused by this sentence (emphasis added):

Once drawn to, the XR device will continue displaying the contents of the XRWebGLLayer framebuffer, potentially reprojected to match head motion, regardless of whether or not the page continues processing new frames.

What does that last clause mean? i.e. what does it mean for the page to continue processing new frames, if it's not writing to the framebuffer?

@hober hober added Progress: pending external feedback The TAG is waiting on response to comments/questions asked by the TAG during the review and removed Progress: in progress labels Dec 4, 2019
@alice alice removed this from the 2019-09-18-TPAC-Fukuoka milestone Jan 27, 2020
@dbaron dbaron added this to the 2020-03-16-week milestone Mar 12, 2020

hober commented Mar 16, 2020

It looks like we marked this as pending external feedback back in December; are you still pursuing this review, @AdaRoseCannon @toji?


alice commented May 27, 2020

It seems like this has largely settled, so I'm going to propose closing. We're generally happy with this direction, particularly since the supportsSession() issue was resolved.

Please comment here in the next week or so if you don't want us to close it; otherwise, you can comment or file a new issue after it's closed if you want more feedback.

Thanks!

@alice alice added Progress: propose closing we think it should be closed but are waiting on some feedback or consensus and removed Progress: pending external feedback The TAG is waiting on response to comments/questions asked by the TAG during the review labels May 27, 2020

toji commented May 28, 2020

I agree, and am fine with seeing this closed. As always, we're happy to reach out to the TAG if we have additional questions in the future or for reviews of additional modules that we develop.

Thank you!

@plinss plinss closed this as completed Jun 24, 2020
@Manishearth Manishearth mentioned this issue Aug 7, 2020