Proposal: Mirroring/Magic Window #224

Closed
toji opened this issue Apr 27, 2017 · 7 comments

toji commented Apr 27, 2017

We need to have a clear idea of how mirrored and magic window content gets processed and displayed on the page before we wrap up the spec, even though most of the complexity won't come into play until we get to a multi-layered world further down the road. (The terms "Mirroring" and "Magic Window" as used in this issue are defined in #222)

(This issue assumes that the managed framebuffer feature described in #218 will be part of the spec in some form)

Challenges:

While a simple, single-layer magic window mode could be managed by simply allowing developers to draw to a canvas backbuffer, that method falls apart once support for multiple layers is introduced. At that point, if the developer wants to accurately display the same contents in magic window mode that the user would see in the headset, they have to either:

  • Reproduce the rendering logic for all layer types in use and render them manually or
  • Rely on the UA to do the compositing and render the output to the screen.

From a developer's point of view the second option is obviously preferable, and this issue will make the assumption that the first option is untenable. This puts magic window rendering in the same category as mirroring in that they both now rely on the UA to composite the various inputs and provide that composited content back to the page. The key differences between the two modes are:

  • When mirroring, the on-screen content does not need to be as high quality or low latency as the in-headset content. Mirrored views should be allowed to downscale content, crop output to a more natural FOV, drop frames, and display stale frames, because the intent is only to give external viewers an idea of what the user is seeing in the headset. Whether that quality should be controllable by the developer is an open question. Mirroring is not needed at all on devices without an external monitor.
  • When in Magic Window mode the content quality should be under developer control and treated more strictly, since it's now the primary method of content consumption.

Native API support is also worth considering. In many cases it would be desirable for UAs to map certain layer types to a native API compositor feature for performance or quality purposes, but to do that fully the native API would also have to provide its own mirrored view of the final composited content. Of the desktop APIs I've looked at, both OpenVR and the Oculus API have a method for retrieving a mirror texture for this exact use. (It's worth noting that the OpenVR mirror texture, at least, also includes chaperone visualization and system menus, so using it directly COULD represent a security concern if not handled carefully.) OpenXR is still under development and has not committed to making a mirror texture available, and WinMR doesn't seem to have any mirroring APIs, but that's likely because it also doesn't appear to have any form of native layering. Mobile APIs don't have any applicable mirroring concessions, for obvious reasons.

In all of the above cases, however, the mirroring functionality that is available can only be used while the device is actively presenting, making it unusable for magic window output. This implies that in a multi-layer implementation the UA will need to always support a "manual" compositing mode, where the UA composites all layer content itself into a page-accessible form, and optionally a device-targeted version of the same compositor that may make use of native API affordances when applicable.

(A second, but less appealing option is that UAs could advertise non-exclusive sessions that only support a single WebGL layer, or even simply not advertise support for non-exclusive sessions at all if they can't handle the necessary compositing.)

Overview:

In both scenarios the UA needs to know where the content is to be output. The most natural way to do so seems to be providing a canvas element as an output surface, and I would suggest that a canvas with an ImageBitmapRenderingContext is the appropriate output target. The output canvas should be specified at session creation time.
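
As a concrete (if hypothetical) sketch of that setup, following the rough IDL later in this post — the canvas ID and the idea that the page pre-acquires the context are my assumptions, not settled API:

let mirrorCanvas = document.getElementById('vr-mirror'); // hypothetical element
// 'bitmaprenderer' produces an ImageBitmapRenderingContext, which lets the
// UA hand the page composited frames without exposing the pixels to script.
mirrorCanvas.getContext('bitmaprenderer');

vrDevice.requestSession({ outputCanvas: mirrorCanvas }).then(session => {
  // The UA now presents composited content into mirrorCanvas; the page
  // never touches the individual frames itself.
});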

(Side note: We may want to change VRWebGLLayers to use a WebGLRenderingContextBase as the source directly, rather than the host Canvas.)

Non-exclusive sessions should still be allowed to be created without any output canvas, which would make them "tracking only" sessions. These sessions wouldn't allow any layers to be created for them, and (assuming the latest conversations from #218 pan out) would have 0 views. (If we did have to provide projection matrices they should assume a square aspect ratio). All they'd really be good for is tracking the device pose, but that would still be useful in some cases.
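
A hedged sketch of what consuming a tracking-only session might look like, reusing the requestVRFrame/getDevicePose frame-loop shape that appears later in this thread (vrFrameOfRef is assumed to already exist):

vrDevice.requestSession({ exclusive: false }).then(session => {
  function onFrame(vrFrame) {
    let pose = vrFrame.getDevicePose(vrFrameOfRef);
    // No views to render: just feed the pose into page logic, like a
    // parallax effect or a custom visualization drawn outside of WebVR.
    session.requestVRFrame().then(onFrame);
  }
  session.requestVRFrame().then(onFrame);
});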

Non-exclusive sessions with an output canvas are "magic window" sessions. Projection matrices should use the canvas dimensions when computing the aspect ratio, and the session should have 1 view. (Punchthrough magic window mode, as described in #222, is really interesting, but I feel like it should be opt-in and tackled in a later spec version, so I'm ignoring it for the moment.) [UPDATE: I've reconsidered this position a bit. See my followup comment]

If an output canvas is given with an exclusive session then the headset output is mirrored to it with whatever transforms, and at whatever rate, the UA feels is appropriate. We should intentionally NOT be too specific about how this mode functions, to allow UAs to handle it in whatever manner is most efficient for the hardware and rendering pipeline. Key messaging for developers would be that this mode is for simple observation and debugging only and should not be relied on for quality- or timing-critical uses. We should also make sure that the canvas context can't read back the mirrored output. The output canvas can be completely ignored on mobile devices or devices without an external monitor.

Exclusive sessions can still operate normally if no output canvas is given, which in most cases is effectively an optimization since no time needs to be spent mirroring.

So that's all reasonably complicated and will be hell to write spec text for, but for developers hopefully it'll all be pretty intuitive when you're actually using it. 🤞

Rough IDL:

dictionary VRSessionCreateParametersInit {
  HTMLCanvasElement? outputCanvas = null;
  boolean exclusive = true;
};

interface VRSessionCreateParameters {
  readonly attribute HTMLCanvasElement? outputCanvas;
  readonly attribute boolean exclusive;
};

[Constructor(VRSession session, WebGLRenderingContextBase context, optional VRWebGLLayerInit layerInitDict)]
interface VRWebGLLayer : VRLayer {
  readonly attribute WebGLRenderingContextBase context;
};

Example Usage:

// No mirroring
vrDevice.requestSession().then(session => {
  session.baseLayer = new VRWebGLLayer(session, gl);
});

// Mirroring
vrDevice.requestSession({ outputCanvas: imgCanvas }).then(session => {
  session.baseLayer = new VRWebGLLayer(session, gl);
});

// Tracking only
vrDevice.requestSession({ exclusive: false }).then(session => {
  session.baseLayer = new VRWebGLLayer(session, gl); // ERROR! No layers allowed!
});

// Magic Window
vrDevice.requestSession({ exclusive: false, outputCanvas: imgCanvas }).then(session => {
  session.baseLayer = new VRWebGLLayer(session, gl);
});

Outstanding Questions:

  • What rate are new poses produced at for a tracking only session?
  • Do we possibly want to limit non-exclusive sessions to a single layer for ease of implementation? 😒
toji added this to the 2.0 milestone Apr 27, 2017

toji commented Apr 27, 2017

Actually, after a lunch conversation with @kenrussell I'm gonna walk back part of this: For the sake of simplicity and inventing as little new spec language as possible I think we should start out limiting WebVR magic-window or mirror output to only canvases with an ImageBitmapRenderingContext. I'll edit the above to indicate that as well.

toji commented Apr 27, 2017

Ooh, and one thing I forgot to mention earlier, upon re-reading this: While it would make me sad, I feel like we could conceivably leave outputCanvas out of the spec for the initial release, since it's most important in a multi-layer world. This means that desktop mirroring and a unified magic window mode would be missing from the first version, which would suck. But it's a viable feature to push back if needed to reduce complexity and release timelines. (Magic window is still viable with a "tracking only" session; the developer just has to put more work into it and write a separate rendering path for it.)

The important bit is that we have a relatively clear idea of how it can be layered on so that we don't do anything to prevent it from working early on.

brianchirls commented:

@toji Always better to write complicated specs/code on a full stomach. Smart move. I do appreciate keeping things as simple as possible. It's been a fair bit of work to keep up with the evolving spec.

In addition to the presentation modes listed in #222, please consider the case where the screen displays both a third-person view and a mirrored view, perhaps in a "picture-in-picture" arrangement. This is something I do often and have found useful. I haven't seen anything in here that I think would preclude that option, but I'd like to document it here just in case. Thanks.

toji commented Apr 28, 2017

@brianchirls: I didn't cover third-person views here because they're not really a part of the WebVR API, just a happy side effect, but I realized it's worth covering because the ImageBitmapRenderingContext restriction I mentioned above actually makes your precise use case work better.

In general any third person views of the scene would simply be rendered to a canvas normally without paying attention to the prescribed views the WebVR API gives you. Given that the latest API directions have you rendering VR content into a specifically allocated framebuffer, this means that anything rendered to the canvas backbuffer just gets composited to the page like normal WebGL content. So in the most common third person case, where you want to render first person to the headset and third person to the page, it looks something like this:

function OnDrawVRFrame(vrFrame) {
  let pose = vrFrame.getDevicePose(vrFrameOfRef);
  
  // Draw first person views to the headset
  gl.bindFramebuffer(gl.FRAMEBUFFER, vrLayer.framebuffer);
  for (let view of vrFrame.views) {
    let viewport = vrLayer.getViewport(view);
    gl.viewport(viewport.x, viewport.y, viewport.width, viewport.height);
    drawScene(view.projectionMatrix, pose.getViewMatrix(view), view.eye);
  }

  // Draw third person view to the page
  gl.bindFramebuffer(gl.FRAMEBUFFER, null); // Default framebuffer
  gl.viewport(0, 0, gl.drawingBufferWidth, gl.drawingBufferHeight);
  // Make sure you've got some visualization of where the headset user is
  headsetMesh.setTransform(pose.poseModelMatrix);
  headsetMesh.visible = true;
  drawScene(thirdPersonProjectionMatrix, thirdPersonViewMatrix, "left");
  headsetMesh.visible = false;

  vrSession.requestVRFrame().then(OnDrawVRFrame);
}

Now, in your case, you also wanted a mirror view to show up on screen in addition to the third-person view. To be honest I'm not totally sure how you were doing that in the past, but now you would have a secondary canvas with an ImageBitmapRenderingContext that is set as the session's outputCanvas, which would automatically be updated with a mirrored view. To get the picture-in-picture style you want you'd use normal CSS to compose the mirror canvas and the third-person canvas, as sketched below. Should allow for a fair amount of flexibility!
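
To make the picture-in-picture arrangement concrete, here's a rough sketch of the CSS composition (element IDs are hypothetical, and equivalent stylesheet rules would work just as well):

// Assumes both canvases share a `position: relative` container.
let thirdPersonCanvas = document.getElementById('third-person'); // normal WebGL canvas
let pipMirrorCanvas = document.getElementById('mirror'); // the session's outputCanvas

// Float the mirror view in a corner, over the third-person view.
pipMirrorCanvas.style.position = 'absolute';
pipMirrorCanvas.style.right = '8px';
pipMirrorCanvas.style.bottom = '8px';
pipMirrorCanvas.style.width = '25%';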

FYI: Got some more comments incoming re: limiting magic window to a mono view by default, but I need some more time to type that up.

toji commented Apr 28, 2017

Okay, back for part 2:

In responding to some email about this particular proposal, I realized that there's a pretty big decision to be made here which I glossed over in my proposal above, and I'm now leaning in a different direction than my first post implied.

The TL;DR version is: Should we allow UAs to upgrade magic window content to match their full capabilities by default (probably with an opt-out), or be conservative and require content to opt in?

The longer version is this: With the n-view, prescriptive approach that the API is trending towards, we have an interesting opportunity to give the UA a lot of leeway in how it requests that the content be drawn. With the same basic app render loop the page could request a 3DoF Mono magic window on a mobile device, a 6DoF Mono magic window on Tango, a 6DoF Stereo magic window on a zSpace or VR browser, or full immersive VR. Or, if the UA decided it didn't want to support a 6DoF stereo view even though it technically had all the components to support it, it could request just an untracked mono view. UAs could even allow users to opt in and out of stereo rendering on a per-element basis. And all of it is just a matter of what views and matrices the API exposes through a VRFrame.

(Quick aside: If you want a sense of what a stereo magic window could look like, picture a canvas element that feels like a hole in the page with 3D content behind it, whose perspective changes as you move your head. Sorta like this awesome little Sketchfab scene. Cool, right?)

To enable that, all we need to do at the spec level is say that in magic window mode, just like in exclusive mode, the developer may receive any number of views to draw, and the matrices and viewports given may be skewed/squished/etc., so app developers shouldn't make too many assumptions about the inputs. Then it's up to the UA to determine how much it can and wants to support. Sounds cool, right?
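
Concretely, it's the same loop from my earlier comment, just written against whatever views the frame reports instead of a hardcoded count — for instance, sizing overlay UI from the reported viewport rather than the canvas (drawUI is a hypothetical helper):

for (let view of vrFrame.views) {
  let viewport = vrLayer.getViewport(view);
  gl.viewport(viewport.x, viewport.y, viewport.width, viewport.height);
  drawScene(view.projectionMatrix, pose.getViewMatrix(view), view.eye);
  // Size and place UI from the viewport, not gl.drawingBufferWidth, so it
  // survives the UA handing back 1, 2, or n skewed/squished views.
  drawUI(viewport.width, viewport.height);
}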

The potential downsides are: Stereo rendering DOES cost more than mono rendering, and if a page isn't expecting that it could kick them into a bad place performance-wise. Also, there's a chance that the developer does some UI rendering or something similar that makes assumptions about the viewport which hold true when they test on the UAs and hardware they have available (which are all mono) but fall apart when one of their users tries it on a device which presents it as 6DoF stereo. Also, @DigiTec pointed out on a previous call that if we try to over-generalize the rendering pipeline we may make it very difficult to do interesting rendering optimizations (OVR_multiview, etc.) correctly.

The conservative approach to those problems is to only allow magic window content that has explicitly opted in to headtracked, stereo rendering to take advantage of those capabilities. That's what I implicitly advocated for in my first post on this issue (1 view, punch-through mode ignored).

If, on the other hand, we make it an opt-out, we have the opportunity to bootstrap the VR web with a lot of cool volumetric content that grows naturally and organically, embracing the concepts of progressive enhancement. And in the cases where developers do find their content didn't work as intended they can turn off that mode pretty easily.

I feel like, though it won't be without its challenges, we should be striving to allow pages to take the fullest advantage of the hardware available by default, so I propose that we adopt an opt-out pattern, like so:

// Explicitly mono Magic Window
vrDevice.requestSession({ exclusive: false, outputCanvas: imgCanvas, mono: true }).then(session => {
  session.baseLayer = new VRWebGLLayer(session, gl);
});

This is obviously a huge topic, so I'm happy to hear comments!

toji commented Apr 28, 2017

Ooh, and another tangential side effect of this proposal: If there's a 1:1 relationship between a session and an output canvas, we should probably allow for more than one non-exclusive session per page. (Thankfully, non-exclusive already seems to imply that.) That way you could have galleries with multiple magic windows that each have their own unique projections, as sketched below. Exclusive sessions get exclusive access to the device, obviously, so while one is active all non-exclusive sessions would have to be suspended.
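
For example, a gallery page under this model might look something like this (purely a sketch, assuming multiple simultaneous non-exclusive sessions as proposed; the selector is hypothetical):

for (let canvas of document.querySelectorAll('canvas.magic-window')) {
  vrDevice.requestSession({ exclusive: false, outputCanvas: canvas })
      .then(session => {
        // Each session computes its projection from its own canvas, so
        // every magic window gets its own aspect ratio and pose-driven view.
        session.baseLayer = new VRWebGLLayer(session, gl);
      });
}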

toji added a commit that referenced this issue Jun 1, 2017
Described how magic window mode works with the `outputCanvas` proposal.
Also, because they’re natural variants of the same mechanism, described
mirroring and tracking only behavior. Follows the proposal in #224.

Also changed VRPresentationFrame -> VRFrame because if we moved forward
with the tracking only portion of this there would be scenarios in
which the VRPresentationFrames had no Presentation component.
toji added a commit that referenced this issue Jul 25, 2017
Described how magic window and mirroring work with a canvas `outputContext`. Follows the proposal in #224.

toji commented Jul 25, 2017

Since #255 is merged now I'm going to close this in favor of opening individual issues against points of concern in the API if there are any.

toji closed this as completed Jul 25, 2017
toji added a commit that referenced this issue Aug 14, 2017
Described how magic window and mirroring work with a canvas `outputContext`. Follows the proposal in #224.
cwilso modified the milestones: Spec-Complete for 1.0, 1.0 Apr 30, 2019