Revisit: Let getDisplayMedia() influence the default type choice in the picker #184

Closed
alvestrand opened this issue Jun 11, 2021 · 72 comments · Fixed by #186

@alvestrand
Contributor

We continue to have a strong demand from Web developers for functionality that lets them influence what kind of display surface the user will capture; this is one of the core differences between the pre-standard "Chrome extension API" and the WG-defined getDisplayMedia() function.

Such a functionality is easy to add (allow a constraint on capture surface type). If it does not block the user from picking other things, but merely changes the default capture surface (currently "screen" on both Chrome and Safari), it doesn't seem to be a huge increase in user risk exposure.

Example comment: https://twitter.com/RickByers/status/1403349775387353089?s=19

@jan-ivar
Member

Sorry, this is not ready for a PR.

@jan-ivar
Member

This would revisit an existing WG decision. Has new information surfaced since #32 to consider it? cc @martinthomson

Such a functionality is easy to add

As I recall, this was not among the concerns. The concerns, outlined in the Security and Privacy Questionnaire, were the security risks of sharing a web surface under attacker control: that it allows active attacks on the same-origin policy. The only hurdle is socially engineering users to select it.

it doesn't seem to be a huge increase in user risk exposure.

This stems partly from Chrome already violating the spec's recommendations by neither implementing elevated permissions for web sources specifically nor warning users about their elevated risk. See crbug 920752 for context.

@jan-ivar
Member

jan-ivar commented Jun 11, 2021

As I replied on twitter, we'd like to focus on w3c/mediacapture-screen-share-extensions#9: a proposal to give web-pages that meet the security-criteria [agreed on with Chrome Security], preferential placement in the getDisplayMedia() picker. This seems like the safe and responsible way to proceed for the long-term.

@eladalon1983
Member

As I replied on twitter, we'd like to focus on w3c/mediacapture-screen-share-extensions#9: a proposal to give web-pages that meet the security-criteria [agreed on with Chrome Security], preferential placement in the getDisplayMedia() picker. This seems like the safe and responsible way to proceed for the long-term.

When/where has Chrome Security given their blessing to w3c/mediacapture-screen-share-extensions#9? To preferential placement of certain documents in the media picker? AFAIK, Chrome Security has spoken for cross-origin isolation and an opt-in header for getViewportMedia. That's a different topic.

@alvestrand
Contributor Author

Yes, this is asking to revisit an existing WG decision.
A PR is a perfectly adequate tool for showing what the resulting change would be.

@alvestrand
Contributor Author

The argument is that the WG decision was based on wrong information and an inadequate security evaluation, and that the WG decision has led to a lack of conformance to the WG specification in the market. We're asking to revisit it.

@jan-ivar
Member

@eladalon1983 They haven't. I said "Their view on a similar proposal [#155] for easy self-share is that it requires not only site-isolation but opt-in from targets in order to be safe", and then "We've put forth a proposal that would give web-pages that meet these security-criteria preferential placement in the getDisplayMedia() picker."

Chrome Security has spoken for cross-origin isolation and an opt-in header for getViewportMedia.

We're interpreting the scope of their advice differently. They said "this as a larger problem of APIs that might leak data from cross-origin resources at the page-level."

That's a different topic.

Different API, same topic: how to have webpages capture webpages safely.

@jan-ivar
Member

@alvestrand In order to not waste the WG's time, I believe it is customary to introduce new information with such requests, is it not? Simply asking for a re-vote doesn't seem productive, because what would make a different outcome likely?

The argument is that the WG decision was based on wrong information and an inadequate security evaluation, ...

A discovery that old information is wrong might qualify as new information, if it can be substantiated. Would you be able to point out prose in the spec or its security questionnaire that is wrong?

@alvestrand
Contributor Author

Part of the discussion showing developer interest is in https://crbug.com/904831 and bugs duplicated into it.

The current API is based on the presumption that in user story flows that involve capturing something, the user story flow is neutral as to what type of surface is to be handled, and that any input is going to be considered valid.

This presumption is clearly absurd; in nearly every user story that involves capturing something, the story involves capturing exactly one type of surface, and the idea that it should be impossible for the application to incorporate this information into its user flow is just not logical.

The idea that "an application shouldn't push the user towards sharing more dangerous surfaces" is only valid if sharing a more dangerous surface and sharing a less dangerous surface have equal value to the user; this is wrong. The user wants to present what the user wants to present, whether that is a more dangerous surface or a less dangerous one; putting obstacles in the way of the user doing what the user needs to do can never be good UI design.

Putting up dialog boxes and confirmation buttons has some value. Forcing the user to consider options that he is not going to choose anyway, because it is not what he wants to do, has none.

@alvestrand
Contributor Author

as to "what's in the spec is wrong":
I was trying to find justification for why the lack of a "what you want to share can probably be found in this category" constraint was a security feature. I did not find it in either document.

This section:

"Not accepting constraints for source selection means that getDisplayMedia only provides fingerprinting surface that exposes whether audio, video or audio and video display sources are present. (This is a fingerprinting vector.)"

doesn't compute for me.

And this section (from the security questionnaire):

The decision of what to share (or whether to share anything at all) rests entirely with the end-user. Websites cannot influence this choice in any way

is listed as an answer to the question "Is this specification exposing the minimum amount of information necessary to power the feature?"

It is not at all clear that it is an answer to the question, and again, it does not explain the reasoning behind it.

@tsahilevi

Today the main complaint I see from vendors and users is the number of clicks needed to get screen sharing done.
At the moment, the bare minimum is 3 clicks, assuming you're aiming for the full screen; 4 (or more) for anything else.
This isn't user friendly to say the least.

Having the ability for the application to hint on the desired screen sharing default choice would be a good start to remedy that.
I'd also feel better if that hinted choice were pre-selected, saving yet another mouse click.

@arnaudbud

We at RingCentral think this would be useful to us.

@ajf101

ajf101 commented Jul 20, 2021

At Pexip, we think this would be a useful feature too. Giving the opportunity to save historic preferences is one benefit, but the most significant benefit for us would be to guide towards the appropriate option which supports audio sharing (i.e. go straight to tab capture specifically on Mac).

@emcho

emcho commented Jul 21, 2021

This looks good and I find it could prove useful to the Jitsi Meet app suite. Thanks for doing the work!

@bbaldino

We'd be interested in this for Webex as well.

@alper-teke

alper-teke commented Aug 11, 2021

At Atos, we're very much interested in this too.

@jan-ivar
Member

jan-ivar commented Sep 2, 2021

@alvestrand and @eladalon1983 suggested some UX mitigations this morning that might let us move forward here.

The spec could strongly recommend that user agents:

  1. Remove the requesting tab from the list of available "browser" sources, or hide/warn/discourage picking it.
  2. Remove the requesting tab's window from the list of available "window" sources, or hide/warn/discourage picking it.

This would by no means be a catch-all — same-origin documents may lurk in other tabs and tabs' BFCache — but should preserve the social engineering obstacle to basic click-through active attacks.
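A minimal TypeScript sketch of the two recommendations, modeling only the "remove" variant (not hide/warn/discourage). The `CaptureSource` and `Requester` types and `filterPickerSources` are hypothetical, not any browser's picker code; the `excludeSameOriginTabs` flag models the stricter same-origin variant asked about in the next comment.

```typescript
// Hypothetical picker model: these types are illustrative, not a browser API.
type SourceKind = "monitor" | "window" | "browser";

interface CaptureSource {
  id: string;
  kind: SourceKind;
  tabOrigin?: string; // only meaningful for "browser" (tab) sources
}

interface Requester {
  tabId: string;    // the tab that called getDisplayMedia()
  windowId: string; // the window containing that tab
  tabOrigin: string;
}

// Drops the requesting tab (recommendation 1) and its window (recommendation 2).
// With excludeSameOriginTabs, also drops every other tab of the same origin.
function filterPickerSources(
  sources: CaptureSource[],
  requester: Requester,
  excludeSameOriginTabs = false,
): CaptureSource[] {
  return sources.filter((s) => {
    if (s.kind === "browser") {
      if (s.id === requester.tabId) return false;
      if (excludeSameOriginTabs && s.tabOrigin === requester.tabOrigin) return false;
    }
    if (s.kind === "window" && s.id === requester.windowId) return false;
    return true;
  });
}

const sources: CaptureSource[] = [
  { id: "screen-1", kind: "monitor" },
  { id: "win-A", kind: "window" },
  { id: "win-B", kind: "window" },
  { id: "tab-1", kind: "browser", tabOrigin: "https://app.example" },
  { id: "tab-2", kind: "browser", tabOrigin: "https://app.example" },
  { id: "tab-3", kind: "browser", tabOrigin: "https://news.example" },
];
const me: Requester = { tabId: "tab-1", windowId: "win-A", tabOrigin: "https://app.example" };

console.log(filterPickerSources(sources, me).map((s) => s.id).join(", "));
// screen-1, win-B, tab-2, tab-3
console.log(filterPickerSources(sources, me, true).map((s) => s.id).join(", "));
// screen-1, win-B, tab-3
```

As the caveat above notes, neither variant is a catch-all: filtering on origin alone cannot see same-origin documents in BFCache or collusion between tabs.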

Self-capture use cases typically don't want a picker anyway, and will be best served by getViewportMedia https://github.com/w3c/mediacapture-screen-share/issues/155.

@alvestrand
Contributor Author

If we follow the advice in 1. - should this apply to just the requesting tab, or to all tabs with the same origin?
Same-origin tabs have the ability to manipulate each other, so a trivial workaround for this restriction would be to open up another tab in which to do the dastardly deeds before calling getDisplayMedia.

@eladalon1983
Member

eladalon1983 commented Sep 3, 2021

If we follow the advice in 1. - should this apply to just the requesting tab, or to all tabs with the same origin?
Same-origin tabs have the ability to manipulate each other, so a trivial workaround for this restriction would be to open up another tab in which to do the dastardly deeds before calling getDisplayMedia.

The same workaround could be applied with tabs that only appear to be cross-origin. Namely:

  • evil.com runs in tab1 and opens collaborator.com in a new tab - tab2.
  • collaborator.com embeds a "mailman" iframe with an evil.com document.
  • Technically speaking, these tabs are not same-origin.
  • Practically speaking, collaborator.com, in tab2, can postMessage() to the "mailman" evil.com iframe, which can use a BroadcastChannel to shuttle these messages to the evil.com document in tab1.
  • Any concern we had about evil.com running in both tab1 and tab2 now applies, because collaborator.com does evil.com's bidding.

Because of this, I think the recommendation need not apply to same-origin other tabs.

@youennf
Collaborator

youennf commented Sep 3, 2021

It seems valuable to me to provide guidelines that are as precise as possible.
For instance, it is problematic to have capturing document be the opener of captured document.
Same-origin tabs is also problematic as noted by @alvestrand.

As of iframe communication, origin partitioning should help preventing example.com/evil.com iframe to communicate (through BroadcastChannel, IDB...) to example2.com/evil.com or to evil.com.
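Here "example.com/evil.com" reads as an evil.com iframe embedded in an example.com page. Below is a simplified TypeScript model of why partitioning would break the "mailman" relay described in the previous comment: a BroadcastChannel is reachable only by contexts that compute the same key, and partitioning adds the top-level site to that key. The key formats and the `ContextInfo` type are hypothetical; the actual rules live in the Privacy CG storage-partitioning work and whatwg/html#5803, and this model says nothing about relays through a shared server.

```typescript
// Simplified, hypothetical model of BroadcastChannel reachability.
// It only captures which contexts end up on the same channel.
interface ContextInfo {
  topLevelSite: string; // site of the tab's top-level document
  docOrigin: string;    // origin of this document (e.g. an iframe's origin)
}

type KeyFn = (ctx: ContextInfo, channelName: string) => string;

// Unpartitioned: keyed by the document's origin and the channel name only.
const unpartitionedKey: KeyFn = (ctx, channelName) =>
  `${ctx.docOrigin}|${channelName}`;

// Partitioned: the top-level site becomes part of the key as well.
const partitionedKey: KeyFn = (ctx, channelName) =>
  `${ctx.topLevelSite}|${ctx.docOrigin}|${channelName}`;

function canTalk(a: ContextInfo, b: ContextInfo, channelName: string, key: KeyFn): boolean {
  return key(a, channelName) === key(b, channelName);
}

// tab1: evil.com as the top-level document.
const evilTab: ContextInfo = { topLevelSite: "https://evil.com", docOrigin: "https://evil.com" };
// tab2: the evil.com "mailman" iframe inside collaborator.com.
const mailman: ContextInfo = { topLevelSite: "https://collaborator.com", docOrigin: "https://evil.com" };

console.log(canTalk(evilTab, mailman, "relay", unpartitionedKey)); // true
console.log(canTalk(evilTab, mailman, "relay", partitionedKey));   // false
```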

@eladalon1983
Member

eladalon1983 commented Sep 3, 2021

As of iframe communication, origin partitioning should help preventing example.com/evil.com iframe to communicate (through BroadcastChannel, IDB...) to example2.com/evil.com or to evil.com.

I've made a demo. Please launch these two tabs side by side and wait ~5s:

What you see here is that these two cross-origin tabs can talk to each other. So tabs that are not same-origin may nevertheless collude to produce behavior identical to capturing a same-origin other-tab.

@youennf
Collaborator

youennf commented Sep 3, 2021

What you see here is that these two cross-origin tabs can talk to each other

Right, and this is something that Safari prohibits.
I believe this is also being worked on in other browsers, see https://privacycg.github.io/storage-partitioning/

@youennf
Collaborator

youennf commented Sep 3, 2021

See whatwg/html#5803 for BroadcastChannel specifically.

@eladalon1983
Member

Is Safari planning to prohibit cross-origin tabs talking to each other using a shared server exposing a RESTful API designed to facilitate this communication? Because evil.com and collaborator.com can try that, too.

@dontcallmedom
Member

The second possibility would be less painful if we just s/MUST/MAY.

FWIW, if MUST is not realistic, and MAY too weak, this sounds like SHOULD would be a good representation of our intent while recognizing the reality of the world.

@youennf
Collaborator

youennf commented Feb 1, 2022

MAY/SHOULD applied to promise rejection is not great, due to potential browser compatibility issues.
SHOULD ignore would leave that to UA territory, which is more flexible (though a strange API).

@eladalon1983
Member

Question: why are you assuming the solution should be to use the same standard property but with a non standard 'monitor' value?

Because it minimizes the logic that has to be written. Why would Chrome want to code two parallel codepaths that accomplish virtually the same thing?

Using a non standard value might become a compatibility issue as other browsers may start breaking pages by rejecting instead of ignoring.

Is there a good reason for the spec to mandate that the user agent MUST reject? Can't we mandate MAY ignore? After all, the user can always select a surface type other than the ideal one, so ignoring the ideal surface should be reasonable from the POV of all entities (UA, application, user).

SHOULD ignore would leave that to UA territory, which is more flexible (though a strange API).

I think that MAY ignore is less strange than SHOULD ignore, but I'd accept either.

@youennf
Collaborator

youennf commented Feb 1, 2022

Because it minimizes the logic that has to be written. Why would Chrome want to code two parallel codepaths that accomplish virtually the same thing?

On the Chrome side maybe, though probably not much. Other browsers might have to write code to actually accept, but ignore, this value.

@eladalon1983
Member

I have posed a question to you. ("Is there a good reason for the spec to mandate that the user agent MUST reject? Can't we mandate MAY ignore?") I'd like to remind you of this question.

  • You have insinuated that there are more reasons than purity.
  • I don't think that the few lines of code to ignore 'monitor' are the reason.
  • Is compatibility the reason? I think I have addressed that too...?

I think understanding the objection to "MAY ignore" is the core issue we have left. (You and @jan-ivar might have the issue of constraints left, but I think that's orthogonal.)

@youennf
Collaborator

youennf commented Feb 1, 2022

Is there a good reason for the spec to mandate that the user agent MUST reject?

When you define a type in WebIDL as an enumeration, rejection is happening when the value passed to a method is not valid.
To implement the ignore rule with enum + current WebIDL, implementors would be required to add 'screen' as a valid value but ignore it, thus putting burden to those User Agents that have no use of this value.
And putting burden on developers to understand that 'screen' is a valid but ignored value...

What would be nice is if we had an open-ended enumeration: if value is understood, use it and if it is not, ignore it.
Spec would define all values except 'screen', Chrome might understand 'screen' hopefully for a limited time.
This probably requires WebIDL changes, whatwg/webidl#893.

This would also make it easier to extend the enumeration, should we wish to do so.
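A TypeScript sketch of the difference between today's closed enumeration and the open-ended one described above. The value list is assumed to be the surface types other than 'monitor' (application, browser, window, as in option 1 of the later summary); the function names are hypothetical, and a promise-returning method such as getDisplayMedia() would reject rather than throw synchronously.

```typescript
// Hypothetical hint values: the surface types minus 'monitor'.
type SurfaceHint = "application" | "browser" | "window";
const KNOWN_HINTS: readonly string[] = ["application", "browser", "window"];

function isKnownHint(value: string): value is SurfaceHint {
  return KNOWN_HINTS.indexOf(value) !== -1;
}

// Closed enum (current WebIDL behavior): an unrecognized value is a TypeError,
// so the whole call fails before any user agent logic runs.
function parseClosedEnum(value: string): SurfaceHint {
  if (!isKnownHint(value)) {
    throw new TypeError(`'${value}' is not a valid hint value`);
  }
  return value;
}

// Open-ended enum (the idea above): an unrecognized value is dropped and the
// call proceeds as if no hint had been given.
function parseOpenEnum(value: string): SurfaceHint | undefined {
  return isKnownHint(value) ? value : undefined;
}

console.log(parseOpenEnum("window"));  // window
console.log(parseOpenEnum("monitor")); // undefined
try {
  parseClosedEnum("monitor");
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```

The trade-off discussed in the thread shows up directly: the open-ended version never breaks a page, but the page also cannot tell whether its hint was understood.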

  • You have insinuated that there are more reasons than purity.

Insinuated?
I think specific reasons have been provided.

  • I don't think that the few lines of code to ignore 'monitor' are the reason.
  • Is compatibility the reason? I think I have addressed that too...?

This argument cuts both ways: some lines of code in every UA to ignore 'monitor', vs. some lines of code in one UA to add a separate property (which would do nothing other than override the value to 'monitor' in native code).

@eladalon1983
Member

eladalon1983 commented Feb 1, 2022

I think specific reasons have been provided.

My apologies for missing those. Please link me to the relevant comment, so that I may re-read it.

When you define a type in WebIDL as an enumeration, rejection is happening when the value passed to a method is not valid.
To implement the ignore rule with enum + current WebIDL, implementors would be required to add 'screen' as a valid value but ignore it, thus putting burden to those User Agents that have no use of this value.

  • If we go with constraints, then this enum value already exists as DisplayCaptureSurfaceType.monitor. Safari already ignores this, so it would literally require 0 lines of code. (But do correct me if I've made a mistake. This has been known to happen.)
  • If we go with non-constraints, yes, UAs will need a minimum of 1 line of code in the IDL file in order to recognize 'monitor'. How many more lines of code would be needed to not do anything?

And putting burden on developers to understand that 'screen' is a valid but ignored value...

I've made an argument as to why that is a non-issue here. It starts with "After all, the user can always select a surface type other than the ideal one..."

@youennf
Collaborator

youennf commented Feb 2, 2022

I've made an argument as to why that is a non-issue here. It starts with "After all, the user can always select a surface type other than the ideal one..."

What I meant is that, as a web developer, I set 'monitor', which is a valid value per the spec/WebIDL, but the picker does not default to 'monitor', or does so only in specific Chrome versions. Or, as a web developer, I set 'monitor' and getDisplayMedia throws. This would be surprising. As a web developer, I might have to add specific UI to help the user select a monitor in the picker, depending on whether the 'monitor' hint is supported.

To try to summarise the discussion, there are a few options available.
Do you agree with the assessment? Any additional option I may have missed?

  1. Use a dedicated DOMString property
    Value restricted to application, browser or window in the spec (could be updated to an open enum when WebIDL supports it).

Pros: Easiest to understand. Can easily be extended to additional values (self-tab, maybe) in the future. Chrome can extend it to other values, should there be a need.
Cons: No way for the web app to identify which hints are actually understood by the browser (in particular if we add new values in the future or Chrome adds a proprietary one).

  2. Use a constraint-based property, reject if value is 'monitor'

Pros: Rejection makes it clear 'monitor' is not supported.
Cons: Potential compatibility issue if Chrome starts to accept 'monitor'. Adding additional values to the underlying enum might not make a lot of sense (though self-tab may?).

  3. Use a constraint-based property, ignore if value is 'monitor'

Pros: No compatibility issue as with option 2 (but, as with option 1, the web page cannot learn that the hint is ignored).
Cons: Surprising effect that 'monitor', even though valid, is ignored. Adding additional values to the underlying enum might not make a lot of sense (though self-tab may?).

@eladalon1983
Member

eladalon1983 commented Feb 2, 2022

So that I may answer succinctly and to the point, could you please clarify explicitly what your objections to "MAY ignore" are? Is it that developers might attempt something and it ends up being no-op? Is it something else? Is it one of multiple objections?

I think it's important to understand the answer to the above questions before deep-diving into alternatives. For example, if your chief concern is indeed that developers might incorrectly expect 'monitor' to have an effect, then I'd point out that even supported surface-type hints might end up having no effect, because (i) ideally, the user agent MAY regard the hint or disregard it, and (ii) the user is always free to choose a non-hinted surface type.

But this is really premature, as I am answering an issue which might be marginal in your eyes. I first ask again - what are your chief objections to my proposed solution, where user agents MAY regard or disregard any hint?

@jan-ivar
Member

jan-ivar commented Feb 3, 2022

User agents SHOULD steer the user away from monitor capture, regardless of displaySurface value provided as input.

@eladalon1983
Member

We are not going to specify "the user agent MUST respect the hint."
We are going to specify "the user agent MAY respect the hint."
So we may as well just leave it at that, and add an explanation of how we encourage the UA to ignore dangerous hints, e.g. monitors, e.g. suspicious sites, etc.

@jan-ivar
Member

jan-ivar commented Feb 3, 2022

I see nothing wrong with stating UAs MAY respect the hint, but SHOULD steer users away from monitor capture in spite of that hint.

@jan-ivar
Member

jan-ivar commented Feb 3, 2022

That's a narrowing of the MAY, not an expansion of it.

@youennf
Collaborator

youennf commented Feb 3, 2022

My preference:

  • spec defines 'tab' and 'window' as supported hint values but no other hint value.
  • API does not reject or throw in case of unknown hint value.
    I do not think there is a need to state 'monitor' selection is dangerous, the spec probably already says that.

@jan-ivar
Member

jan-ivar commented Feb 3, 2022

I do not think there is a need to state 'monitor' selection is dangerous, the spec probably already says that.

I didn't say we should say it is "dangerous", which is a characterization. I said we should say what the UA SHOULD do when interpreting the hint, which is a prescription.

@youennf
Collaborator

youennf commented Feb 3, 2022

I said we should say what the UA SHOULD do when interpreting the hint, which is a prescription.

This assumes the spec would define hint values that would point user towards monitor capture.
I think we should avoid defining such hint values.

@jan-ivar
Member

jan-ivar commented Feb 3, 2022

I'm assuming we use the existing displaySurface constraint. No need to define a new surface type here, IMHO.

@eladalon1983
Member

If we use constraints, monitor is already there. If we use something else, is it going to be reusable for anything? If so - it will likely gain monitor at some point.

Also, I don't think anyone else would currently accept an approach other than constraints...? Nobody other than Youenn has been positive on that possibility so far.

@youennf
Collaborator

youennf commented Feb 4, 2022

Reusing displaySurface has some drawbacks:

  • The 'monitor' issue
  • A picker might like hints that reduce the sets of surfaces, or allows reordering them. Say: I prefer secured pages, or I prefer self tab, or I prefer the tab that is playing audio right now, or I prefer the non browser application that just opened me and happens to be code-signed by a related party...
  • It creates some minor backward compatibility issues (either reusing directly the enum, or reusing the whole constraint structure) while I think a hint of this sort should never trigger reject.
  • Style speaking (maybe this is just my taste), getDisplayMedia({ audio: true, video: true, prefer: 'window'}) reads better than getDisplayMedia({ audio: true, video: { displaySurface: 'window'} }). Clearer API is better for the web.

Hence why I would go with a new attribute extending DisplayMediaStreamConstraints as an open enum when supported by WebIDL and a USVString in the meantime.
The PR should anyway be small, whatever the option taken. I think this approach might be smaller in fact.

On the other hand, I do not really see what displaySurface gains us. Can you express the benefits of reusing displaySurface?
So far what I could think of is PR size or implementation size. I believe that both approaches will be roughly equally small.

it will likely gain monitor at some point.

This is only a possibility at this point, who knows what future will be.
If that ever happens, we can very easily update the spec accordingly.

@alvestrand
Contributor Author

As far as I can tell, again, the monitor issue is only a concern if you want to use IDL to enforce a prohibition against preferring the monitor. As Elad has pointed out, this approach is unlikely to gain consensus, given that we don't have consensus that preferring "monitor" as the display surface is never appropriate. (FWIW - personal opinion - I wouldn't want this on the open web, but it is sometimes appropriate in managed deployment contexts.)

Youenn, I think you are alone in your taste. getDisplayMedia({ audio: true, video: true, prefer: 'window'}) seems to indicate that the preference affects both audio and video, while getDisplayMedia({ audio: true, video: { displaySurface: 'window'} }) indicates clearly that displaySurface affects the video track only.

@eladalon1983
Member

eladalon1983 commented Feb 7, 2022

The PR should anyway be small, whatever the option taken.

This looks quite minimal to me:

  The user agent MAY allow the application to use the {{displaySurface}} constraint to
  signal a preference for a specific [=display surface=] type. The user agent MUST still
  offer the user unlimited choice of any [=display surface=], but MAY order the sources
  offered to the user according to this constraint.

Or even shorter:

  While user agents MUST always offer the user an unlimited choice of any [=display surface=],
  user agents MAY change the order or prominence of offered choices in response to an
  application's preference, as indicated by the {{displaySurface}} constraint.

Or shorter still:

  User agents MAY change the order or prominence of offered choices in response to an
  application's preference, as indicated by the {{displaySurface}} constraint.
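A TypeScript sketch of what the shortest wording permits: the user agent reorders, but never removes, the offered choices according to the {{displaySurface}} preference. Declining to promote 'monitor' is an assumption reflecting the suggestion that UAs SHOULD steer users away from monitor capture; it is not part of the proposed text. `PickerChoice` and `orderPickerChoices` are hypothetical.

```typescript
// Hypothetical picker entries; not a real browser API.
type SurfaceKind = "monitor" | "window" | "browser";

interface PickerChoice {
  id: string;
  kind: SurfaceKind;
}

// Reorders (never filters) the choices, so the user keeps an unlimited choice.
// Unknown preferences match nothing and leave the order unchanged.
// Assumption: a 'monitor' preference is deliberately not promoted.
function orderPickerChoices(choices: PickerChoice[], preference?: string): PickerChoice[] {
  const ordered = choices.slice();
  if (preference === undefined || preference === "monitor") return ordered;
  // Array.prototype.sort is stable, so non-preferred entries keep their order.
  return ordered.sort(
    (a, b) => Number(b.kind === preference) - Number(a.kind === preference),
  );
}

const choices: PickerChoice[] = [
  { id: "screen-1", kind: "monitor" },
  { id: "win-1", kind: "window" },
  { id: "tab-1", kind: "browser" },
  { id: "win-2", kind: "window" },
];

console.log(orderPickerChoices(choices, "window").map((c) => c.id).join(", "));
// win-1, win-2, screen-1, tab-1
console.log(orderPickerChoices(choices, "monitor").map((c) => c.id).join(", "));
// screen-1, win-1, tab-1, win-2
```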

@jan-ivar
Member

jan-ivar commented Feb 8, 2022

  • The 'monitor' issue

We can throw on "monitor" in prose regardless of API (if we decide to), so this seems like a separable discussion.

  • A picker might like hints that reduce the sets of surfaces, or allows reordering them. Say: I prefer secured pages, or I prefer self tab, or I prefer the tab that is playing audio right now, or I prefer the non browser application that just opened me and happens to be code-signed by a related party...

These ideas go further wrt influencing user choice than what I'm comfortable with. They also seem orthogonal to looking at the existing displaySurface constraint passed into getDisplayMedia for a default category to show...

  • It creates some minor backward compatibility issues (either reusing directly the enum, or reusing the whole constraint structure) while I think a hint of this sort should never trigger reject.

I don't see how. displaySurface uses DOMString, not enum, so {video: {displaySurface: "foo"}} cannot reject.

Style speaking (maybe this is just my taste), getDisplayMedia({ audio: true, video: true, prefer: 'window'}) reads better than getDisplayMedia({ audio: true, video: { displaySurface: 'window'} }). Clearer API is better for the web.

I disagree. I'm no fan of constraints, but consistency with getUserMedia and MST wins here in my book.

@eladalon1983
Member

eladalon1983 commented Feb 8, 2022

@HTA:

I also agree that reusing constraints is less horrible than adding another way of doing this.

@jan-ivar:

Style speaking (maybe this is just my taste), getDisplayMedia({ audio: true, video: true, prefer: 'window'}) reads better than getDisplayMedia({ audio: true, video: { displaySurface: 'window'} }). Clearer API is better for the web.

I disagree. I'm no fan of constraints, but consistency with getUserMedia and MST wins here in my book.

Constraints are currently used throughout. Deviating from the established course requires stronger consensus IMHO. With Harald and Jan-Ivar in favor of consistency, and me generally agreeing with them, and not a lot of other people participating in the discussion, I think we should proceed under the assumption that we've decided to use constraints to resolve the current issue. That said, if Youenn formulates an independent plan to move us generally away from constraints, I will likely be very interested.

@youennf
Collaborator

youennf commented Feb 8, 2022

Sorry in advance for this very long message but there are lots of different points.

As Elad has pointed out, this approach is unlikely to gain consensus

@alvestrand, to be clear, I am not proposing we reject based on WebIDL.
I am proposing we define the property as a plain USVString with some special values.

We can throw on "monitor" in prose regardless of API (if we decide to), so this seems like a separable discussion.

I was referring to the issue that we are exposing a value ('monitor') we actually do not want to expose to the open web (discussion has been about potentially using this value for transition).

These ideas go further wrt influencing user choice than what I'm comfortable with. They also seem orthogonal to looking at the existing displaySurface constraint passed into getDisplayMedia for a default category to show

I think you liked the idea to influence the prompt based on the fact that tabs would be site isolated.
As of orthogonal or not, how would you add a new 'hint' value if reusing displaySurface constraint?
Would you add two properties?
As for 'existing', current getDisplayMedia implementations do not support displaySurface.

I don't see how. displaySurface uses DOMString, not enum, so {video: {displaySurface: "foo"}} cannot reject.

{video: { displaySurface: { exact: "monitor" }}} does not reject right now in Chrome/Firefox AFAIK.
It probably also applies to other constraints given to getDisplayMedia like cursor.
My understanding would be that, after the 'constraint' proposal, implementations would start to reject, which is a minor compatibility issue.
I would rather change the spec to match the implementations and explicitly list the constraints that are supported today.
And move away from this MUST-reject-on-exact model when adding new features.

but consistency with getUserMedia and MST wins here in my book.

Consistency with getUserMedia is not making it easy to understand getDisplayMedia.
As an example, let's go to the definition of getDisplayMedia to find out how to call it.
We click on DisplayMediaStreamConstraints, great we stay in the same document and we learn about this structure.
We then want to understand what to put in the video object. We are now going to mediacapture-main spec which defines getUserMedia properties. We learn about 'advanced' (which ultimately cannot be used for getDisplayMedia).
We learn about MediaTrackConstraintSet for which most properties are useless to getDisplayMedia.
And we have to manually go back to mediacapture-screen-share to look at the getDisplayMedia properties and all the specific rules. This is not a great journey.
I'd prefer that we make life easier than it is today, and describe in the spec what implementations support (video width and height constraints, and that is probably it).

Getting back to displaySurface as a constraint, it also makes it possible for web apps to do something like:
getDisplayMedia({video: { displaySurface: ["tab", "window"]}}).
In getUserMedia, this is fine and unambiguous, as we are using a fitness distance and only one of the values can have a distance of 0.
In getDisplayMedia case, it is not clear what UA should do there (prefer tab or prefer window).
I would guess the spec would state that the first hint would be used?
With a USVString, there is no need to deal with this. I think a plain USVString is sufficient and simpler.
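If the "first hint wins" rule guessed above were adopted, a user agent could collapse an array hint to one default category roughly as follows. This is a hypothetical sketch: `defaultSurfaceFromHint` is not in any spec, it uses the spec's value "browser" where the comment says "tab", and it assumes "monitor" as the UA default, matching the screen-first default mentioned at the top of this issue.

```javascript
// Values the sketch recognizes ("browser" is the spec's term for a tab).
const KNOWN_SURFACES = new Set(["monitor", "window", "browser"]);

// Hypothetical UA logic: reduce a displaySurface hint (a string or an array
// of strings) to one default picker category, using "first known value wins".
function defaultSurfaceFromHint(hint, uaDefault = "monitor") {
  let candidates;
  if (Array.isArray(hint)) {
    candidates = hint;
  } else if (typeof hint === "string") {
    candidates = [hint];
  } else {
    candidates = []; // no hint given
  }
  // Skip values the UA does not recognize instead of rejecting.
  const first = candidates.find((value) => KNOWN_SURFACES.has(value));
  return first !== undefined ? first : uaDefault;
}

console.log(defaultSurfaceFromHint(["browser", "window"])); // browser
console.log(defaultSurfaceFromHint("window"));              // window
console.log(defaultSurfaceFromHint(undefined));             // monitor
```

With a single-string hint, the array branch and the ordering rule disappear entirely.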

> if Youenn formulates an independent plan to move us generally away from constraints, I will likely be very interested.

There is no plan to move away from constraints, simply to not extend their use over what is implemented in browsers.
Plus a plan to describe more what is supported using WebIDL so that the specification is made clearer.
For instance, DisplayMediaStreamConstraints would be defined without referring to MediaTrackConstraints.
Instead we would list all the properties that are useful in the context of getDisplayMedia calls for audio and video explicitly within two WebIDL definitions (one for audio, one for video).
And we would add the 'picker' hint, either within DisplayMediaStreamConstraints or within the WebIDL dedicated to video.
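A rough model of what an explicit, getDisplayMedia-specific video dictionary would mean in practice: WebIDL dictionary conversion keeps the members it declares and ignores everything else. The member list below (`width`, `height`, `displaySurface`) is an assumption loosely following the "width and height, plus a picker hint" scope described above, `convertVideoOptions` is hypothetical, and per-member type conversion is omitted.

```javascript
// Hypothetical member list for a getDisplayMedia-only video dictionary.
const VIDEO_OPTION_MEMBERS = ["width", "height", "displaySurface"];

// Mimics WebIDL dictionary conversion: declared members are copied when
// present (not undefined), undeclared members are ignored.
function convertVideoOptions(input = {}) {
  const result = {};
  for (const name of VIDEO_OPTION_MEMBERS) {
    if (input[name] !== undefined) {
      result[name] = input[name];
    }
  }
  return result;
}

// "advanced" and "facingMode" are not declared, so they simply disappear.
console.log(convertVideoOptions({
  width: 1280,
  displaySurface: "window",
  advanced: [{ width: 640 }],
  facingMode: "user",
}));
// prints: { width: 1280, displaySurface: 'window' }
```

One consequence of this shape is that `advanced` would be ignored rather than rejected.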

@jan-ivar
Member

> I was referring to the issue that we are exposing a value ('monitor') we actually do not want to expose to the open web.

Sites shouldn't be able to query track.getSettings().displaySurface == "monitor"?
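For reference, this is how a site reads which kind of surface it actually got, through the `displaySurface` member of the track's settings. `describeSurface` is a hypothetical helper; it accepts any object with a `getSettings()` method, so the stand-in track at the bottom lets the sketch run outside a browser.

```javascript
// Hypothetical helper: report which kind of surface a captured video track is.
// In a browser (inside an async function):
//   const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
//   describeSurface(stream.getVideoTracks()[0]);
function describeSurface(track) {
  const { displaySurface } = track.getSettings();
  switch (displaySurface) {
    case "monitor":
      return "entire screen";
    case "window":
      return "application window";
    case "browser":
      return "browser tab";
    default:
      return "unknown"; // setting absent or unrecognized value
  }
}

// Stand-in for a real MediaStreamTrack, for running outside a browser.
const fakeTrack = { getSettings: () => ({ displaySurface: "monitor" }) };
console.log(describeSurface(fakeTrack)); // prints: entire screen
```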

> I think you liked the idea to influence the prompt based on the fact that tabs would be site isolated.

Yes, for user agents to highlight some tabs over others (no spec changes needed for that), whereas here we're discussing an app signal on a default category (it's up to user agents how to apply this signal to UX).

> As for orthogonal or not, how would you add a new 'hint' value if reusing the displaySurface constraint?
> Would you add two properties?

That's hypothetical, as I'd prefer no more hints from apps.

> {video: { displaySurface: { exact: "monitor" }}} does not reject right now in Chrome/Firefox AFAIK.
> It probably also applies to other constraints given to getDisplayMedia like cursor.

Seems irrelevant. exact is categorically a TypeError in getDisplayMedia (for all implemented constraints due to WebIDL), so those are implementation bugs. E.g. {video: { width: { exact: 640 }}} fails with TypeError in all browsers = don't use.
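A minimal sketch of the rejection rule as stated in this comment, assuming only two checks: `advanced` is not allowed, and neither is `exact` on any video constraint. `checkDisplayMediaVideo` is hypothetical; the spec's actual getDisplayMedia algorithm has further steps that this does not model.

```javascript
// Hypothetical check covering only the two rules described above.
function checkDisplayMediaVideo(video) {
  // `true`, `false`, or an omitted video member need no checking here.
  if (typeof video !== "object" || video === null) return;
  if ("advanced" in video) {
    throw new TypeError("advanced is not allowed in getDisplayMedia");
  }
  for (const [name, value] of Object.entries(video)) {
    const isDictionary = typeof value === "object" && value !== null && !Array.isArray(value);
    if (isDictionary && "exact" in value) {
      throw new TypeError(`exact is not allowed for ${name}`);
    }
  }
}

for (const video of [
  { width: { ideal: 640 } },                // accepted
  { displaySurface: "monitor" },            // accepted (a bare value is an ideal)
  { width: { exact: 640 } },                // TypeError
  { displaySurface: { exact: "monitor" } }, // TypeError under this rule
]) {
  try {
    checkDisplayMediaVideo(video);
    console.log("accepted:", JSON.stringify(video));
  } catch (e) {
    console.log(`${e.name}:`, JSON.stringify(video));
  }
}
```

Under this rule, a browser that accepts `{ displaySurface: { exact: "monitor" } }` is the one out of line, which is the point being made above.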

> My understanding would be that, after the 'constraint' proposal, implementations would start to reject, which is a minor compatibility issue.

Firefox would start rejecting as soon as we fix bug 1732122, so this is unrelated. It's also not really a compatibility issue since it doesn't do anything. By this standard, every new feature is a compatibility issue for people who guessed a future API name expecting it to not be there.

> Consistency with getUserMedia is not making it easy to understand getDisplayMedia.
> As an example, let's go to the definition of getDisplayMedia to find out how to call it.

Consistency = similarity in API shape and pattern (which users recognize from examples, MDN, or the explainer.md)
Specs = blueprints for implementers, not docs for users. Improving their readability is an editorial issue, which should not influence API design.

> getDisplayMedia({video: { displaySurface: ["tab", "window"]}}).
> In getUserMedia, this is fine and unambiguous, as we are using a distance and only one of the values can have a distance of 0.
> In the getDisplayMedia case, it is not clear what the UA should do there (prefer tab or prefer window).
> I would guess the spec would state that the first hint would be used?

I don't think we have to say anything. The app has declared a preference for tab or window. We don't specify UX.

> There is no plan to move away from constraints, simply to not extend their use over what is implemented in browsers.

Specs represent consensus, and right now, displaySurface is in the spec. If you feel it should be taken out, please file a separate issue, with new information worthy of reconsideration by the WG.

> Plus a plan to describe more what is supported using WebIDL so that the specification is made clearer.

That sounds editorial. I don't think it's reasonable to block API progress on editorial cleanup.

I'd like to get back to debating this on the merits of the proposed change.

@eladalon1983
Member

Based on the 2022-03-15 discussion, I believe we converged on:

User agents MAY change the order or prominence of offered choices in response
to an application's preference, as indicated by the {{displaySurface}} constraint.
It is advised that allowing applications to nudge users towards sharing a monitor
poses risks to user privacy.
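A minimal UA-side sketch of what the proposed text permits: the app's preference may move a category to the front of the picker, but no category is ever removed. Everything here is an assumption for illustration: `orderPickerChoices` is hypothetical, the default order describes an imagined UA that lists tabs first (unlike the screen-first default mentioned at the top of this issue), and ignoring a "monitor" preference by default is just one possible reading of the privacy note, not something the text requires.

```javascript
// Assumed default category order of a hypothetical picker (tabs first).
const DEFAULT_PICKER_ORDER = ["browser", "window", "monitor"];

// Reorder, never filter: every category stays available to the user.
function orderPickerChoices(preference, { honorMonitorPreference = false } = {}) {
  const order = [...DEFAULT_PICKER_ORDER];
  // One possible policy given the privacy note: don't let apps nudge
  // users toward sharing the whole monitor.
  if (preference === "monitor" && !honorMonitorPreference) {
    return order;
  }
  const index = order.indexOf(preference);
  if (index > 0) {
    order.splice(index, 1);    // take the preferred category out...
    order.unshift(preference); // ...and put it first
  }
  return order; // unknown or missing preferences leave the default order
}

console.log(orderPickerChoices("window"));  // [ 'window', 'browser', 'monitor' ]
console.log(orderPickerChoices("monitor")); // [ 'browser', 'window', 'monitor' ]
```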

Wdys? @aboba, @jan-ivar, @youennf?
