Bug in spec: circular dependency for enumerateDevices() #709
From the perspective here, the order of operations should be: Something like: For backward compatibility, users should be able to get a device directly with something like:
So, there are means to solve these device selection issues, even within the scope of not breaking the web and existing users of `getUserMedia()` - as long as the changes are consistent. Otherwise, do not overload methods with functionality not originally intended. Create new methods to meet the requirements. |
This is a lot of input, thanks @hills! Diving specifically into the NotReadableError case, I can see two possibilities:
I tend to prefer option 1. It does not seem great that a web page would have to iterate through all devices if getUserMedia({ video : true }) fails with NotReadableError. It should really be the responsibility of the User Agent to try as much as possible to fulfill the page request. |
@youennf problem here was that the web page could not iterate through all devices since enumerateDevices only returned a single pair of devices. Which probably was broken since Chrome changed to that model for the case where there was no permission initially but it was rare enough nobody complained... Option (1) would cover this as well. |
The spec says something like "once you have the set of devices that satisfy the criteria, the UA picks one". |
I agree, there are short-term vs. long-term proposals here. I just wasn't sure that posting 'ideas' in ticket form is good etiquette, but I am very happy to do that if preferred.
I don't think the two are mutually exclusive? Frankly, the moment the user grants permission, that should be it (and I believe that would 'solve' the present ticket). That action should relate to device exposure on its own. It's very strong as it is, because of explicit action by the user. I'm afraid I don't understand how waiting for the device to open successfully strengthens privacy. I think it's actually the opposite effect; the user's understanding is complicated and weakened if it's coupled to some other future action, rather than a simple 1:1 relationship with them clicking 'allow'. Now for the other issue. Can I clarify, you would propose to ignore an 'exact' deviceID constraint if the device can't be opened? Because if I say it like that it surely can't be justified! Whether or not it's an 'exact' constraint, device selection logic is going to behave unusually for users in some example cases:
I think it's a noble aim to feel the API is simpler if deviceID is just another constraint. But in reality some of the constraints tell us something about the source itself (i.e. which way the camera/mic is pointing) vs. how we wish to capture it. And so my long-term proposal (part 2) attempts to embody these cases without surprises for users or developers. |
Re long-term, at the front-end The user does not gain any knowledge about the device being captured other than "microphone". The order should be Attempting to massage clarity from As you pointed out, attempting to select a device the implementation refuses to support capture of leads to a
To avoid such restrictions implementers might decide to arbitrarily incorporate into their version of Ideally, there should not be any restrictions at all on which devices can be selected for capture, whether the device identifier is The change would break existing web applications, though since this specification is active and users of |
Mozilla browsers do not have the same issue as Chromium, Chrome browsers, as Nightly and Firefox provide a drop-down list of devices available for capture at the UI prompt, including monitor devices. However, that still requires calling Thus, if the specification is adjusted to This can be accomplished by using a very basic
which is simple enough to be specified and uniformly implemented. Flow-chart:
The above algorithm should resolve any ambiguities as to which devices are selected by user and which devices are exposed for the session. |
The prompt for Mozilla where in code then filters the list from
|
Hi! I don't think this should have been closed. The patch doesn't address the core issue, especially as this text remains:
Which IMO should be changed to what it was previously:
with whatever changes to the spec to create this summary. The merge patch seems to have little or no implication for the issue raised, and no explanation of how it aims to fix the issue. So please can the issue be re-opened. |
As discussed before, your initial post touched on several points and we decided to focus this issue on the specific problem of device failing. For other points, it might be best to file new focused issues. What this PR is doing is making sure that the browser will try to fulfil as much as possible the user request. If all devices fail, getUserMedia will fail and no information on the device setup will be given to the page. Can you clarify which scenario you think is not covered after this change? |
Which of these happens:
|
With the above request, only one device can be used by getUserMedia. |
And in these errors, no access to enumerateDevices will be available? So how do I gain access to enumerateDevices? |
Yes
Rewrite your request to: getUserMedia(deviceId: 'xyzxyzxyzxyz') |
So now I make that call, and I have no idea whether the device that was opened was the one the user requested. So here's the minimum practical code to open a user's previous device and present a device selection:
|
Not really, in step 4, you can check that capture is using the device with the given id through the provided MediaStreamTrack. If not, you can check enumerateDevices to know whether the device is there. Although the spec does not require it, I think that with getUserMedia({ video : { deviceId: 'xyzxyzxyzxyz' } }) current browser implementations will always pick the device with the corresponding id, if the device is there and functional. In browsers that have pickers like Firefox, it might actually be better to stick with what the user selected (or ask the user if they would prefer to use the past device explicitly). |
I don't think it's a good sign that a problem is introduced and now the solution is to use more API :)
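A minimal sketch of the check described in this comment, for illustration only. The helper names `capturesRequestedDevice` and `isCameraListed` are hypothetical, and `mediaDevices` is passed in as a parameter (normally `navigator.mediaDevices`) so the logic can be exercised outside a browser. It assumes the browser reports `deviceId` in `MediaStreamTrack.getSettings()`.

```javascript
// Hypothetical helper: true when the first video track of `stream`
// reports `requestedId` as its source device.
function capturesRequestedDevice(stream, requestedId) {
  const [track] = stream.getVideoTracks();
  return Boolean(track) && track.getSettings().deviceId === requestedId;
}

// Hypothetical helper: true when `requestedId` is among the cameras that
// enumerateDevices() currently exposes to the page.
async function isCameraListed(mediaDevices, requestedId) {
  const devices = await mediaDevices.enumerateDevices();
  return devices.some(
    (device) => device.kind === 'videoinput' && device.deviceId === requestedId
  );
}
```

In a page this might be used as `capturesRequestedDevice(stream, savedId) || await isCameraListed(navigator.mediaDevices, savedId)`, falling back to a picker only when both are false.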
"Should" and "always" are not compatible here :) With the above clarification, developers will only act defensively, and are reduced to:
And this still provides a poor user experience requiring restarts if the primary device is unavailable. And it will be especially poor if any browser is prompting per-call to getUserMedia. |
I was probably not clear. I do not think that you need to call enumerateDevices at all. If that is not working for you, this might be a browser bug or a scenario we have not thought about. |
Subsequent calls to
where the corresponding As suggested, the user needs to be able to select devices before For that reason the order needs to be |
@hills, can we close that issue? |
As @hills said, I think developers would likely just add a An example scenario (simplified from a real case):
What's the expected flow for the above scenario? |
I am unclear about some details in your scenario, such as whether only one webcam is connected (720p) and the web page wants the user to connect the 1080p camera. Here is a potential flow:
const previousDeviceId = await getDeviceIdFromIDB();
let stream = await navigator.mediaDevices.getUserMedia({ video : { deviceId : previousDeviceId, width : 1920 } });
// Browser will try using the previous device, if not possible, it will try selecting any 1080p camera.
if (stream.getVideoTracks()[0].getSettings().width < 1920) {
// Chances are high there is no 1080p camera otherwise it would have been selected in the first place. Let's still check just in case.
const devices = await navigator.mediaDevices.enumerateDevices();
const newDeviceId = select1080pCamera(devices);
if (!newDeviceId) {
// Ask user to connect a 1080p camera through some UI.
....
navigator.mediaDevices.ondevicechange = trySelecting1080pCamera;
return;
}
// Optional step: switch immediately to the 1080p camera. It might be bad if the user selected the other camera explicitly through a device picker (say Firefox picker).
stream = await navigator.mediaDevices.getUserMedia({ video : { deviceId : newDeviceId, width : 1920 } });
}
// Proceed with using the stream
...
Another approach:
try {
const stream = await navigator.mediaDevices.getUserMedia({ video : { deviceId : { exact : await getDeviceIdFromIDB() } } });
// Wait for the constraints to be applied so a failure is caught below.
await stream.getVideoTracks()[0].applyConstraints({ width : 1920 });
return stream;
} catch (e) {
return navigator.mediaDevices.getUserMedia({ video : { width : 1920 } })
} |
For me, I'm afraid not, because I don't see the solution as workable. To me the best solution is the obvious and previous one: once a user has granted permission, enumerateDevices() is allowed. Perhaps I would be more amenable if you could explain why we are not just doing this? We are only starting to explore the negatives of requiring a device to be successfully opened first; it sounds like you are hoping this adds some value in some way and working hard to retain this. Can you explain what value that is? |
Can you point to a website or a jsfiddle that is broken with this change and that we would not be able to rewrite without some big refactoring and/or different user UI?
enumerateDevices is widely abused by trackers on the web for several reasons:
As part of privacy enhancement, it was decided to limit leaking to the minimum by default. For instance, it would be easy for a website to ask camera access once to take a picture or as part of a game.
I understand this is a change of behavior and that websites might want to update to optimise their flow. I actually think this change is bringing improvements outside privacy improvements. Before the change, a website would have to handle the case of new users, or users that did capture but revoked permissions, or users that did capture but cleared web site data including IDB. A website would also have to handle multiple browsers with various permission models and prompts, leading to different enumerateDevices results. The proposed change and proposed usage of getUserMedia is also future proof with the in-chrome device picker for getUserMedia. |
You've explained why device IDs may be used for tracking, or why specs help to unify behaviour; but that was not in question. What is the value of the additional requirement to successfully open a device? |
Capture indicators are usually tied to successfully opening the device. The reverse question is also interesting: what is the practical value for not adding this requirement? |
But all of this happens after the user has already "allowed" access to their media device(s) explicitly. Are you proposing that enumerateDevices() can only be called whilst a device is streaming? |
Explicitly or implicitly.
Spec allows a user agent to do so but this is not mandatory. I believe it would be too strong if implemented as is. |
I agree it would be too strong, with wide reaching breakage. You state that a user agent could allow this, but that is not how it reads. In the "access control model" the most recent wording is:
and this meaning persists in the historical version too. Therefore there can be no correlation between capture indicators and enumerateDevices(). |
Forgive me for pressing the issue; if I may summarise: The upsides of the change:
The downsides:
Thank you for being patient with my questions. But I feel this is quite a robust case for the previous behaviour. |
Which change are you referring to?
There is a strong correlation: a page will get full enumerateDevices access at a time where capture indicators will be visible.
Are you specifically referring to #717 or to enumerateDevices sanitization? If it is the latter, I agree this is an important change and I am more than happy to discuss how to best migrate.
Are you specifically referring to #717 or to enumerateDevices sanitization? Would you be able to file an issue specifically for that? Guessing it is about enumerateDevices sanitization, the example should have the following requirements:
|
"The change" to which I refer is the subject of this bug report; the very first paragraphs. Pull request #717 is inconsequential to the summary as it merely complicates the state of affairs. For the avoidance of doubt, here's the same summary with context: This new condition in enumerateDevices (summarised in commit c15a432, March 2020):
Previously this read:
The upsides of the change:
The downsides:
This is a robust case for the previous behaviour. |
No. A behavior change is always painful and it is much easier to adapt to the change if you know why it was done.
I do not know how the discussion shows this.
True. As I said above, I am more than happy to help mitigating the pain of migrating to the new behavior.
This is a subjective statement.
I haven't seen any proof the new behavior is less efficient. |
The issue here is that if the wrong camera is opened, or the wrong setting is used, re-opening or updating it is very costly. It's strongly desired to open the correct camera with the correct settings in one shot, since
For example, the app may want to open "Cam A with setting X", or "Cam B with setting Y", depending on which camera is available now. There is an always available "Cam C which supports both setting X and Y", but the app/user doesn't want to use it for some reason (such as a built-in USB camera on a laptop with poor quality or wrong facing).
Note that |
Opening a wrong camera is slow and painful.
Exact deviceId constraints should prevent that (both prompt and LED).
Let's say page sets A and B as exact deviceId constraints:
Would that work for your use case? |
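As a hedged illustration of the suggestion above, the sketch below passes several acceptable device IDs as an `exact` deviceId constraint, so that no other camera (such as the always-available "Cam C") can be opened. The helper names are hypothetical and `mediaDevices` stands in for `navigator.mediaDevices`. It assumes the browser rejects with `OverconstrainedError` when none of the listed devices can satisfy the request, as the spec describes for unsatisfiable required constraints.

```javascript
// Hypothetical helper: constraints that only accept the listed cameras.
function exactCameraConstraints(deviceIds) {
  return { video: { deviceId: { exact: deviceIds } } };
}

// Hypothetical helper: resolves with a stream from one of `deviceIds`,
// or null when none of them can be used. Other failures (for example a
// denied permission) are rethrown unchanged.
async function openOneOf(mediaDevices, deviceIds) {
  try {
    return await mediaDevices.getUserMedia(exactCameraConstraints(deviceIds));
  } catch (error) {
    if (error.name === 'OverconstrainedError') {
      return null;
    }
    throw error;
  }
}
```

A page could call `openOneOf(navigator.mediaDevices, [idOfCamA, idOfCamB])` and only show its own "connect a camera" UI when the result is null.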
It's a regression in the conversation to again refer to the benefits of not exposing device IDs for fingerprinting. That is not what is in question here; nobody has, or is, questioning that. We must be more precise about the exact benefit if we are to further this discussion; because it's not possible to respond to "privacy enhancements" by "privacy experts". I previously asked if you could clarify the additional privacy of requiring a web page to successfully open some device (an additional event which happens after permission is given by the user). You centred on this being that a capture indicator would be present in the browser. But subsequently we both agreed a web page could do some capture and then use enumerateDevices() after. So the possibility of a web page doing enumerateDevices() without a capture indicator is always there. This is what I meant by "benefits cannot be realised in practice". (And, elsewhere the 'workaround' is exactly that: a generic getUserMedia({audio:true}) and then close the stream) I am trying to advance the conversation by demonstrating (for now) that, given these steps:
no upgrade in privacy happens after the completion of step 2. Waiting for step 3, which the spec forces, has no privacy benefit. But you are saying there is? Can you clarify exactly what the specific benefit is? |
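The 'workaround' referred to in this comment (a generic capture request, closing the stream, then enumerating) can be sketched roughly as follows. The function name is hypothetical and `mediaDevices` stands in for `navigator.mediaDevices`; whether the returned list is complete after the tracks are stopped depends on the browser, so treat this as an illustration of the pattern under discussion rather than a recommended technique.

```javascript
// Hypothetical helper: briefly opens any microphone, stops it at once,
// then asks for the device list.
async function enumerateAfterBriefCapture(mediaDevices) {
  const stream = await mediaDevices.getUserMedia({ audio: true });
  // Stopping every track ends the capture almost immediately.
  stream.getTracks().forEach((track) => track.stop());
  return mediaDevices.enumerateDevices();
}
```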
Yes. Not so much my final request, just trying to tangibly demonstrate the core issue.
Perhaps, but I didn't put it like that as it implies user input, and step 2 may not involve any. But if you're clear on that then it's fine :) |
This already occurs in practice at Chromium, deliberately, ostensibly for the claim that PulseAudio selects a monitor device when no input device is found; see https://bugs.chromium.org/p/chromium/issues/detail?id=931749#c6, https://chromium.googlesource.com/chromium/src/+/4519c32f528e079f25cb2afc594ecf625f943782
where if there are no available inputs am not sure what the user is expecting other than a monitor device to be captured in that case when no microphone is connected to the machine. Chromium simply refuses to recognize monitor devices, captures microphone. Of course, if the user has no input devices, or connection to the device fails, that commit does not change anything - the user still has no microphone connected to the device - so the rationale for the change is at best unclear. Relevant to this topic that change means Further, if a user, in kind, refuses to accept that monitor devices will not be captured, and creates workarounds to actually capture the monitor device, since Chromium implementation refuses to list monitor devices, Capture of monitor devices throws an error when set to default at OS. Users should not be limited by specification authors initial use case conceptions. Video conferencing is not now the sole use case for |
Let's concentrate solely on this point in this issue.
Let's say a device is always broken and page somehow knows it. With the change, the web page will not have access to that info and will have to open a functional device, which will trigger the capture indicator. This makes it highly unlikely that pages that want that information for learning about user (but not call getUserMedia) will actually take the risk to be discovered. The benefits are realised in practice.
I do not see how these downsides are related to the case you mention above (capture fails due to a hardware issue). |
In any sane system this case does not exist, because returning failure (or success) must take place after the permissions check. Leaving the user well aware; because they either confirmed then and there, or specifically asked the browser to remember this choice. So, before I continue, it seems you are clarifying that the failure case you are concerned about happens before permissions check? |
No, the failure happens after the permission check. To be as accurate as possible: |
But if the permissions check does not trigger a prompt, that is because the user has already declared that the action is allowed. Are you trying to protect the user, even in cases where they have given their permission? |
Yes, this is explained at #709 (comment).
The spec change forbids this scenario and makes enumerateDevices much less useful for trackers. |
Ok. So we have asserted that the failure is after the permissions check. And the actual privacy in the above hinges on the assumption that trackers "would not risk a prompt" (quoted from #697) to try for permissions. So with that in mind, it is the context for my previous question, which went unanswered but is especially relevant now:
Step 2 is the "prompt" in the quote above. It is passing of step 2 which is meaningful -- tracker (or genuine web page) took risk on a prompt; user accepted it. What, if any, increased privacy happens by completing step 3? |
Either prompt or capture indicator.
Example provided in #709 (comment) identifies some benefits (basically capture indicator will prevent trackers to try this approach). |
No, not the capture indicator. Any software (malicious or otherwise) can call getUserMedia() and immediately close the stream == no capture indicator. It does not provide a counterpoint here. And let's not forget, the precondition to all of this is that the site took the risk on the privacy prompt (making it out of scope of the chosen definition of a "tracker"); and furthermore the user granted that permission. The completion of the permissions check is the point at which enumerateDevices() should be allowed (with no adverse consequences), and if we can accept that then perhaps then...
... the benefits of doing so will become relevant. But for now I agree to focus on one point at a time. |
No.
No. |
Ok, my apologies, we have some crossed-wires on the words "capture indicator". I was referring to, in Chromium 87, the red circle which appears on the tab during capture. Firefox has its analogue of a microphone icon to the left of the URL bar. Both disappear when capture completes. You are referring to the grey camera icon inset in the URL bar, which appears at some point and, in Chromium 87 at least, persists. I'll assume your definition is the agreed one (I guess I'll just say "red blob" for the other). But perhaps it's now clearer why I would be so assertive about this indicator. I'm out of time for today, there are further points I think should be made but that will have to be later. |
Thanks for raising that issue.
I think both definitions are good. |
The tail is wagging the dog here? Access to APIs is being restricted to when the capture indicator is on screen. Better to have a clear API design, and derive the capture indicator from it. The side effects are being tested in this ticket: apprehension around failure cases in case they reveal information without indicating; and a 'guaranteed to succeed' codepath is a burden for both the spec and developers, but is needed to satisfy the problem which opened this ticket. Your goals are increasingly clearer, why not just implement those goals?
To achieve the above:
With this, no complexity or change is pushed on the developer compared to their current experience in e.g. Chromium. When returning to a web page, the initial call can be to either enumerateDevices() or getUserMedia(deviceID: xxx) and get the intuitive result (no need to quash failure cases, ignore 'strict' requirements, or restrict access). But, crucially, all of the goals of the capture indicator are achieved as well (and clearly defined). And the other benefits are:
|
Let's get back to the initial request:
Are we good now in the fact that this has benefits but no identified drawbacks?
I just illustrated some of the benefits, another major benefit is consistency between browsers.
This is specific to Chrome and not specified in any spec. |
I'm having trouble parsing this, and other parts of the message. sorry. As it sounds as if you are asking if I agree with my own (older) point which, of course, I do. But since then we discussed your privacy concerns, and incorporated them, so I think it is helpful not to go backwards. This is the current concern: (full context). Can this be implemented?
This outlines an, overall, much better fix to this ticket than that which was merged; and better direction in general. It has benefits, and no identifiable drawbacks, as you say. You say the capture indicator ("camera" icon, typically) falls outside of the spec, that is even better. The capture indicator can be oriented to achieve the desired privacy goals; the spec focuses on maintaining a clear API without quirks (it also happens to be in line with the historical API so does not break existing code) |
@hills You're describing a solution here not a concern (concerns cannot be implemented). We need to start with problem-statements, not solutions, but to save time, putting From what I can tell from a read-through, all concerns with the current model that have been backed up by examples have been addressed with #717 and #724 (Firefox bug here) and an explanation from @youennf that I'm going to close this thread as it has gotten too long. It'd be more productive to open new issues on specific unresolved items. To summarize the broader issue for people who land here: The WG consensus is that the enumerate-first strategy wrt device discovery is no longer feasible in the current privacy climate. While the spec previously implicitly supported this, it no longer does. What remains is the device-first model that most sites already follow. e.g.:
Long term we hope to move away from |
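For readers landing here, the device-first model summarised above can be sketched roughly like this. This is not the example from the original comment; the function name and the `savedDeviceId` parameter are hypothetical, and `mediaDevices` stands in for `navigator.mediaDevices`. The sketch assumes the page opens a camera first and only afterwards lists devices to populate its own picker.

```javascript
// Hypothetical helper: capture first, enumerate second.
async function startCaptureThenListCameras(mediaDevices, savedDeviceId) {
  // A non-exact deviceId is only a preference; the browser may fall back
  // to another camera if the saved one is missing.
  const video = savedDeviceId ? { deviceId: savedDeviceId } : true;
  const stream = await mediaDevices.getUserMedia({ video });

  // Only after capture has started is the device list requested.
  const devices = await mediaDevices.enumerateDevices();
  const cameras = devices.filter((device) => device.kind === 'videoinput');
  return { stream, cameras };
}
```

The returned `cameras` list can then drive an in-page picker, with a further `getUserMedia` call using `deviceId: { exact: ... }` once the user chooses a different device.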
If the default device fails to open (even with permissions) then it has now become impossible to use any other device.
This is because of this new condition in enumerateDevices (summarised in commit c15a432, March 2020):
Previously this read:
Here's how this plays out in practice:
Chromium attempted to follow the new spec but reverted the change.
In practice there are reasons a device may not be able to be opened, such as exclusive use by another application, or cannot fulfil some criteria, or just a fault. These may be platform or hardware dependent.
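To make these failure cases concrete, here is a small sketch that maps the error names `getUserMedia()` can reject with onto the reasons listed above. The function name and the category strings are hypothetical; the error names themselves come from the Media Capture and Streams spec, but which one a given browser reports for a given fault may vary by platform.

```javascript
// Hypothetical helper: classify a getUserMedia() rejection by its name.
function describeCaptureFailure(error) {
  switch (error.name) {
    case 'NotAllowedError':
      return 'permission'; // user or policy denied access
    case 'NotFoundError':
      return 'missing'; // no device of the requested kind
    case 'NotReadableError':
      return 'unreadable'; // e.g. exclusive use elsewhere, or a hardware fault
    case 'OverconstrainedError':
      return 'constraints'; // no device can fulfil the required criteria
    default:
      return 'unknown';
  }
}
```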
It looks like the summary in the commit is based on 9.2.2 "Device information exposure" which has been adjusted in commit e159c60, also in March.
I am not a spec author, I am afraid, and I would need time to fully understand the detailed steps described in the spec. But if I may suggest that it seems like the spec embodies a lot of policy that means existing special cases are causing new ones.
A proposal for what the user or developer experience should be that would make a lot of this simpler, whilst avoiding fingerprinting/probing issues:
Calls to getUserMedia that do not specify a device ID (or specify "default") would be governed by a "permission to use your camera/microphone" dialogue provided by the browser:
And then, independently a permissions flag (looks like [[canExposeDeviceInfo]]?):
What my goals are in the above proposal:
It is good to remember that not all apps are standard video conferencing apps, and increasingly there are WebAudio apps for productivity that will use multiple devices concurrently.