Reporting API should be opt-in #168
I'd like to offer a different perspective on this, based on our work on deploying security mechanisms such as Content Security Policy.

In practice, it is extremely difficult for web application authors to enable these mechanisms due to the significant potential for breaking existing functionality. Without reporting capabilities that give developers some degree of certainty that their application will keep working when a security feature is enabled, rollouts become either very slow (developers need to enable the feature for a small subset of users, wait to see if it results in bug reports, increase the rollout percentage, repeat), or -- in the worst case -- they stop being possible for fear of causing major, hard-to-diagnose breakages. We've seen product teams unhappy enough with issues encountered during non-monitored rollouts of CSP that they did not want to continue deploying security features.

For a similar reason, it also generally isn't sufficient to receive reports from only a subset of clients: if there is a group of users for whom the application is breaking, the developer generally needs to know about this, or otherwise they can't trust their telemetry.

As a result, in the absence of reporting, many large web applications would likely not enable platform security features, leaving users exposed to XSS, XS-leaks, and other endemic web bugs. This would set back not just web security, but also privacy, because many of these security features serve a dual purpose of preventing websites from revealing sensitive information about the user (e.g. Cross-Origin Opener Policy prevents leaking window references to cross-origin pages).

Overall, I feel like having reporting available only on an opt-in basis would likely be a net loss for user privacy and security.
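To make the staged-rollout process above concrete, here is a minimal sketch of how a CSP deployment typically leans on reporting; the endpoint name and collector URL are placeholders, not anything from this thread:

```
# Phase 1: report-only -- violations generate reports but nothing is blocked
Reporting-Endpoints: csp-endpoint="https://example.com/csp-reports"
Content-Security-Policy-Report-Only: script-src 'self'; report-to csp-endpoint

# Phase 2: once the reports show no unexpected breakage, enforce the same policy
Content-Security-Policy: script-src 'self'; report-to csp-endpoint
```

Without the report-only phase, the only way to discover that the policy breaks some users is to enforce it and wait for bug reports.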
@arturjanc Inspired by a discussion re: the WG's rechartering: can you imagine a way to report this through a privacy-preserving telemetry system like Prio, to isolate the data from potential user identification?
I'll add a few security-related comments pertinent to the charter review issue here, because this issue seems like a better fit for a technical discussion (the charter review also has some conversations around W3C process, which I'm not fit to comment on).
Re: the specific question of using something like Prio (or an equivalent system like RAPPOR), I'm afraid that this isn't a great fit here, for two reasons. First, these systems are meant to report global data to a trusted party (the browser vendor), which is a different model than what the Reporting API operates under: giving each site information about a specific problematic pattern to allow developers to debug it. Second, the value of reports sent via the Reporting API is that it allows tracking down specific issues in the application; this generally doesn't require revealing more information than is already available to the developer, but it does depend on delivering non-aggregated reports with actionable information.

Because of this, I think our focus should be on (2) above, i.e. preventing reports from revealing information that the site couldn't otherwise get, and/or aligning this information with other security/privacy boundaries (e.g. removing some information from reports sent in third-party contexts if that's consistent with browser logic for 3p state). We need to do this anyway, to ensure reporting functionality doesn't result in cross-site information leaks; I'm somewhat doubtful that we can come up with a completely new alternative model here.
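For a sense of why aggregated telemetry doesn't fit the debugging use case, here is roughly what a single CSP violation report delivered via the Reporting API looks like; the URLs are made up for illustration, and the field names follow the current CSP3/Reporting drafts, so they may still change:

```json
[{
  "age": 12,
  "type": "csp-violation",
  "url": "https://example.com/checkout",
  "body": {
    "documentURL": "https://example.com/checkout",
    "blockedURL": "https://cdn.example/legacy-widget.js",
    "effectiveDirective": "script-src",
    "disposition": "report",
    "statusCode": 200
  }
}]
```

It is the specific blocked URL and the specific page it was blocked on that let a developer track the problem down; an aggregate count of violations across all users would not.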
I think there are two topics in the thread:
Before typing even more and wasting folks' time, I want to make sure I have a correct understanding. I'd be very interested in the proposers' thoughts on the following questions:
Thanks!
Can you elaborate on this? We've tried to design NEL very carefully so this is not the case, as called out in the spec's privacy section:
If there's information in a NEL report that isn't already visible to the server while processing a successful request, then that's a bug in the NEL spec that we want to fix.
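For readers who haven't seen NEL in action, a rough sketch of a deployment may help; the group name and collector URL below are placeholders. An origin installs a policy via response headers:

```
Report-To: {"group": "network-errors", "max_age": 2592000,
            "endpoints": [{"url": "https://example.com/reports"}]}
NEL: {"report_to": "network-errors", "max_age": 2592000}
```

A later failed request to that origin then produces a report along these lines (illustrative values):

```json
{
  "age": 0,
  "type": "network-error",
  "url": "https://example.com/",
  "body": {
    "referrer": "https://example.com/previous-page",
    "sampling_fraction": 1.0,
    "server_ip": "192.0.2.1",
    "protocol": "http/1.1",
    "method": "GET",
    "status_code": 0,
    "elapsed_time": 5000,
    "phase": "connection",
    "type": "tcp.timed_out"
  }
}
```

The claim above is that every field in the body mirrors something the server would have observed anyway had the request succeeded; the discussion that follows probes whether that holds in all cases.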
@dcreager Sure! Here are some examples that come to mind where NEL leaks new privacy-relevant information to pages (if any of these reflect a misunderstanding on my part, apologies!).

Case 1:
Case 2:

Case 3:

Happy to elaborate more, but will pause to make sure I'm not operating from some very wrong understanding!
There is a fair amount of information to support this in the form of technical posts by large application developers discussing their process of adopting web security mechanisms. For example, in the case of Content Security Policy:
I chose CSP because it's the best-known feature with reporting, but the same is true for other security mechanisms. See, for example, recent feedback from Facebook security folks on the Cross-Origin Opener Policy reporting API intent to ship:

We've been experimenting with this feature already on facebook.com and instagram.com and the reporting is an incredibly useful feature for us as it allows us to reliably roll out COOP at scale. Since other browsers haven't offered similar functionality yet, it's essentially the only way we can test the impact of COOP enforcement without breaking our sites.

Finally, I'm not sure if a data point from me counts for much, but all of the recent deployments of web security features at Google (specifically, the ones described in this post: CSP, Trusted Types and COOP) rely heavily on reporting data.
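For reference, COOP reporting follows the same report-only pattern as CSP; a sketch with a placeholder endpoint name and URL:

```
Reporting-Endpoints: coop="https://example.com/coop-reports"
Cross-Origin-Opener-Policy-Report-Only: same-origin; report-to="coop"
```

The report-only header doesn't change browsing-context behavior; it only reports the cases where enforcement would have severed an opener relationship, which is what lets a site the size of facebook.com measure breakage before enforcing.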
I have to say I take issue with framing this as an "argument against consent", because this implies that, by default, the use of run-of-the-mill web APIs should be subject to user consent, and that there is something odd about shipping web features without requiring it. This doesn't seem to match the model under which the web operates. Consider features such as the

Instead, I'd phrase it as an "argument for reporting" which is based on the following:
This is the bar we apply to most other features, so I think it's reasonable to also apply it to the Reporting API. I'll go even further and say that because of the security improvements that reporting enables, we should be open to providing developers with more reporting data to help them secure their sites, as long as we can do so safely.
I don't think I can do justice to this question here because there is just an extremely large number of use cases for this kind of telemetry. The simplest way I can put this is that a web application is code written by the developer that runs in an environment controlled by the user, and to understand and debug issues which the user encounters, the developer needs to get a glimpse into the real, non-simulated operation of their application as experienced by the user.

This includes information about network conditions (perhaps some resource loads are failing or slow for a particular segment of users), local browser configuration (users have extensions that can interfere with the operation of a website and trigger security violations), the specific functionality the user was interacting with when the report was triggered (crawling modern web applications reliably is an unsolved problem), transient server failures that you can't test for, etc. These are crucial issues for complex web applications which you can't identify via static analysis or in a staging environment.

Or, maybe put another way, developers wouldn't be so vocal about the importance of browser-based reporting if they could just crawl their sites instead :)
One additional aspect that I wanted to (separately) comment on is the following:
I see this as a bit of a false dichotomy, because it assumes that websites gather telemetry for their own gain, which doesn't translate into a benefit for the user. In practice, pretty much all of this telemetry is used by websites in order to realize some clear benefits for the user: enable security features, improve reliability or performance, identify breakages, etc. Both the website and the user share the goal of having the website run reliably and quickly in the user's browser and handle their data securely, and reporting is a means to that end.

Continuing the analogy from above, the user doesn't directly benefit from the Fetch API being available in the web platform -- the user literally does not care :) But Fetch helps developers write applications more easily and avoid security issues that they'd invariably run into if they built custom workarounds for the lack of this functionality in the platform. I think it's important to keep this in mind, because the Reporting API is just one in a long line of web platform features that work this way.
Might be worth moving this part of the discussion to a new issue in w3c/network-error-logging so we don't clutter things, but some short responses here:
These both seem like examples of what we've tried to cover in this part of the (now renamed) Network Reporting spec. Chrome's current behavior follows the suggestion — policies and reports are both cleared whenever the user agent detects that the network configuration has changed. I would be on board with promoting that part of the spec text to a requirement. A wrinkle, though, is that it depends on the user agent being able to detect the network configuration change. If it happens completely off to the side, or upstream, then it will be harder to mitigate.
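As a rough illustration of the mitigation being discussed (an informal sketch of user-agent behavior, not spec text or Chrome's actual code): the policy store and the report queue are both tied to the detected network configuration, and wiped when it changes.

```js
// Illustrative sketch: NEL policies and queued reports are dropped whenever
// the user agent detects a network configuration change, so reports generated
// under one configuration are never delivered under another.
class NelStore {
  constructor() {
    this.policies = new Map(); // origin -> NEL policy
    this.queuedReports = [];   // reports awaiting delivery
  }

  onNetworkConfigurationChanged() {
    this.policies.clear();
    this.queuedReports = [];
  }
}
```

As noted above, the hard part is the trigger: if the change happens upstream of the device, the user agent may have no signal on which to clear the store.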
If I'm following right, then you can only see the "from this other page" part in a NEL report's (This is a common confusion about NEL. If a page at |
In particular, for Case 1, where the user turns on a VPN and starts using a DNS resolver that blackholes
Thanks @dcreager, this is very helpful!
Happy to do so if you'd prefer :)
That sounds terrific!
We may be agreeing, but I think this is a difficult, significant constraint. There are all sorts of things that are not quite full network config changes (i.e. that wouldn't result in an OS- or browser-triggered signal, like changing a network adapter, hotspot, etc.), but that will result in increasing or decreasing network-level privacy defenses. Extensions are just one of these. Depending on how the browser is checking, adding or removing entries in /etc/hosts is another. Changing settings on privacy-preserving middleware (e.g. a Pi-hole) is a third. There are many such cases.
I am very happy to accept responsibility for the confusion 😅 But I think I didn't quite make the claim well. Let me express the concern more specifically, and if I'm still off in the wrong direction, I'd appreciate the clarification!
In the absence of NEL, Facebook wouldn't have known I visited chicagotribune.com. Is this incorrect? I tried to re-read the updated / renamed version you pointed to, but maybe it's still under development; I didn't see the list of NEL / Network Reporting report types I think I remember. Apologies if I'm missing them, or looking at the wrong version of the proposal.
That document is certainly still under development -- the idea is that it is an extension to the Reporting spec that allows things like NEL to be built on top of it. It contains all of the things that we pulled out of the base Reporting API, like endpoint groups with failover, out-of-band configuration, and the cache of reports which can outlive individual documents. NEL is still at https://w3c.github.io/network-error-logging/ and hasn't been updated to use that as underlying infrastructure yet. It's on my list of tasks to take care of in the new year.
@clelland Okie dokie, thanks for the update! It looks like NEL hasn't changed since I reviewed it last, so I think my example above would still apply, though if I am wrong, I'd be grateful for the correction.
@pes10k So yes, in that case, with the spec as written, it might be possible for Facebook to learn something, but it depends on other factors, I think. The scenario you're presenting reads to me like this:
I'm sure that even without NEL, the conflict between the first two points sets up an arms race, but we should do what we can to make sure that NEL doesn't tip the scales there. You're suggesting that if the browser sees the intention to load the tracking resource, but it doesn't happen, then NEL would cause the browser to report to Facebook that something was wrong; that in that case, Facebook would be using NEL to circumvent the ad blocker / anti-tracking tech.

That's not necessarily true, especially once we start talking about ad blockers and other tech that exists outside of the realm of web platform standards. There's a lot resting on the "whatever approach" in your step 3 — I think that the results are very different depending on how exactly the tracker is blocked; without talking about that, it's hard to say what the results could be, or what additional protections might be needed. I can imagine at least these different scenarios:
Obviously those are approaching the ridiculous by the end, but there are clearly a large number of points at which the requests can be blocked. In some cases, requests are legitimately failing due to infrastructure issues, and ought to be reported. In other cases, the browser can know that the request was never intended to be sent, and should have no reason to report a network error when it isn't. I would expect that any requests which the browser can determine are unwanted would never be seen by NEL, and that anything further out in the infrastructure may be indistinguishable from network damage, from the point of view of either the browser or Facebook. I don't see any reason why an extension which was working with the browser to block unwanted content wouldn't be able to make it so that the requests just "didn't happen" as far as NEL is concerned. (All of this may be out of scope for the spec, though, similar to #223.)

Outside of extensions, a more general approach is to treat the NEL configuration and reports the same way that we do other third-party subresources, and isolate them appropriately, which could make the entire scenario much less likely. Chrome is intending to use the network partition key from Fetch to isolate NEL configurations from each other, so that the NEL policy picked up in step 1 wouldn't apply to the request in step 3. (A different NEL policy could apply, but if the requests are always blocked in that scenario, then one would never be installed.) Also, I expect that blocking or similarly isolating third-party cookies also makes the tracker less effective, as no useful credentials would be sent in the NEL report in any case. (The cookie policy for the NEL report should be the same as for the resource itself, I believe.)

I don't know what the standardization track is for those efforts, but if they're likely to be effective, we should at least mention them somewhere.
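To illustrate the partitioning idea in the previous paragraph (an informal sketch, not Chrome's actual implementation, with made-up domain names): keying the NEL policy store by the Fetch network partition key roughly means a policy is only visible under the top-level site where it was installed.

```js
// Illustrative sketch: NEL policies stored per (top-level site, reporting
// origin), so a policy that tracker.example installs while embedded on
// news.example does not apply to requests made from social.example.
const policies = new Map();

function partitionKey(topLevelSite, reportingOrigin) {
  return `${topLevelSite} ${reportingOrigin}`;
}

function installPolicy(topLevelSite, reportingOrigin, policy) {
  policies.set(partitionKey(topLevelSite, reportingOrigin), policy);
}

function lookupPolicy(topLevelSite, requestOrigin) {
  // A policy picked up under one top-level site is invisible under another,
  // so the policy from step 1 never matches the blocked request in step 3.
  return policies.get(partitionKey(topLevelSite, requestOrigin));
}
```

If the tracker's requests are always blocked when it is embedded as a third party, then under this keying no policy ever gets installed for those partitions in the first place.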
@arturjanc Whether or not increasing a user's fingerprint (potentially crossing the uniquely-identifiable threshold) is "worth it" is something for the user to decide, not the webmaster. Studies need the consent of all subjects involved, even if researchers believe that the study is in the subjects' best interests. Users can give informed consent after being told the scope of the telemetry, how it will be used, and how it will be shared. A user (like me) who visits a website one time probably doesn't care whether the website "improves their experience" if they don't intend to re-visit it. They probably wouldn't consider "collect and share information about your setup, in exchange for a better site in the future" a fair trade. From the perspective of a one-time user, the Reporting API serves only to fingerprint.

POSSE note from https://seirdy.one/notes/2022/09/04/reporting-api-and-informed-consent/
@Seirdy - I think it's worthwhile distinguishing here between the information that is exposed to the web site (through the Reporting API, or through other means) and the delivery mechanism for this information. I would claim that how a certain piece of information reaches a website doesn't change the way in which this information can be used and abused. So if we want to limit information about the user's setup, that's great, but we should do that at the exposure point. Adding extra friction to the delivery mechanism (that is, the Reporting API in this case) will do nothing but cause sites to choose other delivery mechanisms (e.g. get that info from a JS API and upload it with
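A minimal sketch of the kind of alternative delivery path alluded to above (the collector URL is a placeholder, and this assumes the site only wants report types that the ReportingObserver JS API already exposes, such as deprecation and intervention reports):

```js
// If delivery via the Reporting API required opt-in, a site could collect
// much of the same information from script and upload it itself.
const observer = new ReportingObserver((reports) => {
  const payload = reports.map((r) => ({
    type: r.type, // e.g. "deprecation" or "intervention"
    url: r.url,
    // ReportBody fields vary by report type; toJSON() gives a plain object
    // where the browser supports it.
    body: r.body && r.body.toJSON ? r.body.toJSON() : r.body,
  }));
  navigator.sendBeacon('https://example.com/collect', JSON.stringify(payload));
}, { types: ['deprecation', 'intervention'], buffered: true });
observer.observe();
```

The information exposure is the same either way; only the transport differs, which is the point being made about where any limits should be applied.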
The Reporting API is distinct from most core, existing browser functionality in that it principally benefits the site operator. It is asking web users to help the website identify errors and problems in the site owner's application (i.e. to have users serve as debugging and monitoring agents for the site owner).
This is useful to site owners, who can offload monitoring costs onto users and be notified of conditions the site owner might not anticipate. It may be useful for web users as a group (the indirect upside of bugs and attacks being identified sooner; the downsides of bearing the burden of monitoring the site on the site owner's behalf, and possible privacy concerns). It is very unlikely to be useful, at the margin, for any single web user.
The Reporting API should therefore be treated as a benefit the client provides to the website; it should require explicit opt-in on the part of the client, globally and with per-origin exceptions, through a permissions-like system.