
New principle: Deployability and Monitoring #368

Closed
yoavweiss opened this issue May 10, 2022 · 10 comments

@yoavweiss

whatwg/html#6933 was filed by @pes10k against the integration of HTML with the Reporting API, an API whose goal is to ensure monitoring and deployability of other web platform features, or of the serving of web applications themselves. The issue claims that these use cases are somehow "inappropriate".

I think it's important to outline as a principle that monitoring and reporting (on the web application's performance, use of deprecated APIs, or its use of new security-related restrictions in "report only" mode, to name a few examples) are an essential part of being able to deploy web applications at scale.
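As a concrete illustration of the kind of deployment-time monitoring meant here, a site might opt into report delivery with response headers along these lines (the endpoint name and URL are illustrative):

```http
Reporting-Endpoints: default="https://example.com/reports"
Content-Security-Policy-Report-Only: script-src 'self'; report-to default
```

The Report-Only variant lets a site observe what a new restriction would break before enforcing it.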

@pes10k

pes10k commented May 11, 2022

Can you say more about the principle you're proposing @yoavweiss? I can't quite picture what that principle would look like. What lines are you considering drawing? What's the backstop / limit?

I want to re-emphasize that these discussions keep conflating two different issues. I've never suggested that it was inappropriate (w/ or w/o scare quotes) for sites to monitor themselves with their own resources and devices, with devices they've received consent to use, with devices rented from a monitoring service, etc. A more advanced isitdown that took the measurements you're concerned with sounds great. A "Google Site Monitor" service that monitored sites from a variety of IPs and browsers, running from Google servers, sounds like a wonderful thing. It could even use the wonderful Puppeteer tool that Google maintains.

What I do think is inappropriate is standardizing browser functionality specifically designed for sites to collect user data about users' environments, network conditions, browser configurations, etc., for purposes that don't directly benefit users, and without asking consent. I think it's doubly inappropriate because part of the justification that's been expressed for not asking consent is "we don't think enough users would grant consent".

In short, it's all well and good to improve the ability of sites to monitor how their sites perform; but it doesn't follow (and isn't the case) that it's therefore appropriate for sites to conscript unknowing, unconsenting users for that purpose. Just because there are resources on users' machines that could benefit sites doesn't mean those resources are sites' for the taking.

@yoavweiss
Author

I want to re-emphasize that these discussions keep conflating two different issues. I've never suggested that it was inappropriate (w/ or w/o scare quotes) for sites to monitor themselves with their own resources and devices, with devices they've received consent to use, with devices rented from a monitoring service, etc. A more advanced isitdown that took the measurements you're concerned with sounds great. A "Google Site Monitor" service that monitored sites from a variety of IPs and browsers, running from Google servers, sounds like a wonderful thing. It could even use the wonderful Puppeteer tool that Google maintains.

Unfortunately, lab-based services such as the one you're proposing are not sufficient for sites to confidently know that, e.g.:

  • They don't have unexpected performance or reliability issues in parts of the world that are not covered by the lab service.
  • They don't have unexpected issues in some user scenarios or on some user devices.
  • Their roll-out of security features isn't hitting unexpected cases, e.g. when interacting with 3P providers that manifest only in some parts of the world.

While lab data has a lot of advantages, it's not sufficient on its own.

What I do think is inappropriate is standardizing browser functionality specifically designed for sites to collect user data ... for purposes that don't directly benefit users

I think this is the crux of our disagreement, so let's tackle it head on:

  • What's "user data"? Is the fact that the page load failed because the page's 3P provider didn't set their CORP headers appropriately user data? Is it the fact that the page took a long time to reach LCP? I would argue that those are not user data, while you seem to think that it is.
  • "Don't directly benefit users" - again, I think there's some disagreement on the definition of what benefits users vs. not. I agree that typically users are not visiting a site in order to provide it with debugging services, but automatic reporting of bad user experiences seem to benefits the site's users in general in the longer-term, as well as the particular user reporting it, in case they are a regular visitor to the site.

I think it's doubly inappropriate because part of the justification that's been expressed for not asking consent is "we don't think enough users would grant consent".

Any link to that? The justifications I heard are mostly around "there's no need to ask for user consent in order to perform a fetch() equivalent". Dedicated reporting APIs are not different from fetch() other than ergonomics and potential reliability.
Sites can already fetch content into the user's browser and post content from it. I'm not sure what meaningful consent you're expecting to get from users, and how you think user agents can enforce such permission without blocking every HTTP request behind a permission prompt.
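To make the claimed equivalence concrete, here is a hedged sketch (the endpoint URL, field names, and helper names are illustrative assumptions, not the Reporting API's exact wire format) of how a site could hand-roll report delivery with plain fetch():

```javascript
// Illustrative sketch: hand-rolled report delivery over plain fetch().
// Field names loosely mirror Reporting API reports; they are assumptions
// for illustration, not the exact serialization browsers use.
function buildReport(type, url, body) {
  return JSON.stringify([{ type, url, age: 0, body }]);
}

function sendReport(endpoint, report) {
  // keepalive lets the request outlive the page, one of the reliability
  // properties a dedicated reporting mechanism provides out of the box.
  return fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/reports+json" },
    body: report,
    keepalive: true,
  });
}

const report = buildReport("deprecation", "https://example.com/page", {
  id: "SomeDeprecatedAPI",
  message: "SomeDeprecatedAPI is deprecated.",
});
// sendReport("https://example.com/reports", report);
```

What fetch() alone can't replicate is delivery in cases where no script ever ran, e.g. when the page itself failed to load.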

@yoavweiss
Author

Friendly ping! :)

@plinss plinss added this to the 2022-06-06-week milestone May 18, 2022
@pes10k

pes10k commented May 19, 2022

Apologies for the delay. I was waiting until the privacy-principles group had its next meeting to respond. That group is discussing many very similar issues (such as w3ctag/privacy-principles#162). @jyasskin is part of those conversations, and I hope he'll correct me if I'm wrong, but I think it's accurate to say that there are folks other than myself who share (or are at least working through) the same concerns expressed here: that consent, opt-in, etc. is an important part of user agents making this kind of information available to sites.

It's very much not settled, and I'm not at all trying to say "that group said X, so let's close this issue." I'm just trying to avoid splitting threads too much. I think it'd be good if we could pause this thread until the privacy-principles conversation is resolved, and to continue this conversation there (either with you directly, or with the Google members of that group expressing your concerns; though I think @jyasskin is already doing so :) ). How does that sound?

I think it's doubly inappropriate because part of the justification that's been expressed for not asking consent is "we don't think enough users would grant consent".

Any link to that?

I was thinking of WICG/crash-reporting#1 (comment). I understood "significantly hurts the quality of the data" to mean "less data", i.e., many folks wouldn't consent if we asked them.

What's "user data"?… I would argue that those are not user data, while you seem to think that it is.

I'm calling this user data because it's generated by users, and (directly or otherwise) describes a user's environment, experience, capabilities, choice of software, etc.

there's no need to ask for user consent in order to perform a fetch() equivalent

I don't think it is accurate to generalize the Reporting API as "fetch() with better ergonomics and reliability".

If this is just about developer ergonomics, then I suggest y'all create a Reporting-API JS library that provides the already-existing data with a nicer API; a jQuery for analytics. In other words, if this is just about exposing already-available data in a nicer way, then a nicer JS API over existing capabilities would seem to solve that concern; we could delete the Reporting API and call it a day ;)

But I'm pretty sure that's not the case ;) I think the Reporting API (and the related report types) intends to expose new qualities and new quantities of data to sites. I think users and browsers currently don't provide this data to sites (either at all, or in the amount that Reporting-API supporters expect the Reporting API would make available), and the goal of the Reporting API is to change that.

And so, if an API is going to:

  1. make new types and amounts of data available to sites, and
  2. users could reasonably not expect that data would be available to sites, and
  3. users aren't interacting with the site with the goal of sharing this data to the site

then the right thing to do is either a) continue not providing that data to sites, or b) ask users if they'd like to share the new kinds and types of user data with sites.

@torgo torgo self-assigned this May 19, 2022
@yoavweiss
Author

Apologies for the delay. I was waiting until the privacy-principles group had its next meeting to respond. That group is discussing many very similar issues (such as w3ctag/privacy-principles#162). @jyasskin is part of those conversations, and I hope he'll correct me if I'm wrong, but I think it's accurate to say that there are folks other than myself who share (or are at least working through) the same concerns expressed here: that consent, opt-in, etc. is an important part of user agents making this kind of information available to sites.

Since Reporting is a general mechanism that doesn't expose in and of itself any information (but just enables other specs to send it), it's not immediately clear to me what "this kind of information" means. Can you elaborate?

It's very much not settled, and I'm not at all trying to say "that group said X, so let's close this issue." I'm just trying to avoid splitting threads too much.

I'm fine discussing this either here or on the Privacy Principles repo, but @jyasskin suggested that this repo may be a better fit, since it's a tradeoff between a privacy principle (don't share data) and a design one (apps need to monitor their operations).

I was thinking of WICG/crash-reporting#1 (comment). I understood "significantly hurts the quality of the data" to mean "less data", i.e., many folks wouldn't consent if we asked them.

OK. FWIW, I disagree with this.

In my mind, there are two separate questions when we're talking about exposing certain data. There's the question of medium (JS API, Reporting API, etc.), and there's the more important question: can we expose this data from a privacy perspective?

The answer to the latter question doesn't change based on the medium. If exposing the data requires a permission prompt, then it should require one regardless of how it's exposed.

At the same time, requesting a user permission for any use of the Reporting API is similar to asking for permission for any use of an HTTP request.

I'm calling this user data because it's generated by users, and (directly or otherwise) describes a user's environment, experience, capabilities, choice of software, etc.

So any data is "user data" in your definition? If we e.g. log HTTP status codes on both the server side and the client side and report both, would a 502 error code be considered "server data" when logged on the server and "user data" when logged on the client?

I think the Reporting API (and the related report types) intends to expose new qualities and new quantities of data to sites

I can't help what you think, but the Reporting API is infrastructure for sending reports in the same way that Fetch is infrastructure for sending HTTP requests. It provides reliability and ergonomic advantages over Fetch, but a lot of the data that is exposed relying on that infrastructure could similarly be exposed by JS APIs that expose that data + fetch.

The "reliability" part of Reporting enables it to send those reports in cases where the site failed to load entirely, e.g. due to bad configuration of the site or of the resources it embeds. But it's the relying features that make use of that infrastructure to send potentially new data, in cases where it is warranted.

If there are privacy issues with e.g. HTML reporting to sites that their COOP is misconfigured, it makes sense to file issues against that specific functionality, outline why it violates user privacy and then tackle that.
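(For reference, the COOP reporting mentioned here is opted into via response headers roughly like the following; the endpoint name and URL are illustrative:)

```http
Reporting-Endpoints: coop-endpoint="https://example.com/coop-reports"
Cross-Origin-Opener-Policy-Report-Only: same-origin; report-to="coop-endpoint"
```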

What you're suggesting is different, though. You're suggesting that if the infrastructure can potentially be misused, then any use of it should be held to stricter standards than the equivalent features (i.e. fetch()).
This sends a strong message that you think monitoring and deployment of web applications are not important use cases, which I'm trying to counter with this suggested principle.

@darobin
Member

darobin commented Jun 1, 2022

Hey @yoavweiss!

I think that this question might benefit from being split a bit into:

  • What are the privacy considerations applicable to telemetry? These could include (I'm just making this up, not saying this outlines a position): should it be purpose-limited, should it be possible to opt out, should it be same-party limited, should it be anonymised, should it be surfaced to people, should it be the subject of collective governance, should it be maintained on the client for auditing purposes, etc. I think these questions are best discussed in the Privacy Principles TF.
  • Understanding what the privacy considerations are, how do they integrate and balance out with other design principles. That's probably for here.

I'm sorry to expand the workload on this (insert this_is_sparTAG.gif) but I don't think it's possible to balance privacy considerations with other considerations when the privacy considerations on this specific topic don't exist :) Those 30 years of architectural debt aren't gonna pay themselves!

@yoavweiss
Author

yoavweiss commented Jun 2, 2022

Hey @darobin :)

  • What are the privacy considerations applicable to telemetry? These could include (I'm just making this up, not saying this outlines a position): should it be purpose-limited, should it be possible to opt out, should it be same-party limited, should it be anonymised, should it be surfaced to people, should it be the subject of collective governance, should it be maintained on the client for auditing purposes, etc. I think these questions are best discussed in the Privacy Principles TF.

I think we agree here. The privacy considerations for telemetry seem like something that the Privacy Principles TF could answer. They all rely on the assumption that telemetry (in support of deployability and monitoring) is a legitimate use case, which is what I want to establish here in this issue. Does that make sense?

@annevk
Member

annevk commented Jun 2, 2022

So any data is "user data" in your definition? If we e.g. log HTTP status codes on both the server side and the client side and report both, would a 502 error code be considered "server data" when logged on the server and "user data" when logged on the client?

This could reveal somewhat limited information about end-user-specific proxies. (In many cases that information might also be attainable through fetch(), though Sec-Fetch-* headers and other features make it harder to impersonate certain types of requests.)

I agree that telemetry is useful, but we have to be extremely careful when it goes beyond what can be observed by websites directly. I don't think such care has always been demonstrated, which in part is why there's reluctance around this set of features.

Beyond that there's the argument that telemetry is wasteful for end users (in terms of computing resources) and therefore has to be opt-in. To me that seems more like a policy decision as you cannot really enforce the collection of telemetry through technical means. At least not where it does not go beyond what can already be observed. You could choose not to aid it, but that likely results in less overall end user control.

@gregwhitworth

@yoavweiss shared this with me as I was going over some of my frustrations with browser breaking changes.

Unfortunately, lab-based services such as the one you're proposing are not sufficient for sites to confidently know that, e.g.:

  • They don't have unexpected performance or reliability issues in parts of the world that are not covered by the lab service.
  • They don't have unexpected issues in some user scenarios or on some user devices.
  • Their roll-out of security features isn't hitting unexpected cases, e.g. when interacting with 3P providers that manifest only in some parts of the world.

This is 100% true. To use Salesforce as one example, there are potentially millions of permutations of the web application, and the spectrum of UAs, form factors, regions, network latency, etc. makes the lab insufficient for truly determining whether something will or won't break. This would be like proposing that browsers or OSes not have the capability of gathering telemetry but should just run labs.

What I do think is inappropriate is standardizing browser functionality specifically designed for sites to collect user data about users' environments, network conditions, browser configurations, etc., for purposes that don't directly benefit users, and without asking consent.

I'm going to split these in two:

What I do think is inappropriate is standardizing browser functionality specifically designed for sites to collect user data about users' environments, network conditions, browser configurations, etc., for purposes that don't directly benefit users

The premise that this telemetry does not benefit users is completely unfounded. We have had hundreds of customer cases opened by our customers and end users due to deprecations and functional and non-functional regressions. The entire desire that I have for this API is actually to ensure that our users have a great experience.

and without asking consent

I agree with this if the group does feel the information is personal and has the potential of impacting privacy. I highly recommend that if we go this path, then this should be identified and placed on specific properties, and upon utilization of those properties the prompt should be shown; not on the more generic information.

To be a bit more specific, Salesforce's primary focus today is the DeprecationReportBody aspect of the API. This is invaluable, as you're able to understand how many users are actually hitting code paths that will break when a deprecation rolls out. Additionally, it gives us earlier insight into potential breakages that we may not know about.
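As a sketch of how such deprecation data might be aggregated on the client, the helper below counts reports per deprecated feature id (the report shape and helper name are illustrative assumptions; in a browser the reports would come from a ReportingObserver, as in the commented-out snippet):

```javascript
// Illustrative helper: count deprecation reports per deprecated feature id.
// The report shape loosely mirrors the Reporting API's { type, body: { id } }.
function countDeprecations(reports) {
  const counts = {};
  for (const report of reports) {
    if (report.type !== "deprecation") continue; // ignore other report types
    const id = report.body.id;
    counts[id] = (counts[id] || 0) + 1;
  }
  return counts;
}

// In a browser, a ReportingObserver would feed this helper, e.g.:
// new ReportingObserver((reports) => {
//   const counts = countDeprecations(reports.map((r) => r.toJSON()));
//   // ...POST counts to an analytics endpoint...
// }, { types: ["deprecation"], buffered: true }).observe();

const sample = [
  { type: "deprecation", body: { id: "WebSQL" } },
  { type: "deprecation", body: { id: "WebSQL" } },
  { type: "intervention", body: { id: "Other" } },
];
console.log(countDeprecations(sample)); // { WebSQL: 2 }
```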

should it be same-party limited

I know you noted that you're just thinking these things out, but I will ultimately push back against this; I won't dig into it too far until concrete proposals are put forward.

Happy to help figure this out with you all to keep this API moving forward and getting into other UAs, as I'd like to understand the impact on our users proactively rather than waiting for them to hit breakage and only then getting it addressed.

@torgo
Member

torgo commented Jun 5, 2024

In the time since this issue was opened, new text has been added on this topic to the Privacy Principles document, which we feel covers these cases. Hence we're going to close this. If people feel there is need for additional text in the Design Principles doc for this issue, please raise a PR.

@torgo torgo closed this as completed Jun 5, 2024
7 participants