Skip to content

Latest commit

 

History

History
215 lines (104 loc) · 24.2 KB

2023-10-04-minutes.md

File metadata and controls

215 lines (104 loc) · 24.2 KB

TAG Privacy TF - Wed, 4 October 2023

Present: Jeffrey, Christine, Nick, Sam, Pete, Robin, Amy, Yoav Weiss (Guest)

Regrets: Wendy, Don, Tess

Ancillary data: #359 vs #221

Jeffrey: 221 is simple removal of a sentence; 359 is my re-write of a something about ancilary data... Stricter than what we could dio with defnitions....

What's the compromise between the different preferences? Does the proposal in 359 make sense?

Pete: I have 3 comments on 359.

Nick: the reason I have been persistant asking for an issue is that I think as aprocedural matter it'll be much easier to find consensus if we know what the problems are with the current text so we can talk about how to address them. We've rewritten this section completely more than once, I don't think doing it again will get us to consensus except by exhaustion. We should be narrowing down, as we get closer to consensus we should be making smaller changes and narrowing down the issue to address.

Jeffrey: Yoav can you summarise how web perf is thinking about this?

Yoav: I think the major issue that I have in talking to folks and that the broader wg has with the xisting principlle is the fact it ties ancillary use to ancilliary data. Where those thins in my mind should be separate. Effectively we need to ... we can use data in ancilliary ways which is already exposed for non ancilliary use cases, which we have no plan of unexposing. In which case that data is not ancilliary, even if the current website is primarily use it for ancilliary uses, that doesn't turn the data into something else

Jeffrey: we've got these performance apis that effectively summarise data that's available from dom events. If we remove the summary apis, people will just use the dom event apis which are harder to control and more expensive, so we shouldnt' remove those summary apis. But ones that are not summaries maybe we can put more restrictions on

Yoav: we have nothing that summairises web animation frame, but people are polling that and draining battery in order to measure framerates. Maybe we should have new apis to summarise the existing non anncilliary data in ancilliary ways.. that doesn't make that data ancilliary in my view

Robin: I don't think you're wrong. Good point.

Yoav: I think we need to separate the use from the data. In my view ancilliary data should be defined as data that is not exposed in any functional way, any non ancilliary way. W ehave a few examples of that that we are using. There are use cases for them, we would love to get rid of the current way of reporting and report them in aggregate. One example is dns resolution times. The other is presentation times in various web perf apis that are not well defined in the spec but at least in chromiums they are exposed and maybe we should figure out other ways to expose them. but generally there is a difference between data that's already web exposed on the platform for functional reasons and data that is not. That is where I'd love for us to draw the line.

Jeffrey: to Nick's point.. I think we are iterating on these definitions and making smaller and smaller changes to the definitions. Those have big downstream effects in the rest of the section. It's still the definitions that we're refining. I hope getting closer to what can be consensus. To decide whether we like the definitions we do have to follow their implications through the rest of the section and see what the whole section looks like in order to evaluate the proposal for a change in the definition

Sam: what are your best examples of the non ancilliary data?

Yoav: that is being reported in perf apis? resource timing has a lot of time stamps that are already web exposed through the various events. event timingi s exposed through the event. people can register event listeners and a timestamp and get that data that way but that is cumbersome in some configurations and some site architectures and can create extra overheads, so we prefer for them to have that data only for problematic events, that took longer than a certain threshold, and have them collected through an observer rather than active events that can get in the way. Similarly various load events, fetch apis. I can come up with a list later.

Dan: I'm wondering if we're getting farther away from the meaning of "ancillary" that we had in the document? Yoav's definition isn't meaningful from a privacy perspective.

Yoav: why not? if the data is web exposed for functional reasons that means that a privacy attacker can get access to the data regardless of any ancilliary uses of the data or ancilliary apis

Nick: that's why we previously had a section, information access, that Jeffrey wrote based on a conversaton with Yoav a year or so ago. I'm not fully understanidng why that doesn't address it. There might be existing apis that provide similar information and that seciton provides some about if it's not new information or if it's the same then it needs to be protected..

Jeffrey: that covers the summary apis but does not give us much guidence on the non summary apis, the apis that provide new information ..

Dan: both tuypes of apis you're talking about summary and non summary are providing telemetry?

Jeffrey: yes. The information access section says when you're doing a summary api here's the guidence on how to do it. if you're doing a non summary api, one that provides new data that is not otherwise exposed then this section says .. or if we take my pr, this section gives us concrete guidelines on how and whether to do it

Dan: I don't see the word summary in your section

Jeffrey: perhaps I should put it there. That's a way to think about which apis would not be exposing ancillary data

Dan: it sounds to me like we're talking about two kinds of ancillary data which is summary - is that anonymised?

Jeffrey: nope

Dan: in what way is summary data not akin to anonymised data?

Nick: summary apis are apis that provide telemetry useful data that could also be accessed from the existing functional apis, that's al lpersonal data, specific data about a sepcific user and their machine. The summary apis are providing a version of it that's easier for telemetry

Yoav: and potentially lossy. Eg event timing - disregarding very fast events and just reporting on the slow ones. It drops some of the information that is nto useful for the ancillary use cases.

Dan: is there a type of telemetry data that does not overlap with personal data?

Jeffrey: my proposal is that if you aggregate it enough using private aggregation techniques it stops being personal data. But it's a new research area, most apis dont' do that. My PR gives the web perf wg a research direction, and a direction to develop new apis - that you have to do it in an aggreated way so it stops being personal data

Pete: if personal data means data that can be tied back to an individual, aggregation in and of itself is not a privacy preserving mechanism

Jeffrey: even things like prio?

Pete: prio does things in addition to aggregating

Jeffrey: maybe we need better text that say what kinds of aggregation or what in additional to aggregaiton

Pete: happy to help write that, but it'll eliminate the kinds of uses that are described here. The overhead will make them very unpractical to collect

Yoav: can you elaborate

Pete: the cost of runing a prio server is expensive.. needs a large number of users.. not good for collecting outlier data. Specific to the approach that prio uses. Generally you need something in addition to aggregation to get privacy

Yoav: I'm not familiar with prio specifically. Definitely a direction... getting the data in aggregate is something that is at least for most of the types of ancillary data that we thought through would be good enough, but yeah losing outliers is a problem. I'd love to dig further...

Dan: I'm very concerned with the language .. I think we're all trying to get to the same place, it's not a criticisms, but I'm concerned with what ties us to "some data that is exposed for non ancillary uses... not considered ancillary data". That might sound reasonable.. I'm trying to read it from the perspective of someone who is writing a spec and doesn't ... it needs to get some actionable information. It sounds circular. What's an ancillary data... is it data used in an ancillary way? I'm worried about the fact we have ancillary data and uses as two concepts.

Yoav: ancillary data and ancillary uses are two different things. Ancillary data is necessarily used for ancillary data. but there could be uses that are ancillary for non-ancillary data. If the comment is editorial, that's reasonable..

Dan: and from a simplicty standpoint. to reduce the number of things we're defining and making sure wer'e not coming up with ancillary definitions (lol) ... to the main point we're trying to get across.

Jeffrey: on board with reducing the number of definitions. Want to make sure we're agreed on the concepts, then woudl love suggestions on wroding. Does it make sense that if you have an api that summarises what you could get from dom events then we don't want it in the class we're talking about here?

Christine: what do we actually mean by summarises? What's happening? It's aggregation?

Jeffrey: within a single user agent they find.. one example is a collection fo DOM events that take more than 200ms to finish executing. Or they even just count the number of DOM events that take that time

Dan: what specifically do people want to measure?

Yoav: user interactions. Clicks, keyboard, mouseover, scrolls. They have event handliers. Those event handlers can take a long time on the main thread and on the path between the user and the action in case those events are not passive, in most cases they are not cos they can be cancelled. Developers want to measure this, I believe the default threshold for event timing is 48ms. Then only events that go beyond this default threshold get reported. The alternative for that and what some developers did before event timing was to just listen .. subscribe to all those active events, measure the timestamp they started, measure the timestamp in which the handler ended, then report that. that has implications on .. it adds overhead to each one of those events and the shorter ones just get that added overhead for no good reason. It is cumbersome from a library author or performance team perspective to then override an event listenir, wrap everything aorund your own functions, call stacks are.. everything.. there are a lot of ocmplications .. makes it developer and user hositle. The summary here is not aggregation in any way, it's just reporitng the same data in a different channel without the developer and user overhead

Christine: so it's not really a summary, it's a subset?

Yoav: a subset provided through a different channel

Christine: that makes more sense

Jeffrey: the question for us is if we divide telemetry apis into the subset apis andthe new information apis and we want this.. do we want this section to talk about just the new information apis, is that okay? If so, how do we word that?

Christine: I think we're.. I don't know if this section is theright place. There are privacy issues with the new information with the new apis and with the subset (summary) apis. They may be different, but I think there's some overlap

Yoav: in what way? I guess I'm having a hard time seeing the privacy issues with reexposing data that's exposed elsewhere

JeffreY: I think that's waht the information access section is meant to cover.

Nick: the information access I think is good guidence on that but I think the other part where it matters for the even data that's available through an existing api is the section also provides recommendations that we should try to give users general decisions they should make about things like telemetry, I think that should apply even to apis that could be reimplemented through some other means

Yoav: I think this is where we maybe disagree

Nick: maybe your concern is if we give uses the ability to turn off some telemetry apis then sites will poactively decide to reimplement every summary api? I'm not sure we have good evidence that's the case, but maybe..

Yoav: that's essentially how a lot of the original perf apis were built based off polyfills that did those things. W'ere still having a hard time getting rid of those polyfills in the wild for cases where peformance apis are implemented in some browsers and not others. So yeah, people for long tasks spin settimeouts and test he main thread that way. There is a bunch of evidence I can work on collecting it if that' suseful

Nick: I understand there's evidence that if we don't provide that functionality at all then there's motivation to rebuild it through those polyfills. There doesn't seem to be evidence that a user deciding they don't want to partiicpate causes websites to reimplement it in other ways

Yoav: if there's a large percentage of users that will potentially cause developers to .. the typical game. If it's a single user it's a significant fingerprint, the fact that eventiming is not firing.

Pete: A privacy conern about summary apis is if you make it much easier for people to collect this data, more people iwll collect it

Dan: I think the key paragraph in the published version fo the document is "data exposed for ancillary users... configureation ... fingerprinting.. across sites". The reason we have these - rather than the definition fo ancillary - what are we trying to accomplish? Trying to make the web a more private environment where the user is in control of their data. When we started thinking about analyitcs and telemetry, trying to channel the energy from lots of different applications and websites that say are you okay with sending telemetry data to the maker of this application. Like when i install macos 10 it says are you okay with sending usage data to help improve the perforamnce of this application. I think that's the energy we're trying to channel into this discussion so you can give the user more control.

Jeffrey: i think the text in the current version about telemetry can expose this fingerprint information. it's true, but that information is much easier to get from dom events which are being summarised into telemetry data. I don't know I've seen evidence that developers move to telemetry data to infer a fingerprint, it's harder to get a fingerprint from telemetry data, they'd rather get it from dom events which we can get anyway

Yoav: if we're looking at click patterns, a user specific fingerprint.. that data is avaialbe on a functional dom events. it will nto be avaialbe in eventitming. You won't get where did the user click. Maybe there's a related element but that's significantly more rough. I don't think that fingerprinting libraries are extremely concerned with the user experience..

Jeffrey: to also think about the point about we should let users turn off telemetry. What the best way to get this doc to consensus is. I think there's a persistent disagreement about whether it's a good idea for the platform to have a signficiant number of people turning off thse subset apis. The ones that my PR says are not exposing ancillary data. I think we do have agremeent on .. the ones that expose otherwise unavailable information like memory measurement, crash reports, those deserve more restrictions. I thin kwe should iterate on exactly what the more restrictions are. I'd like to get that into the document. And try to dodge the disagreement or write down that there is a disagreement about whether to turn off the other telemetry apis. Make sense?

Christine: I heard two options - dodge it and say we weren't able to reach consensus. My preference is to call it out, even if we can' treach consensus, it's importatn to convey that. Be nice if we can reach consensus. I'd rather it be very very very clear.

Dan: document what we can reach conesnsu on, then document the ideally very small piece that we don't have consensus on. If teasing it apart further, fi there is a concensus around what we could say around summary apis for example as we've been discussing, let's try and get there

Pete: where do we have consensus?

Jeffrey: for the apis that expose new ancillary data there need to be more restrictions. The details of the restrictions.. I probably just wrote it wrong.. I don't know there's disagremeent about the kind of aggregation that we want, I thought that that implied the right stuff but can write something different there

Dan: speaking of using the raw dom apis as a fingerpriting techinque, do we know that if any browsers that are actively trying to combat fingerprinting are putting restrictions on the use of those types of apis or are profiling the use of those apis or reducing their uses?

Pete: sure, most browsers at this point reduce the timestamp granularity

Yoav: simialrly reduced in the reported.. event timing doesn't report more granualr timestamps than the event itself

Dan: we don't want to put restrictions on this new class of api, because it will just force people to go back and use this other class of api which they're' using anyway which are more dangeraous and we can't stop. But if there are mitigations that browser makers are already putting in place to reduce the use of those more datangerous tuypes of measurement for fingerprtinting that might put pressure on people to use these more targete dtelemetry apis, which if we can put additional privacy protections on top fothem that overall has the ffect of making the web better than we found it

Yoav: but I would assume any mitigations you can put on the functional sources of this data would be mitigations youc an also alpply on the ancillary point of access. Eg. the timestamps, you don';t want to coarsen them on one end and leave them uncoarsened on the other

Jeffrey: I think we already have wording in the information access section that that needs to happen

Dan: Nick, you expressed concerns that we're on the wrong direction with the current PR..

Nick: I think I have a better sense of the issue, which is document general choices about telemetry and to note the potential disagreement about applying those to things available elsewhere vs things not

Pete: my understanding of how we got here is Dan the way you summarised this as the way we want users to be able to say we want to help sites accomplish site goals even when they're not user goals, we want to give users better control. If this is related to site goals we should ask the user to consent. We've instead decided to cleave things along currently available vs differently avaialble to acheieve the same ends. I think the original way was better

Dan: yeah.. coming back to making this document understandable and work in a way that actually enhances privacy for the web and for users.. I would like to try and link it back to that concept of user goals vs site goals

Yoav: that's a point of disagremeent. I don't think that is a realistic division, other than web developers obfuscating their goals as user goals

Jeffrey: it may not be a division we can see in the apis. We see the browser sees calls to apis that we originally designed to suport user goals and we have no way of knowing whether the site is using them for user goals

Pete: lots of cases wher eit's easy to make that distinction and a small number where i't shard. you can't perfectly cut the world in two, but there are many apis explicitly for site goals, and a grey area where it's hard to distinguish

Jeffrey: large group of apis like dom events like click, where we added it to suport user goals, and lots of sites started using it to support site goals and the browser can'tn tell. Web perf says some are worth supporting because they don't hurt users - a little extra processing but less than the site is uisn gnow. We should support that. Those apis are clearly supporting a site goal rather than a direct user goals. It's easy t poke at those apis and say look this is a privacy harm. Web perfs point is that those are replacing worse harms, they're not adding privacy harms, they're improving things, they just got more visible, got a bigger target painted on them.

Dan: I'm thinking this is not written down anywhere in our privacy document and maybe it should be - what you just said? Explaining the current state of things and how we got here actually might be a reasonable thing to write down here. Explaining why it's not cut and dry.

Jeffrey: that's a lot more explanatory text than we've usually put in here.

Christine: maybe there's a middle ground where we can start with the principle that Pete said about user goals and site goals. And then to be able to do the explanation of the complexity perhaps look at the two scenarios of the subest apis and apis that do new stuff in terms of.. I agree, this is so messy that .. and in fact the solution is so tied to the problem we already have that I think it is worth explaining and maybe we do it as an annex if it's too much

Jeffrey: I might be comrotable with starting with the let users control the ancillary uses, and as long as we say don't pick on this group of apis because of that, that are actually helping things

Christine: I would say what you're arguing is that in those use scases there's a benefit to both the site and the user. Or there's a benefit to the site and now material effect on the users privacy. That sounds like what you're saying?

Pete: I don't agree ther'es no material loss to user privacy. The tradeoff between making these apis cheaper for sites to use is that more sites will use them

Jeffrey: I think there's a core disagreement bout what conclusion we should draw. I don't want this document to commit to either side of that disagreement

Pete: under no illusions we'll get to consensus on it

Jeffrey: hard to have the top level principle be give the users control over ancillary uses if that biases the discussion towards saying that these subset apis are bad because they .. or that browsers have to give users a checkbox to turn them off

Yoav: I'm concerned that when real user monitoring folks.. the next RUM library that will be built.. if we are to label some performance apis that are not related to user goals and privacy harming I'm afraid people will just use the apis that we're not labelling as such and we're effecitvely.. we're discriminating against the good citizens while inadvertantly helping the bad ones. All the fingerprinting, privacy harming libraries out there I don't think they're using web perf APIs in order to do what they do in terms of the aggregation apis at least. I'm sure they're using ancillary data in interesting ways, but for things where they are looking for the raw data for user fingerprinting I suspect they're using the raw events, the functional events, rather than anything that summarises them.

Robin: I'm agreeing with whoever last spoke...

Christine [in chat]: maybe it is worth documenting where browsers are blocking the kind of use Yoav is mentioning

Robin: I tend to agree with what christine is suggesting.. we don't have consensus, this is good arguements. Yes it's more text, but if this group can't align it probalby means more text is worth it. This is something people will struggle with an dit's worth explaining and providing groups with the materials to have those conversations. I'd lean towards writing more and explaining this.

Dan: yep

Robin: we're not going to land on wording that everyone is happy with, if we can teach both sides.. it feels justified

Dan: it does feel like some of the source of the disagreement is conflicting views of what the future will hold. We are looking into a crystal ball to some degree.

Nick: agree with Yoav's point and think it fits with some of the other text we have about design for purpose apis. There might be examples of web perf apis that are subsets of apis and apis to make it easeier fo ra website that's trying to do just the not so bad thing a way to do it. I thought the current text already supported that, but maybe we should clarify. I don't think giving the suggestion that users might want to turn off telemetry I don't think that says all telemetry or all web perf apis are bad. It's reasonable for users to have different views on that.

Jeffrey: I will try to write up this discussion in a new PR.. it's going to lose the rest of the original text or maybe move it around, but we'll see

Pete: 3 comments for the current PR - is it useful to add those, or just wait for the new PR?

Jeffrey: add them, a bunch of concepts will stick around