Clarification on protection offered from restricting to single source/trigger websites #10
So I think that you are right that maybe we could structure the delegation differently in order to provide more capabilities. Our initial approach only has trigger fanout queries take trigger events from a single trigger site. What we could do is have the delegation to a report collector give the report collector the ability to include more than one site in queries. The result would be that the budget for those sites is pooled, but it would mean that a single entity could operate multiple front-ends to its business and still perform attribution jointly across them. To take an example, say I sell hats and shoes on different websites: hats.example and shoes.example. I could delegate responsibility for trigger fanout queries to rc.example for both of those sites, and we could allow a single query to include trigger events from both sites. This would mean that both sites would be paying down their budget for the query.
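One way to picture the pooled variant is a per-(site, epoch) epsilon ledger where a multi-site trigger fanout query debits every participating site. This is a minimal sketch under assumptions of our own: `SiteBudgetLedger` and `charge_query` are hypothetical names, the all-or-nothing charging rule is an illustrative choice rather than something the proposal specifies, and the epsilon values are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class SiteBudgetLedger:
    """Hypothetical ledger tracking epsilon spent per (site, epoch)."""

    epsilon_per_epoch: float
    spent: dict = field(default_factory=dict)  # (site, epoch) -> epsilon spent

    def remaining(self, site: str, epoch: int) -> float:
        return self.epsilon_per_epoch - self.spent.get((site, epoch), 0.0)

    def charge_query(self, sites: set, epoch: int, epsilon: float) -> bool:
        """Charge `epsilon` to every site whose trigger events the query includes.

        All-or-nothing (an assumption): if any participating site lacks
        budget, no site is charged and the query is refused.
        """
        if any(self.remaining(s, epoch) < epsilon for s in sites):
            return False
        for s in sites:
            self.spent[(s, epoch)] = self.spent.get((s, epoch), 0.0) + epsilon
        return True


ledger = SiteBudgetLedger(epsilon_per_epoch=1.0)
# rc.example runs one trigger fanout query over both sites: both pay 0.25.
ledger.charge_query({"hats.example", "shoes.example"}, epoch=0, epsilon=0.25)
# A later single-site query uses up the rest of hats.example's budget...
ledger.charge_query({"hats.example"}, epoch=0, epsilon=0.75)
# ...so another joint query is refused, and shoes.example is not charged.
joint_ok = ledger.charge_query({"hats.example", "shoes.example"}, epoch=0, epsilon=0.25)
```

The all-or-nothing rule keeps a refused query from partially draining the budgets of the sites that could still afford it.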
Before jumping to solutions and new capabilities (which I am interested in, don't get me wrong), I want to make sure I understand precisely the current proposal and what benefits we are getting from restricting queries like this. I didn't necessarily mean to propose a behavior change in this issue. From what you wrote, it seems like you are saying that the privacy unit/grain will include the trigger site (for trigger fanout queries), but this is not clear in the document, which specifies a grain of (match key, report collector, epoch). With the grain specified in the current doc, budget will always be pooled across trigger sites. Should we amend the current doc to say that the "site" is included in the privacy unit, giving a total unit of (match key, report collector, site, epoch)?
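The difference between the two grains can be made concrete with a small sketch. It charges two trigger fanout queries from the same report collector, one over hats.example triggers and one over shoes.example triggers, under each grain. The match key, epoch, epsilon, and function names are made up for illustration. Under the grain as written, both queries draw on one unit. With the site added, each query draws on its own unit.

```python
def privacy_unit(match_key, report_collector, site, epoch, include_site):
    """Budget unit for one event under either grain discussed in this issue."""
    if include_site:
        return (match_key, report_collector, site, epoch)  # proposed amendment
    return (match_key, report_collector, epoch)  # grain as written in the doc


def charge(queries, include_site, epsilon):
    """Each query is a list of (match_key, report_collector, site, epoch) events.

    A query charges `epsilon` once to every distinct unit it touches.
    Returns the total epsilon spent per unit.
    """
    spent = {}
    for events in queries:
        units = {privacy_unit(*e, include_site=include_site) for e in events}
        for unit in units:
            spent[unit] = spent.get(unit, 0.0) + epsilon
    return spent


queries = [
    [("mk1", "rc.example", "hats.example", 7)],
    [("mk1", "rc.example", "shoes.example", 7)],
]
pooled = charge(queries, include_site=False, epsilon=0.5)   # one unit pays 1.0
per_site = charge(queries, include_site=True, epsilon=0.5)  # two units pay 0.5 each
```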
This is a helpful issue @csharrison - thanks for filing! OK, this probably is insufficiently clear in the document. Here is how I think about it: We've defined "report collector" as such:
So we are basically using "report collector" as a variable name that is a stand-in for the app/website making the query. We did that because potentially the app/website might want to delegate this responsibility to some other party (e.g. an MMP). But even in the case of delegation, the current idea was that the "grain" of the privacy budget was per app/website.
As @martinthomson says - in the case that multiple apps/websites all delegate to the same MMP, we need to clarify what the intended behavior is (I agree this is unclear at the moment - happy to discuss ideas!). The simplest possible thing to do (and what I've been imagining thus far) is that each app/website continues to have its own, isolated privacy budget, and the MMP to which they've delegated responsibility for running queries cannot mix events across multiple apps/websites. Basically - they just function as a "service provider" that collects events and runs queries on behalf of multiple businesses - with no co-mingling of data across them.
The other part of your question was:
Here's the attack I was thinking of:
This would allow an unlimited amount of information leakage from the system. Ensuring that "Report collector corresponding to website X1" is unable to run queries where neither the source nor the trigger events relate to website X1 defeats this attack.
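The restriction can be read as an admission check run before any query executes. The sketch below is one reading of it, not the proposal's actual validation logic. It assumes a hypothetical `Query` record that lists the sites its source and trigger events come from, and it admits a query only when the fanout side consists solely of the querying site's own events. That is slightly stricter than "relates to X1" because it also enforces the single-site rule. All site names are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, Literal


@dataclass(frozen=True)
class Query:
    """Hypothetical summary of a query: which sites its events come from."""

    kind: Literal["source_fanout", "trigger_fanout"]
    source_sites: FrozenSet[str]
    trigger_sites: FrozenSet[str]


def is_allowed(query: Query, collector_site: str) -> bool:
    """Admit a query only if its fanout side is exactly the querying site.

    Source fanout: every source event must come from the collector's site.
    Trigger fanout: every trigger event must come from the collector's site.
    """
    if query.kind == "source_fanout":
        own_side = query.source_sites
    else:
        own_side = query.trigger_sites
    return own_side == frozenset({collector_site})


# A trigger fanout query over x1.example's conversions (publishers are made up).
query = Query(
    kind="trigger_fanout",
    source_sites=frozenset({"pub1.example", "pub2.example"}),
    trigger_sites=frozenset({"x1.example"}),
)
allowed_for_x1 = is_allowed(query, "x1.example")  # x1's own triggers: admitted
allowed_for_x2 = is_allowed(query, "x2.example")  # nothing relates to x2: refused
```

The second call models the attack. A report collector for x2.example cannot spend its own budget on events that belong to neither side of its site.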
@benjaminsavage thanks, that clears it up for me! The attack makes sense to me now that I know the intended privacy unit. I think there are two cases to discuss:
For (1), what you wrote makes sense, but I do want to mention that there are use-cases where "pooling" data across many small advertisers seems very useful, especially if the small advertisers are "similar" and each is so small that noise from privacy protections might wash away any meaningful signal in its data alone. We should consider allowing this as long as queries deplete budget from all participating sites properly. See also issue WICG/attribution-reporting-api#190, which touches on this. For (2), I would guess the behavior is that multiple MMPs are allowed but draw from the same per-site privacy budget? We would need to hash out how that works with the helper network / match key provider commitments, though.
I agree. I definitely can see this use-case. This is where I think we come back to @martinthomson's comment above:
So essentially - these multiple sites would simply be treated as a single unit, and would share a common privacy budget. As I mentioned in my comments on #3, I think there may be an opportunity to actually encourage this. As @martinthomson explains in that issue, we have to assume there are potentially colluding sites, and bake that estimate into our differential privacy budget. If these sites are instead transparently acting as a unit and sharing a common privacy budget, this may help us reduce the estimated number of "potentially colluding sites".
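A back-of-the-envelope way to see why a shared budget helps assumes basic sequential composition of differential privacy, where the epsilons of separate releases about the same user add up. Under that assumption, the worst case for a coalition of colluding sites is the sum of the budgets of the distinct privacy units its members control. Sites that transparently share one budget contribute it only once. The function name, site names, and epsilon value are illustrative. Tighter composition theorems would give smaller bounds.

```python
def coalition_epsilon(site_to_unit: dict, epsilon_per_unit: float) -> float:
    """Worst-case combined epsilon if every listed site colludes.

    Assumes basic sequential composition and that each distinct budget unit
    can spend at most `epsilon_per_unit`. Sites mapped to the same unit
    share that budget, so it is counted once.
    """
    return len(set(site_to_unit.values())) * epsilon_per_unit


# Three sites with isolated budgets vs. the same three sites acting as one unit.
independent = {"a.example": "a", "b.example": "b", "c.example": "c"}
shared = {"a.example": "group", "b.example": "group", "c.example": "group"}
```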
So this is tricky. Even assuming we figure out the commitment stuff, the privacy budget part is harder. I think that the app/website that's trying to delegate to multiple MMPs would need to decide how much budget to give each one of them. Like 60% to MMP1, and 40% to MMP2? I would like to understand the use-case that would push an app/website to need to work with multiple MMPs. I know we've discussed this before at a prior PAT-CG, do you have more detail on these use-cases @csharrison? |
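If a site did split its budget across MMPs, the bookkeeping itself would be simple. The sketch below assumes fixed fractional shares per epoch and checks that they never add up to more than the site's own budget. `split_budget`, the MMP labels, and the 60/40 shares are hypothetical and follow the example in the comment above.

```python
def split_budget(site_epsilon: float, shares: dict) -> dict:
    """Split one site's per-epoch epsilon among MMPs by fractional share.

    Shares must be non-negative and sum to at most 1, so the combined
    spend of all MMPs can never exceed the site's own budget.
    """
    if any(share < 0 for share in shares.values()):
        raise ValueError("shares must be non-negative")
    if sum(shares.values()) > 1.0 + 1e-9:  # small tolerance for float rounding
        raise ValueError("shares sum to more than the site's budget")
    return {mmp: site_epsilon * share for mmp, share in shares.items()}


allocation = split_budget(1.0, {"MMP1": 0.6, "MMP2": 0.4})
```

A static split is the simplest option. However, any share that an MMP leaves unused is stranded for that epoch, which is part of why this question is hard.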
This is a pretty complicated space so let me file another issue tackling this directly. I think we're at a point where we can resolve this issue with a few minor tweaks to the e2e doc to just clarify the current desired behavior:
Great, and agreed. I would say we still have work to do here WRT the threat model as well, and hopefully more discussion here can help flesh it all out.
Yes - if you have any suggestions for making that more clear, I'm happy to update the document. Finding the right way to describe this (and not imply that sites/apps must directly run their own queries) was tricky! (And we clearly haven't got it right yet.)
Sure! The current doc was primarily focused on getting a minimally viable proposal out, with the intention of soliciting this type of feedback and suggestions for extensions.
In the end-to-end doc it states:
I can understand why you would want to enforce (3) (e.g. you don't want to support types of queries other than single-site trigger / source fanout), but I am not sure why removing it allows circumventing privacy budgets. In particular, the website/app does not seem to appear in the "grain" specified in the budget management section, which specifies a grain of (match key, report collector, epoch).
e.g. if a trigger fanout query contains multiple trigger sites, what breaks here? What kind of "duplicate queries" are we worried about?