Clarification on protection offered from restricting to single source/trigger websites #10

Open
csharrison opened this issue Aug 25, 2022 · 8 comments


@csharrison
Contributor

csharrison commented Aug 25, 2022

In the end to end doc it states:

> Finally, to prevent a report collector from circumventing their privacy budget by running duplicate queries, we need the helper party network to enforce that a report collector, (acting on behalf of a specific source or trigger website app) can only run source or trigger fanout queries related to that website/app. We can achieve this by having the helper party network ensure that:
>
>   1. That they (the helper party network) are the helper party network committed to by the report collector.
>   2. All provided encrypted match keys are from the match key provider committed to by the report collector.
>   3. That the current source or trigger fanout query only contains source or trigger reports (respectively) with a match key generated on the report collector’s associated website/app, which is recorded in the current website/app field of the datai
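
The three checks can be sketched as one helper-side validation function. This is a minimal illustration under stated assumptions, not the design in the doc: the `Report`, `Commitment`, and `Query` types, their fields, and `validate_query` are all hypothetical, and check 3 is read as restricting only the fanout side of the query (trigger reports in a trigger fanout query, source reports in a source fanout query) to the report collector's website/app.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SOURCE = "source"
    TRIGGER = "trigger"


@dataclass(frozen=True)
class Report:
    kind: Kind                # source or trigger event
    match_key_provider: str   # who generated the encrypted match key
    site: str                 # website/app on which the match key was generated


@dataclass(frozen=True)
class Commitment:
    helper_network: str       # helper party network the collector committed to
    match_key_provider: str   # match key provider the collector committed to
    site: str                 # website/app the collector acts on behalf of


@dataclass(frozen=True)
class Query:
    fanout: Kind              # Kind.SOURCE = source fanout, Kind.TRIGGER = trigger fanout
    reports: tuple


def validate_query(query: Query, commitment: Commitment, this_network: str) -> bool:
    """Return True only if all three checks pass (hypothetical helper-side logic)."""
    # Check 1: we are the helper party network the collector committed to.
    if commitment.helper_network != this_network:
        return False
    for report in query.reports:
        # Check 2: every match key comes from the committed match key provider.
        if report.match_key_provider != commitment.match_key_provider:
            return False
        # Check 3 (assumed reading): reports on the fanout side must come from
        # the collector's own website/app; the other side may come from any site.
        if report.kind == query.fanout and report.site != commitment.site:
            return False
    return True
```

Under this reading, a trigger fanout query run for shoes.example may include source reports from any site, but every trigger report must carry a match key generated on shoes.example.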

I can understand why you would want to enforce (3) (e.g. you don't want to support other types of queries than single-site trigger / source fanout), but I am not sure why removing it allows circumventing privacy budgets. In particular, the website/app does not seem to appear in the "grain" specified in the budget management section, which specifies a grain of (match key, report collector, epoch).

e.g. if a trigger fanout query contains multiple trigger sites, what breaks here? What kind of "duplicate queries" are we worried about?

@martinthomson
Collaborator

So I think that you are right that maybe we could structure the delegation differently in order to provide more capabilities. Our initial approach only has trigger fanout queries take trigger events from a single trigger site.

What we could do is have the delegation to a report collector give the report collector the ability to include more than one site in queries. The result would be that the budget for those sites is pooled¹, and it would let a single entity that operates multiple front-ends perform attribution jointly across them.

To take an example, say I sell hats and shoes on different websites: hats.example and shoes.example. I could delegate responsibility for trigger fanout queries to rc.example for both of those sites and we could allow a single query to include trigger events from both sites. This would mean that both sites would be paying down their budget for the query.

Footnotes

  1. We can perhaps be clever about this. The helpers will need to be informed about what sites were involved in the query, so maybe they can keep separate per-site budgets, but deduct from the budget of involved sites only.
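
Footnote 1's idea (separate per-site budgets, with each query deducting only from the sites it involves) can be sketched as a small ledger. This is a hypothetical illustration: `PerSiteBudget`, its all-or-nothing charging rule, and the use of a single epsilon cost per query are assumptions, not part of the proposal.

```python
from collections import defaultdict


class PerSiteBudget:
    """Hypothetical ledger keeping one budget per (site, epoch).

    For brevity this ignores the match key component of the grain; a real
    ledger would also key on the match key.
    """

    def __init__(self, capacity: float):
        self.capacity = capacity
        self.spent = defaultdict(float)  # (site, epoch) -> epsilon spent so far

    def remaining(self, site: str, epoch: int) -> float:
        return self.capacity - self.spent[(site, epoch)]

    def charge(self, sites, epoch: int, epsilon: float) -> bool:
        """Deduct epsilon from every site involved in the query, or from none."""
        sites = set(sites)
        # All-or-nothing: refuse the query if any involved site is short on budget.
        if any(self.remaining(site, epoch) < epsilon for site in sites):
            return False
        for site in sites:
            self.spent[(site, epoch)] += epsilon
        return True
```

In the hats.example / shoes.example example, a joint query is charged to both sites, while a query touching only hats.example leaves the shoes.example budget untouched.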

@csharrison
Contributor Author

Before jumping to solutions and new capabilities (which I am interested in, don't get me wrong), I want to make sure I understand precisely the current proposal and what benefits we are getting from restricting queries like this. I didn't necessarily mean to propose a behavior change in this issue.

From what you wrote, it seems like you are saying that the privacy unit/grain will include the trigger site (for trigger fanout queries), but this is not clear in the document, which indicates a grain of (match key, report collector, epoch). With the grain specified in the current doc, budget will always be pooled across trigger sites.

Should we amend the current doc to mention that the "site" will be included in the privacy unit, leading to a total unit of (match key, report collector, site, epoch)?

@benjaminsavage
Collaborator

This is a helpful issue @csharrison - thanks for filing!

OK, this probably is insufficiently clear in the document. Here is how I think about it:

We've defined "report collector" as such:

> Report collectors: A specific website/app or a delegate acting on their behalf, that issues queries to the helper party network.

So we are basically using "report collector" as a variable name that is a stand-in for the app/website making the query. We did that because potentially the app/website might want to delegate this responsibility to some other party (e.g. an MMP). But even in the case of delegation, the current idea was that the "grain" of the privacy budget was per app/website.

As @martinthomson says - in the case that multiple apps/websites all delegate to the same MMP, we need to clarify what the intended behavior is (I agree this is unclear at the moment - happy to discuss ideas!). The simplest possible thing to do (and what I've been imagining thus far) is that each app/website continues to have its own, isolated privacy budget, and the MMP to which they've delegated responsibility for running queries cannot mix events across multiple apps/websites. Basically - they just function as a "service provider" that collects events and runs queries on behalf of multiple businesses - with no commingling of data across them.

The other part of your question was:

> but I am not sure why removing it allows circumventing privacy budgets

Here's the attack I was thinking of:

  1. Website "A" creates a source event se1
  2. Website "B" creates a trigger event te1
  3. The report collector corresponding to website X1 makes a "trigger fanout query" including se1 and te1.
  4. The report collector corresponding to website X2 makes a "trigger fanout query" including se1 and te1.
  5. The report collector corresponding to website X3...
  6. ...
  7. The report collector corresponding to website X1000...

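Because each of these report collectors draws on its own privacy budget, every query would be charged to a fresh budget, allowing an unlimited amount of information leakage about se1 and te1 from the system.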

Ensuring that "Report collector corresponding to website X1" is unable to run queries where neither source, nor trigger event relates to website X1 defeats this attack.
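
A toy simulation makes the attack concrete. It assumes, purely for illustration, that one query costs the whole per-grain budget and that each report collector acts for exactly one website; `total_leakage`, `EPSILON_PER_QUERY`, `BUDGET_PER_GRAIN`, and the match key value are hypothetical.

```python
from collections import defaultdict

EPSILON_PER_QUERY = 1.0  # assumed cost of one query
BUDGET_PER_GRAIN = 1.0   # assumed budget per (match key, report collector, epoch)

# se1 was created on website A and te1 on website B; both carry the same
# (hypothetical) match key, so they describe the same user.
EVENT_SITES = {"A", "B"}
MATCH_KEY = "mk-user"


def total_leakage(collector_sites, enforce_site_check: bool) -> float:
    """Total epsilon released about MATCH_KEY when each collector queries se1 and te1.

    Each report collector is identified by the single site it acts for.
    """
    spent = defaultdict(float)
    released = 0.0
    for collector in collector_sites:
        # Helpers reject queries where neither event relates to the collector's site.
        if enforce_site_check and collector not in EVENT_SITES:
            continue
        grain = (MATCH_KEY, collector, 0)  # (match key, report collector, epoch)
        if spent[grain] + EPSILON_PER_QUERY <= BUDGET_PER_GRAIN:
            spent[grain] += EPSILON_PER_QUERY
            released += EPSILON_PER_QUERY
    return released
```

Without the check, the released privacy loss grows linearly with the number of report collectors (X1 through X1000 release 1000 times the per-grain budget). With it, only collectors acting for website A or B can query these events, so the total is bounded by the budgets of those two sites.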

@csharrison
Contributor Author

@benjaminsavage thanks, that clears it up for me! The attack makes sense to me now that I know the intended privacy unit.

I think there are two cases to discuss:

  1. One MMP partners with multiple sites to do measurement
  2. Multiple MMPs partner with a single site

For (1), what you wrote makes sense but I do want to mention that there are use-cases where "pooling" data across many small advertisers seems very useful, especially if the small advertisers are "similar" and are so small that noise from privacy protections might wash away meaningful signal on its own. We should consider allowing this as long as queries deplete budget from all participating sites properly. See also issue WICG/attribution-reporting-api#190 which touches on this.

For (2) I would guess the behavior is that multiple MMPs are allowed but draw from the same per-site privacy budget? We would need to hash out how that works with the helper network / match key provider commitments though.

@benjaminsavage
Collaborator

> For (1), what you wrote makes sense but I do want to mention that there are use-cases where "pooling" data across many small advertisers seems very useful, especially if the small advertisers are "similar" and are so small that noise from privacy protections might wash away meaningful signal on its own. We should consider allowing this as long as queries deplete budget from all participating sites properly. See also issue WICG/attribution-reporting-api#190 which touches on this.

I agree. I definitely can see this use-case. This is where I think we come back to @martinthomson's comment above:

> What we could do is have the delegation to a report collector give the report collector the ability to include more than one site in queries. The result would be that the budget for those sites is pooled.

So essentially - these multiple sites would simply be treated as a single unit, and would share a common privacy budget.
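
Treating several sites as a single unit could look like the following sketch, in contrast to keeping separate per-site budgets: every member of a pool draws on one shared budget, and a query may not mix sites from different units. The `PooledBudget` class, the pool membership map, and the rule rejecting mixed queries are assumptions for illustration only.

```python
from collections import defaultdict


class PooledBudget:
    """Hypothetical ledger in which pooled sites share one budget per epoch."""

    def __init__(self, capacity: float, pools: dict):
        # pools maps a pool name to its member sites; pools are assumed disjoint.
        self.capacity = capacity
        self.unit_of = {site: name for name, members in pools.items() for site in members}
        self.spent = defaultdict(float)  # (unit, epoch) -> epsilon spent so far

    def unit(self, site: str) -> str:
        # A site that has not joined a pool is its own privacy unit.
        return self.unit_of.get(site, site)

    def charge(self, sites, epoch: int, epsilon: float) -> bool:
        units = {self.unit(site) for site in sites}
        if len(units) != 1:
            return False  # the query would mix data across separate units
        key = (units.pop(), epoch)
        if self.spent[key] + epsilon > self.capacity:
            return False
        self.spent[key] += epsilon
        return True
```

Because the pool is charged once per query rather than once per member, small advertisers gain statistical power from joint queries but give up independent budgets.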

As I mention in my comments on #3 - I think there's an opportunity to actually encourage this. As @martinthomson explains in that issue - we have to assume there are potentially colluding sites, and bake that estimate into our differential privacy budget. If these sites are instead transparently acting as a unit and sharing a common privacy budget, this may help us reduce the estimated number of "potentially colluding sites".

@benjaminsavage
Collaborator

> For (2) I would guess the behavior is that multiple MMPs are allowed but draw from the same per-site privacy budget? We would need to hash out how that works with the helper network / match key provider commitments though.

So this is tricky. Even assuming we figure out the commitment stuff, the privacy budget part is harder. I think that the app/website that's trying to delegate to multiple MMPs would need to decide how much budget to give each one of them. Like 60% to MMP1, and 40% to MMP2?
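
The 60/40 example amounts to splitting a site's per-epoch budget across its delegates. A minimal sketch, assuming basic sequential composition (the privacy losses of the delegates' queries add up) and a hypothetical `split_budget` helper:

```python
import math


def split_budget(total_epsilon: float, shares: dict) -> dict:
    """Divide one site's per-epoch budget among the MMPs it delegates to.

    Under basic sequential composition, the site's overall privacy loss is at
    most the sum of what its delegates spend, so the shares must sum to 1.
    """
    if not math.isclose(sum(shares.values()), 1.0):
        raise ValueError("shares must sum to 1")
    return {mmp: total_epsilon * share for mmp, share in shares.items()}
```

With `split_budget(1.0, {"MMP1": 0.6, "MMP2": 0.4})`, MMP1 may spend 0.6 and MMP2 0.4, so their combined spend never exceeds the site's budget.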

I would like to understand the use-case that would push an app/website to need to work with multiple MMPs. I know we've discussed this before at a prior PAT-CG, do you have more detail on these use-cases @csharrison?

@csharrison
Contributor Author

> For (2) I would guess the behavior is that multiple MMPs are allowed but draw from the same per-site privacy budget? We would need to hash out how that works with the helper network / match key provider commitments though.
>
> So this is tricky. Even assuming we figure out the commitment stuff, the privacy budget part is harder. I think that the app/website that's trying to delegate to multiple MMPs would need to decide how much budget to give each one of them. Like 60% to MMP1, and 40% to MMP2?
>
> I would like to understand the use-case that would push an app/website to need to work with multiple MMPs. I know we've discussed this before at a prior PAT-CG, do you have more detail on these use-cases @csharrison?

This is a pretty complicated space so let me file another issue tackling this directly. I think we're at a point where we can resolve this issue with a few minor tweaks to the e2e doc to just clarify the current desired behavior:

  • The desired privacy unit is "per site", and the report collector is just a delegate. They are not privileged from the privacy budget POV. A single report collector can work with multiple sites.
  • We are open to more advanced queries that do some form of "pooling" across sites

@eriktaubeneck
Contributor

eriktaubeneck commented Aug 31, 2022

> This is a pretty complicated space so let me file another issue tackling this directly.

Great, and agreed. I would say we still have work to do here WRT the threat model as well, and hopefully more discussion here can help flesh it all out.

> I think we're at a point where we can resolve this issue with a few minor tweaks to the e2e doc to just clarify the current desired behavior:

>   • The desired privacy unit is "per site", and the report collector is just a delegate. They are not privileged from the privacy budget POV. A single report collector can work with multiple sites.

Yes - if you have any suggestions for making that more clear, I'm happy to update the document. Finding the right way to describe this (and not imply that sites/apps must directly run their own queries) was tricky! (And we clearly haven't got it right yet.)

>   • We are open to more advanced queries that do some form of "pooling" across sites

Sure! The current doc was primarily focused on getting a minimally viable proposal out, with the intention of getting this type of feedback and suggestions for extensions.
