Rewrite ancillary uses to focus on 2 kinds of ancillary APIs rather than ancillary data. #361

Merged
jyasskin merged 7 commits into w3ctag:main from rewrite-ancillary-uses
Nov 22, 2023

Collaborator

@jyasskin jyasskin commented Oct 6, 2023

This is a replacement for #359 after the discussion in https://github.com/w3ctag/privacy-principles/blob/main/meetings/2023-10-04-minutes.md. It will close #220.

@pes10k, I think I fixed your comments:

  • "identifiable person" was an echo of a phrase in the definition of "personal data", but I found a way to avoid it.
  • I replaced the CSP report example with event timings. There's also a nit there that the duration field is a novel piece of data, so I could narrow this to just the processingEnd field.
  • I reworded the "it complies with this principle" bit and also moved the second principle in the section down to just before the paragraph that elaborates on it. This violates the guideline we've been following to put principles at the tops of their sections, so maybe folks will want me to move it back.
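As a rough illustration of why the `processingEnd` field can be treated as a summary of already-available information: a page can approximate it for its own listeners by reading a clock around each handler. The sketch below assumes a hypothetical `timeHandler` wrapper and `HandlerTiming` type (neither is part of Event Timing or any other spec) and only covers handlers the page itself registers. It does not reproduce `duration`, which also depends on when the next paint happens.

```typescript
// Hypothetical sketch: the field names mirror Event Timing's processingStart
// and processingEnd, but this wrapper is not part of any specification.
interface HandlerTiming {
  processingStart: number;
  processingEnd: number;
}

// Wraps an event handler so the page records when its own code started and
// finished running. The clock is passed in so the sketch runs outside a
// browser; in a page it would likely be `() => performance.now()`.
function timeHandler<E>(
  handler: (event: E) => void,
  now: () => number,
): (event: E) => HandlerTiming {
  return (event: E): HandlerTiming => {
    const processingStart = now();
    handler(event);
    const processingEnd = now();
    return { processingStart, processingEnd };
  };
}

// Usage with a fake clock that advances 5 ms on every reading.
let fakeTime = 0;
const timed = timeHandler<string>(() => {}, () => (fakeTime += 5));
const timing = timed("click");
```

With the fake clock, `timing` holds the two readings taken immediately before and after the handler ran.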

index.html Outdated
<dd>
APIs that filter or summarize information available from [=non-ancillary
APIs=], like the [[[event-timing]]] and <a
data-cite="intersection-observer#introduction">IntersectionObserver</a>.
Collaborator

I wouldn't consider intersection observer ancillary. It's somewhat polyfillable, but can be used to e.g. lazily load parts of the UI.

Collaborator Author

Ah, and that's supported by https://developer.mozilla.org/en-US/docs/Web/API/Intersection_Observer_API. What's a good API to use as an example here? Ideally one that does more processing than just filtering out un-interesting events. I'll try to look again before the Wednesday meeting, but you might just know one off the top of your head.

index.html Outdated
when [=ancillary data=] contributes to a collective benefit in a way
that reduces privacy threats to individuals (see <a href="#principle-collective-privacy">collective
privacy</a>).
<dt><dfn>Novel ancillary APIs</dfn></dt>
Collaborator

Maybe "new sources of ancillary data"?

Collaborator Author

I could do that or

Suggested change
- <dt><dfn>Novel ancillary APIs</dfn></dt>
+ <dt><dfn>Ancillary APIs that provide new information</dfn></dt>

along with <dfn>Ancillary APIs that filter or summarize other sources of information</dfn> above to stay parallel. I'll give other folks a chance to weigh in before making this change.

Collaborator Author

@npdoty suggested using the "unavoidable information exposure" phrase, and I'll try to work that in.

Collaborator Author

Done; please check that the new wording looks ok.

index.html Outdated
APIs if a significant number of people turn them off, and that the act of
turning them off can ironically contribute to [=browser fingerprinting=].
Opponents argue that if data's easier or cheaper to collect, more sites will
collect it, and because there's still some risk and cost, users should be able
Collaborator

Is "cost" here related to privacy?

Collaborator Author

@jyasskin jyasskin Oct 9, 2023

Maybe: it's about the computational and network costs, so not about exposing user information. I think the connection to privacy is that it's about using a user's resources without their knowledge for things they might not like, and if they notice, they might be uneasy about what else the site can do. But the closest item in https://wiki.openrightsgroup.org/wiki/A_Taxonomy_of_Privacy that seems to match is Intrusion, and that's limited to when the intrusion is “highly offensive to a reasonable person”, which is probably not the case for these APIs.

@pes10k, I'm trying to represent your concern with this bit, so maybe you can justify why this is a privacy concern better than I can?

Otherwise, I can just remove "and cost" without touching the rest of the paragraph.

Collaborator

I think the privacy impact of cost was primarily the point in the previous part of the sentence -- that reducing cost will make the data collection more ubiquitous.

Users may also want to disable APIs just because they cost network and processing resources; in some cases that might be considered a privacy issue, but I think that's mostly just a distinct user control issue and doesn't need to be covered here. Removing "and cost" seems just as clear to me.

Collaborator

(fwiw, @npdoty expressed the concern i was trying to get across)

Collaborator Author

I've removed "and cost".

index.html Outdated
asking for each use of the APIs.

API designers should maintain APIs that had to make one of these choices and
should keep trying to evolve them toward aggregating the data instead.
Collaborator Author

@pes10k @sandandsnow You'd been concerned about saying that "aggregation" is sufficient for depersonalizing data. Is there a term I should use instead? Above I used "private aggregation", but I'm happy to use something else or to define something.

index.html Outdated
<span class="practicelab"
id="principle-novel-ancillary-apis-shouldnt-reveal-personal-data">[=Novel
ancillary APIs=] should not reveal any [=personal data=] that isn't already
available through other APIs, without permission.</span>
Collaborator

"permission" bakes a solution into the principle. Maybe "without proper mitigations or justifications"?

Collaborator Author

I'm worried that "proper mitigations" leaves it too open to debate. https://w3ctag.github.io/privacy-principles/#data-minimization uses "aligns with the user's wishes and interests", so I could use that here, if folks agree:

Suggested change
- available through other APIs, without permission.</span>
+ available through other APIs, without an indication that doing so aligns
+ with the user's wishes and interests.</span>

npdoty previously requested changes Oct 11, 2023
Collaborator

@npdoty npdoty left a comment

I'm not clear what issue this addresses or how this re-write improves on the existing text.

This seems to repeat and more deeply encode the new-vs-old distinction, which seems to set up further debates over how minimization shouldn't apply if something is old enough or has some alternative use.

This seems to delete much of the text about solicitation and respecting of general preferences around analytics and other ancillary uses. Further, it seems to actively discourage browsers from providing general choices.

index.html Outdated
use of this data.

[=User agents=] usually can't do much about data collection from the
[=non-ancillary APIs=].
Collaborator

I'm not sure this statement is accurate. It also doesn't seem to be necessary for any principles in this section.

Collaborator Author

It's mostly here because I wanted to say something about all three categories that I introduced above. I'm fine with removing it.

Collaborator Author

Done.

index.html Outdated
[=non-ancillary APIs=].

There is disagreement about how [=user agents=] should handle [=summarizing
ancillary APIs=]. Advocates of these APIs argue that they're hard to use to
Collaborator

I'm not sure what we're gaining from summarizing the different arguments here. This doesn't seem to provide any advice as a result of the documented disagreement.

Collaborator Author

I'd be fine with removing this paragraph if that's what the group wants to do.

index.html Outdated
functionality.

We do have consensus that [[[#information]]] governs [=summarizing ancillary
APIs=] and that:
Collaborator

Do we need to note consensus that the other principles in the document still apply? That would seem to be the case for all principles.

Collaborator Author

I'd be fine with dropping that.

Collaborator Author

Reworded to

Because different users are likely to have different preferences:

That's still a little awkward, so I'd welcome other wording suggestions.

<div class="practice" data-audiences="user-agents">
<span class="practicelab" id="principle-disabling-novel-ancillary-apis">User
agents should provide a way to disable [=novel ancillary APIs=].</span>
</div>
Collaborator

This seems to imply that users should be allowed to turn off this subset of APIs, but not other APIs.

And it seems to set up an incomprehensible and unpleasant choice for users. Rather than asking users whether they want to provide telemetry data, UAs would instead ask, "do you want to disable novel ancillary apis but continue to provide very similar data through a different set of ancillary apis?"

Collaborator Author

The other ancillary APIs aren't providing "very similar data". If they were, this set of APIs wouldn't "provide new information."

UAs are also free to have their setting turn off more APIs than the ones called out here; this just sets a minimum bar.


To help [=sites=] understand user preferences, user agents can provide
browser-configurable signals to directly communicate common user preferences
(such as a [=global opt-out=]).
Collaborator

why is this being deleted?

Collaborator Author

It's not related to ancillary data, and https://w3ctag.github.io/privacy-principles/#dfn-global-opt-out still says that UAs can provide this sort of signal.

@@ -1161,34 +1215,43 @@
Group">PATCG</abbr></a> groups.
</aside>

[=User agents=] should aggressively <a href="#data-minimization">minimize</a> [=ancillary
data=] and should avoid burdening the user with additional [=privacy labor=]
Collaborator

why remove the requirement to minimize this data?

Collaborator Author

We already say to minimize the data in https://w3ctag.github.io/privacy-principles/#data-minimization. I don't think we actually have consensus to "aggressively" minimize the APIs that are computed from existing information, and the new text says something more precise and stronger about the ancillary APIs that provide new information: that they shouldn't provide personal data at all.

index.html Outdated
ask the user whether they generally support sharing this data, rather than
asking for each use of the APIs.

API designers should maintain APIs that had to make one of these choices and
Collaborator

what is the maintenance requirement? I'm not clear what we're asking designers to do.

index.html Outdated
asking for each use of the APIs.

API designers should maintain APIs that had to make one of these choices and
should keep trying to evolve them toward aggregating the data instead.
Collaborator

Is this a useful exhortation? What would convince API designers to migrate to an aggregation alternative if they aren't required to minimize data and users are actively discouraged from disabling this functionality?

Collaborator Author

API designers are required to minimize data by https://w3ctag.github.io/privacy-principles/#data-minimization. I'd be ok with deleting this paragraph if folks don't think it's useful, but I figured it'd be useful to encourage designers to keep trying to find alternatives that don't reveal personal data. We could also re-word it if there's a clearer wording for that.

Collaborator Author

Based on today's conversation, trying:

If an API had to make one of these choices, and then something else about the
API needs to change, designers should consider replacing the whole API with one
that avoids exposing [=personal data=] if possible.

@jyasskin jyasskin force-pushed the rewrite-ancillary-uses branch from 490587e to f6fac98 Compare November 10, 2023 01:15
Collaborator

pes10k commented Nov 18, 2023

From slack

But if @torgo (or others) supports that direction again, as an alternative to the “yes/no exists and not planned for removal in existing APIs” way of splitting things up, then I’d like to revisit

Member

@torgo torgo left a comment

we agreed to merge on today's taskforce call

@jyasskin jyasskin dismissed npdoty’s stale review November 22, 2023 17:05

Per @torgo's above comment.

@jyasskin jyasskin merged commit 1dd68c8 into w3ctag:main Nov 22, 2023
1 check passed
@jyasskin jyasskin deleted the rewrite-ancillary-uses branch November 22, 2023 17:06
github-actions bot added a commit that referenced this pull request Nov 22, 2023
…han ancillary data. (#361)

SHA: 1dd68c8
Reason: push, by jyasskin

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Successfully merging this pull request may close these issues.

Remove or revise examples of signals browsers should use to prevent "privacy labor" in ancillary data
5 participants