Minimum patch sizes? #207
Comments
Thanks for starting the PING review! One key difference between the current IFT specification and the previous two approaches is that the patches are pre-computed by the application that generates the IFT font. This means they are the same for everyone (with a corresponding increase in cache usage and a decrease in privacy leakage) and do not need a smart server; these are just static files. This is in contrast to the old Patch/Subset approach, where patches were computed dynamically on demand by the server. A malicious server could have been set up to infer what content was being read by a user, based on the sequence of patch requests. That was the reason for the minimum patch size requirement: to obfuscate that potential tracking vector. The current spec does mention the impact of too-small patch sizes, in "Reducing the Number of Network Requests", but there the concern is network performance degradation.
In addition, from Content inference from character set:
@pes10k does that answer your question?
Hi @svgeesus, apologies for this falling through the cracks, did not mean to leave you hanging for so long. I'll make sure it doesn't happen again. I think I understand the change, but I don't understand if / how it addresses the privacy concern. The privacy concern wasn't about dynamic vs. static patch sizes, but about having patch sizes that were very small, so that the server could watch which patch requests were made to learn how the user was interacting with the page (in a way that having the font be an all-or-nothing request wouldn't allow). More generally, w/o IFT, a user agent could fetch a font w/o risk that the server serving the font could learn how the user was interacting with the page (what parts of the document the user was reading, whatever). IFT changes that: if the patches were very granular, the font server might be able to learn which part of the document the user was reading. If a glyph only appears in a document once, and the font server sees a patch request for that glyph, the font server learns some information (some probability) that the user is reading that part of the document. At the margins, IFT allows the font server to learn information that previously wouldn't have been available from font requests alone (and would have relied on something like JS execution, or images, etc.). How useful that new information is to the curious font server would depend on a bunch of things: how large the document is (so the number and distribution of glyphs needed to render the document), how granular the patches are, etc. But there is some new information being leaked here that wasn't available to a font server before. Having significantly large minimum patch sizes addressed the concern previously, since it made it unlikely (though not impossible) that a patch would be practically useful to a curious font server. It didn't address the concern fundamentally, but pushed it further out towards impractical.
I understand your comments to be saying that "changing the patch sizes and contents from being dynamically negotiated to being static and predetermined addresses the privacy concern, or makes it similarly impractical, in the way that large minimum patch sizes did." If I'm understanding you correctly, then I don't understand why this is so. If the server uses very small / very granular patches, then a curious font server can still carry out the same attack as it could have in the previous IFT version w/o the minimum patch sizes, no?
In our discussions of privacy within the group so far we've been concerned with the scenario of identifying a document that is being read based on the glyphs being requested. In early drafts of the patch-subset specification, only the glyphs that were needed to render the document were requested. Because some documents might have distinctive patterns of glyphs, this raised the possibility that the server providing the subset could determine what document the user was reading and associate that with the IP address. We addressed this (to the extent it was addressed) by having the client side request additional glyphs in a way that made it statistically unlikely that a particular document could be identified. You are discussing a different scenario that I don't think we have discussed, one in which the font server presumably already knows what document is being rendered, and the question is what part of the document is being rendered at the moment. You say:
I'm not sure I agree. We weren't trying to address the within-the-page problem in making those recommendations about patch size, and when limiting the case to a particular document and asking about its parts, the statistical analysis would be very different. I would guess that even with substantial patch sizes you could guess the portion of a document that corresponds to a set of patches being requested with reasonable accuracy. There may be no good solution to the within-the-page privacy problem when serving a font incrementally. Certainly the current alternative to IFT (based on unicode-range) doesn't address it.
Currently the solution to serving fonts with large glyph sets without IFT is to break them into pieces, which are then selected by the browser based on which code points are present, via the unicode range mechanism. So it is already possible for font loading to reveal information about the contents of a page to the font server without needing JS execution or images. IFT under the new approach is now quite similar to how unicode range works: there is a preset list of segments which are selected based on codepoint presence. The main difference from unicode range is that the loaded segments are merged into a single font instead of being left as independent fonts. So I don't believe IFT is making anything possible which can't already be done via unicode range. Just like with unicode range, an efficient IFT encoding will have a practical limit to how small groupings can be made, below which the per-group overhead outweighs the savings from smaller groupings.
To expand on this a bit more: based on observations from a prototype implementation, I would expect that the smallest group sizes for an efficient encoding would be around 10 code points. For comparison, the previous iteration of this spec had a specified minimum group size of 7.
I would also like to emphasize that the new IFT approach serves incremental font subsets with static, predefined glyph groupings based e.g. on character and character-combination frequencies (determined beforehand without regard for any specific content), vs. the old approach where the groupings were requested by a client based solely on the content to be rendered. A client knows what groups are available and what glyphs are included within those groups; however, when it asks for delivery of a group or a combination of groups to build the needed glyph repertoire, it does so without ever divulging what specific characters it intends to render.
Hello all! I'm trying to summarize and address all responses above. If folks think I've left anything out, please correct me.
1. This is a new concern, and not something that's been brought to the group's attention before
I'm surprised by this; this is the same concern that was discussed at length with the previous version of the spec (see issue #50 from the previous PING review). I understand that the group has done significant work trying to address the concern of "can the font server learn the document based on patch request patterns," and I think that's an important source of leakage to also address (and I'm glad and grateful you all are working on it). But, nevertheless, the concern I'm expressing in this issue isn't new, and so I'm surprised in that respect.
2. IFT is codifying what sites and user agents already do
Sites already include multiple font sub-resources, and user agents already schedule those font requests based on when they're needed to render the document, based on how the user is interacting with the page. If I understand @garretrieger correctly above, this is very interesting new information to me, and not something I was aware that browsers currently did (neat!). I want to make sure I understand what's being said correctly, so I'm going to describe a toy example below. Please let me know if I am understanding correctly. Toy example (excuse the nonsense unicode range numbers, and probably imprecise use of character / code point / glyph terms):
If the above is all correct and required by existing standards, then I agree with you @garretrieger, and I don't think there is any new leakage introduced by this proposal. But if this behavior isn't required by existing standards, then I think the IFT change would be introducing a new, standardized privacy leak (and so the spec should mitigate that leakage).
3. Increasing and / or requiring a minimum patch size wouldn't address this leakage
I understand @skef and @vlevantovsky to be saying the above. I think this is incorrect. I appreciate that the server doesn't know from the patch request itself which codepoint triggered the patch request, and that the font server only directly learns the range of glyphs/codepoints being requested. However, as far as I can tell, there's nothing in the spec that prevents the patch request from being for a single codepoint, and so revealing to the server exactly what character is being requested. That's why the suggestion for the issue here is to define a minimum patch size in the spec, so that a correct implementation of this feature in a user agent couldn't end up harming user privacy. And even if a user agent weren't to go to the extreme of single-codepoint patch requests, small patches pose a similar risk, just changing the attack from deterministic to probabilistic. In general, I expect that the larger the patch size, the less confident the font server can be about what codepoint triggered the patch request (and so, what user and/or page behavior occurred). Figuring out the right balance here is tricky, and something that I bet is really easy for non-experts (like myself) to guess wrong on. And so it would be extremely helpful for experts (like yourselves) to add guidance (and in particular, normative guidance) in the spec so that an implementor can implement this behavior without unintentionally harming their users and their users' privacy.
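As a rough illustration of the "deterministic vs. probabilistic" point above, here is a small Python sketch. It is not taken from the spec: the helpers `group_codepoints` and `candidates`, the fixed-size grouping, and the eight-letter alphabet are all assumptions made up for the example. It shows how many characters of a known document could have triggered a given patch request as the patch size grows.

```python
from typing import Dict, List, Set


def group_codepoints(codepoints: List[int], patch_size: int) -> Dict[int, int]:
    """Hypothetical encoder: put consecutive code points into fixed-size patches.

    Returns a mapping from code point to patch id.
    """
    return {cp: i // patch_size for i, cp in enumerate(codepoints)}


def candidates(patch_id: int, document: str, patch_of: Dict[int, int]) -> Set[str]:
    """Characters of the document that could have triggered a request for patch_id."""
    return {ch for ch in document if patch_of.get(ord(ch)) == patch_id}


# Toy alphabet standing in for a font's code points (assumption for illustration).
alphabet = [ord(c) for c in "abcdefgh"]
document = "face"

# Patch size 1: a request identifies the character exactly.
single = group_codepoints(alphabet, 1)
print(candidates(single[ord("f")], document, single))

# Patch size 4: the same request is compatible with more than one character.
grouped = group_codepoints(alphabet, 4)
print(candidates(grouped[ord("f")], document, grouped))
```

Note that the ambiguity only grows when the document actually contains several code points from the same patch, which is part of why the later replies argue that patch size alone does not settle the within-the-page question.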
The emphasis on ordering in this phrasing makes it sound as if all of the listed sub-fonts will typically be loaded for a page at some point, it's just a matter of in what order. That's not typically how things would work, or the goal of this use of Accordingly, if a page is broken up into separately rendered portions (as with, say, current reddit), fonts would be loaded according to the new codepoints used in the part of the page being rendered. And in that way, if one knew the content of the whole page, one might be able to tell at the server-side which portion of the page was being rendered. |
The basic problem is that if "the server" (or whoever controls it) can know the content of the page being rendered, and the only question is what portion of the page is being rendered, then the statistics don't work in your favor. Suppose you have the expected encoding of a CJK font, where codepoints are grouped together in patches by frequency. Then say there is a particular document that (like documents will) mostly has high-frequency codepoints, but also happens to have three low-frequency codepoints, and those are spread out in three different patches. If the server knows the content of the document, then it can tell that the requests for each of those patches will respectively correspond to the portion of the document where the codepoints are used. So how would you mitigate this?
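To make the scenario above concrete, here is a toy sketch in Python. Everything in it is hypothetical: the patch names, the `patch_of` mapping, and the use of Latin letters as stand-ins for high- and low-frequency CJK codepoints. It only shows the inference step a server could make if it already knew the document and how it is split into separately rendered portions.

```python
from typing import Dict, List, Set


def patches_for(text: str, patch_of: Dict[str, str]) -> Set[str]:
    """Patches a client would need in order to render the given text."""
    return {patch_of[ch] for ch in text if ch in patch_of}


def sections_matching(patch: str, sections: List[str], patch_of: Dict[str, str]) -> List[int]:
    """Indices of the document sections that would trigger a request for `patch`."""
    return [i for i, text in enumerate(sections) if patch in patches_for(text, patch_of)]


# Assumed frequency-based grouping: a, b, c are "common"; X, Y, Z are "rare"
# and happen to live in three different patches.
patch_of = {"a": "p0", "b": "p0", "c": "p0", "X": "p1", "Y": "p2", "Z": "p3"}

# A known document, split into three separately rendered portions.
sections = ["abcX", "abYc", "Zabc"]

# The common patch says nothing about which portion is on screen...
print(sections_matching("p0", sections, patch_of))
# ...but each rare patch points at exactly one portion.
print(sections_matching("p1", sections, patch_of))
print(sections_matching("p3", sections, patch_of))
```

In this toy setup each rare patch holds a single character, but making the rare patches larger would not change the result as long as the document uses only one codepoint from each of them.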
To echo what Skef said, user agents will only download the fonts of @font-face rules where the content being rendered intersects the unicode range list for that @font-face. So in your example, if the page only contained codepoints listed in the unicode range for /large-font-pt8.woff2, then only that font would be loaded. This behaviour is currently specified as part of the CSS fonts module here: "The union of these ranges defines the set of codepoints for which the corresponding font may be used. User agents must not download or use the font for codepoints outside this set."
One of the reasons I've hesitated to include a specified minimum patch size with this approach is that there are legitimate reasons why you might have an encoding with a patch triggered by a single code point which doesn't leak information. For example, consider the case of unicode variation selectors. It would be entirely reasonable to have a single patch attached to one of the variation sequence codepoints (i.e. FE00) which pulls in the related alternate glyphs that can be generated. The variation code point isn't particularly meaningful on its own, so it isn't communicating anything other than information about which script is being used on a page. Since in my view this has very similar privacy implications to the unicode range mechanism, and unicode range does not enforce a minimum size, we aren't introducing any new issues.
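A minimal sketch of the selection rule quoted above, in Python. The file names and ranges are made up for illustration, and `fonts_to_download` is a hypothetical helper, not how any user agent is actually implemented. It only shows that a font file is fetched when the rendered text intersects that file's unicode range, and not otherwise.

```python
from typing import Dict, List, Tuple

# Hypothetical @font-face declarations: file name -> inclusive code point ranges.
FONT_FACES: Dict[str, List[Tuple[int, int]]] = {
    "latin.woff2": [(0x0000, 0x00FF)],
    "greek.woff2": [(0x0370, 0x03FF)],
    "cjk-common.woff2": [(0x4E00, 0x4FFF)],
}


def fonts_to_download(text: str,
                      faces: Dict[str, List[Tuple[int, int]]] = FONT_FACES) -> List[str]:
    """Return the font files whose unicode range intersects the text's code points."""
    needed = {ord(ch) for ch in text}
    selected = []
    for name, ranges in faces.items():
        # A face is selected as soon as one needed code point falls in one of its ranges.
        if any(lo <= cp <= hi for cp in needed for lo, hi in ranges):
            selected.append(name)
    return selected


print(fonts_to_download("abc"))      # only the Latin file is needed
print(fonts_to_download("a\u03b1"))  # Latin plus Greek (U+03B1 is in the Greek range)
```

The request pattern seen by the server is therefore a function of which ranges the rendered content touches, which is the property the IFT segments are being compared to in this thread.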
Thank you for pointing me at this. To be honest I didn't know this was already in the spec. I agree that, since this is already part of the platform, then my concern is sorta redundant. I am going to change my issue from That said, two other (non-blocking) suggestions:
Anywho thanks for walking me through this and helping me understand how this interacts with existing specs. Like i mentioned, i removed the |
This issue is being filed as a part of the requested PING horizontal review
I wasn't able to find any guidance in the spec about which fonts should have which minimum patch sizes. This seems like a regression from the spec when it was previously reviewed, when the spec mentioned which code points required a minimum level of obfuscation (through minimum patch size), and what that minimum patch size was.
Have I missed this in the current spec, and the current version of the spec does require equivalent protections, just phrased or encoded differently? Are the previously included protections no longer needed bc of other, new protections? Or is this a regression (and if so, intended or unintended)?