Privacy guidelines for ECMA402 APIs #443

Open
zbraniecki opened this issue May 21, 2020 · 8 comments
Labels
c: meta Component: intl-wide issues s: help wanted Status: help wanted; needs proposal champion

Comments

@zbraniecki
Member

Originally triggered by #435 and in relation to #442, this issue is about designing a consensus on how our work interacts with privacy concerns on the Web.

Since JS is the language of choice of the Web ecosystem, its standard library, and ECMA402 with it, brings the potential for malicious actors on the Web to abuse the APIs against the user.

It is unclear to me what best practices we can use to ensure that, as we design ECMA402 APIs, we account for that and make it easier for implementers to protect the user against such abuses.

I'll loop in several privacy experts to get their perspective and if possible the evaluation of the current API surface and planned APIs.

My hope is that, as a result of this issue, we will end up with basic guidelines from the privacy experts for the ECMA402 group that we can use when working on future APIs.

@sffc sffc added c: meta Component: intl-wide issues s: discuss Status: TG2 must discuss to move forward labels May 22, 2020
@zbraniecki
Member Author

zbraniecki commented May 22, 2020

History

Historically, we designed the Intl API to be best-effort and intended to have its output treated as opaque.
One reason for that is that data may differ depending on the implementation and we didn't want people to expect DateTimeFormat.format() to return the exact string when the string can be different between two different JS engines and even two different versions of the JS engine.

The other was to try to bring some opaqueness to what data is available. Since data may differ between implementations, the Intl API can create a wide surface for fingerprinting by allowing an attacker to read available data tables and locales, and to detect the version of the data tables used by the browser.

We also constructed a slightly more complex way to read available locales - instead of exposing availableLocales, we expose supportedLocalesOf - the intent was to make the user pass the locales they're interested in and learn which of those the engine supports.
This was done so that the white hat use is easy, while trying to enumerate all supported locales is costly.

The idea behind it was not that we'll somehow prevent the abuse, but that we'll make it CPU costly and thus easier to detect and block by the engine (in a sort of DoS firewall fashion).
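
To illustrate the asymmetry, here is a minimal sketch. Only Intl.DateTimeFormat.supportedLocalesOf is a real API here; the `wanted` list and the `probeSupportedLocales` helper are names I made up for illustration.

```javascript
// Legitimate use: ask only about the few locales the app actually ships.
const wanted = ["de-AT", "pl", "km"];
const usable = Intl.DateTimeFormat.supportedLocalesOf(wanted);

// Enumeration: nothing hands back the full list of locales, so the caller
// has to bring their own candidate list and probe it. The work grows with
// the size of that list.
// `probeSupportedLocales` is a hypothetical helper, not part of Intl.
function probeSupportedLocales(candidates) {
  const supported = [];
  for (const tag of candidates) {
    if (Intl.DateTimeFormat.supportedLocalesOf([tag]).length > 0) {
      supported.push(tag);
    }
  }
  return supported;
}
```

Note that a caller can also pass the whole candidate list in a single call, so the cost here is really the size of the list being probed, not the number of calls.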

I am not sure if that is the right way to approach designing privacy friendly APIs and I do not know if we are successful in limiting the user-unfriendly use of our APIs.

Today

As we extend the surface of the Intl API, we are encountering more cases where knowing what the engine supports may be important for designing UIs that use our APIs.

Examples such as pickers would benefit from knowing what locales, timezones, calendars, units etc. are supported. In other cases, the user may want to check what the engine supports and consider loading additional data or a library for the cases they want to cover and the user's engine does not.
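
A rough sketch of that second case, assuming the app has some fallback path of its own. `pickFormattingStrategy` and the `kind` values are hypothetical names; the only real API used is supportedLocalesOf.

```javascript
// Hypothetical helper: decide whether the engine covers a locale or whether
// the app should load its own data or library for it.
function pickFormattingStrategy(locale) {
  const [match] = Intl.DateTimeFormat.supportedLocalesOf([locale]);
  return match
    ? { kind: "native", locale: match }   // engine has data for this locale
    : { kind: "fallback", locale };       // app would need to bring its own
}

const strategy = pickFormattingStrategy("km");
```

The same check that makes this convenient for the app is also what lets a script learn which data the engine carries.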

Near future

We expect three areas of ECMA402 investigation that have potential privacy consequences:
a) Supplemental API extensions to help user understand what the engine supports
b) Asynchronous APIs that allow for lazy over-the-air data loading
c) Improve integration of user preferences into Internationalization

a) Supplemental API extensions to help user understand what the engine supports

There are a number of APIs under consideration intended to provide insight into what data is available in the engine. #435 is an example, but there are others related to supported units, supported calendars, numbering systems, currencies and display names.

I'm not sure if the supportedXOf is the right paradigm, if it helps at all, and how to approach the need for such API against its fingerprinting nature.

b) Asynchronous APIs that allow for lazy over-the-air data loading

One consequence of the growth of the ECMA402 scope is that we are increasing the amount of data required for the engine to carry in order to support ECMA402.
The aforementioned opaqueness of data and best-effort model help us avoid locking down any implementer, but it is still a concern and we'd like to better fine-tune what data we carry, especially in low-resource scenarios.

That means that we'd like to invest in making engines capable of carrying only a subset of the data and enable them to load additional data on demand.

That smells like another privacy issue. It creates a scenario where the attacker can probe the API to learn that the engine carries Khmer data and, since "by default" the engine doesn't bundle it, deduce that the user must have been visiting Khmer websites.

When presented with the idea, @annevk responded:

Please don’t add new user agent wide state caches

c) Improve integration of user preferences into Internationalization

The current Intl API uses very limited information about the user's preferences (language, script, region) to determine the "default" locale the user wants. The application can provide an override to each constructor, which allows the application to have some sort of "language picker" and feed in the user-selected locale.
If that is not the case, by default each constructor will use "DefaultLocale" from the JS environment which matches what the user requested in "navigator.languages".

The default mechanism does not allow the user to specify any "overrides", such as asking for a different hourCycle, calendar system, or preferred start of the week. The custom argument from the app can pass that information using Unicode extension keys. For example:

// User wants Austrian German with H12 hour cycle
let dtf = new Intl.DateTimeFormat("de-AT-u-hc-h12");

But since most web apps don't provide sophisticated user preferences UIs, most apps will just do:

// Give me the formatter for the default user locale
let dtf = new Intl.DateTimeFormat();
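
To see what each of the two calls above ends up with, resolvedOptions() can be inspected. This is only a sketch: `resolvedHourCycle` is a hypothetical helper, and it requests an hour field because resolvedOptions() only reports hourCycle for formatters that actually display hours.

```javascript
// Hypothetical helper: report which hour cycle a formatter resolved to.
function resolvedHourCycle(locales) {
  const dtf = new Intl.DateTimeFormat(locales, { hour: "numeric" });
  return dtf.resolvedOptions().hourCycle;
}

// App-supplied override via the Unicode extension key.
const explicit = resolvedHourCycle("de-AT-u-hc-h12");

// No argument: whatever the engine's default locale implies.
const fromDefault = resolvedHourCycle(undefined);
```

In the second case the app has no say in the result; it depends entirely on what the engine chooses to treat as the default.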

In those scenarios (the majority), if the user wants any override, they have most likely customized their operating system preferences to the preferred hour cycle, start of the week, number grouping model etc.

In such case, users often reach out to engines asking them to read this information from the OS and feed it to the default Intl APIs so that they can reflect user preferences better.
Unfortunately, such behavior is yet another fingerprinting vector, allowing the attacker to learn about a high number of user preferences that can be overridden in the operating system.
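
To make the concern concrete, here is a sketch of what any script can already read from default formatters today. If engines fed OS overrides into these defaults, each field that changes as a result would become an extra distinguishing bit. `observableIntlDefaults` is a hypothetical name, and the set of fields is just a sample of what resolvedOptions() returns.

```javascript
// Everything here is readable by any script, with no permission prompt.
// `observableIntlDefaults` is a hypothetical helper for illustration only.
function observableIntlDefaults() {
  const dtf = new Intl.DateTimeFormat(undefined, { hour: "numeric" }).resolvedOptions();
  const nf = new Intl.NumberFormat().resolvedOptions();
  return {
    locale: dtf.locale,
    calendar: dtf.calendar,
    timeZone: dtf.timeZone,
    hourCycle: dtf.hourCycle,
    numberingSystem: nf.numberingSystem,
  };
}

const defaults = observableIntlDefaults();
```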

@zbraniecki
Member Author

zbraniecki commented May 22, 2020

I'd like to call for help from the Privacy and Anti-Tracking community to help us make Intl API surface friendly to anti-tracking and user privacy efforts.

@zbraniecki
Member Author

zbraniecki commented May 22, 2020

HTML widgets

There's one more measure we try to use that may be somewhat helpful in limiting the surface.
Many enumeration use cases, and user preferences options, are mostly evident in picker widgets.

For some of them, we developed native pickers - date, time, calendar etc. which allow the browser to handle selection using non-web-exposed information and communicate to the app just the result of the selection.

It is our hope that such efforts limit the exposure of fingerprinting bits and importance of wide API surface required to implement those pickers in JS.

Unfortunately, it is my understanding that even in those cases, the widgets can be used for fingerprinting because the attacker can somehow read the dimensions of the UA widget and deduce information about the locale used to create it (since the locale information may affect the UA widget dimensions).
For that reason, I was advised to make the native pickers use the navigator.languages locale rather than browser locale, and those pickers don't have access to operating system intl overrides.

If this is not true, I'd love to change that since we get a lot of user reports about their OS preferences not being respected by the native pickers in Firefox.

@annevk
Member

annevk commented May 26, 2020

@zbraniecki it might help to split the various questions so they each have their own smaller issue thread. Requiring everyone to read all of the above just to discuss HTML widgets for instance is a lot.

Two thoughts on HTML form controls:

  1. A lot of web developers want these controls to match the language of the page and would avoid using native controls if they don't. So there are some tradeoffs there. Letting the user override seems reasonable, but not sure it should be the default.
  2. It seems making the sizing somewhat more locale-independent could be doable.

@zbraniecki
Member Author

@zbraniecki it might help to split the various questions so they each have their own smaller issue thread.

Makes sense! I filed https://github.com/FrankYFTang/proposal-intl-enumeration/issues/3 and will file separate issues for each.
How should I CC the right people into those issues?

@annevk
Member

annevk commented May 26, 2020

At-mention them?

Speaking of which, @npdoty might be able to help with some of this as well.

@zbraniecki
Member Author

At-mention them?

It would be helpful to have a list of people to at-mention in those issues. I'm trying to accumulate a basic intro for them here, along with the list of those people.

@zbraniecki
Member Author

#409 is definitely a good candidate for such review and guidelines.

@sffc sffc added s: help wanted Status: help wanted; needs proposal champion and removed s: discuss Status: TG2 must discuss to move forward labels Jul 8, 2020

3 participants