
2023-06-01 ECMA-402 Meeting

Logistics

Attendees

  • Shane Carr - Google i18n (SFC), Co-Moderator
  • Ujjwal Sharma - Igalia (USA), Co-Moderator
  • Ben Allen - Igalia (BAN)
  • Frank Yung-Fong Tang - Google i18n, V8 (FYT)
  • Louis-Aimé de Fouquières - Invited Expert (LAF)
  • Richard Gibson - OpenJS Foundation (RGN)
  • Daniel Minor - Mozilla (DLM)
  • Eemeli Aro - Mozilla (EAO)
  • Justin Grant - Invited Expert for Temporal (JGT)
  • Sean Burke - Mozilla (SBE)
  • Yusuke Suzuki - Apple (YSZ)

Standing items

Status Updates

Editor's Update

RGN: Landed the big change to align with UTS 35 (non-terminal references and use of vocabulary).

MessageFormat Working Group

EAO: Progress, limited by bandwidth.

Proposal Status Changes

https://github.com/tc39/ecma402/wiki/Proposal-and-PR-Progress-Tracking

Intl.DurationFormat

FYT: The latest spec change improved a lot; just need to find time to implement.

DLM: Anba was waiting on spec updates; I can follow up to check the status.

USA: There's a small TG1 backlog for DurationFormat; it was presented at the last TG1 meeting and the tone was optimistic. We need to go back to TG1 for approval on one normative revert related to the fractionalDigits defaults.

SFC: There's no core ICU4C implementation of DurationFormat. We could get this expedited in ICU4X if an implementation is on board with taking that approach.
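
(For reference, a minimal sketch of the Stage 3 Intl.DurationFormat proposal as currently drafted; it is not yet shipped everywhere, and option names and defaults – including fractionalDigits, mentioned above – may still change. Output shown is illustrative.)

```js
// Hedged sketch of the proposed Intl.DurationFormat API.
const df = new Intl.DurationFormat("en", { style: "short" });
console.log(df.format({ hours: 1, minutes: 46, seconds: 40 }));
// e.g. "1 hr, 46 min, 40 sec" – exact output depends on locale data
```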

Intl.Segmenter

DLM: The SpiderMonkey team is waiting on the layout team. We acknowledge that it's becoming a web compatibility problem.

Pull Requests

Normative: Added support for sentence break suppressions to Intl.Segmenter #783

#783

YSZ: I'd like to ensure this "ss" thing is only a string-related thing, not related to layout. I'll loop back with my team.

FYT: Also we probably need to change Appendix A and B.
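
(For context, a rough illustration of the behavior the "ss" sentence-suppression data would change; the exact segment boundaries shown are an assumption and depend on the engine's ICU data.)

```js
// Without sentence-break suppressions, a sentence segmenter may break after
// abbreviations such as "Mr."; whether it does is implementation-dependent.
const seg = new Intl.Segmenter("en", { granularity: "sentence" });
const sentences = [...seg.segment("Mr. Smith arrived. He sat down.")].map(s => s.segment);
console.log(sentences);
// Possibly ["Mr. ", "Smith arrived. ", "He sat down."] today, vs.
// ["Mr. Smith arrived. ", "He sat down."] once the "ss" suppressions are applied.
```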

Normative: raised maximum fractional digits from 20 to 100 #786

#786

SFC: This was previously discussed in 2021-07. Number.prototype.toFixed allows up to 100 fraction digits, so it makes sense for us to also use 100.

FYT: Is this implementable?

SFC: It's just another 80 digits on top of the 300+ digits in NFv3.

USA: Formatting cryptocurrencies seems like a use case.

FYT: Should we change significant digits, too?

USA: The consistency argument doesn't fly for significant digits.
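
(For reference, a sketch of the two limits being aligned; the Intl.NumberFormat line assumes an engine that has implemented #786 and will throw a RangeError elsewhere.)

```js
// Number.prototype.toFixed already accepts up to 100 fraction digits:
console.log((1 / 3).toFixed(100).length); // "0." plus 100 digits
// Intl.NumberFormat currently caps maximumFractionDigits at 20; with #786 the same
// option would accept up to 100:
console.log(new Intl.NumberFormat("en", { maximumFractionDigits: 100 }).format(1 / 3));
```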

Conclusion

TG2 approval

Normative: Allow UTC offset time zones #788

#788

SBE: I'm not familiar enough with the spec to comment on it, but does this provide something that the IANA Etc/ aliases don't cover?

RGN: Yes, because those are only for large discrete chunks.

JGT: There's also an interop angle, in that it's valid according to the spec to have an offset string as a time zone, so we should be able to support such strings.

RGN: It's not necessary to store what the ECMAScript code provided.

FYT: Up until this point, we've had a fixed number of time zones – fewer than 600. With this change we have a much larger number of time zones, and 99.999% of them are useless. Who cares about one minute from UTC? In Temporal, I can use a very small number to point to the index of a supported time zone. Now I cannot.

RGN: There’s a different code-path for offset time zones.

FYT: But is this really necessary? Who is using it? We're opening a can of worms that no one intends to consume. It's overengineered – it's flexible for nobody.

SFC: UTS 35 does have a well-defined way to format arbitrary offset strings, usually with a format like GMT+7. It’s not like we’re adding a large set of things that we couldn’t do before – there is a specific algorithm that requires no locale data.
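
(For reference, a hedged sketch of what PR #788 would enable: passing a UTC-offset string as the timeZone option. Shipping engines today reject this with a RangeError; the rendering shown is an assumption based on the UTS 35 localized GMT format SFC describes.)

```js
// Assumes #788 has landed and the engine accepts offset strings as time zones.
const fmt = new Intl.DateTimeFormat("en-US", {
  timeZone: "+07:00",      // a UTC-offset time zone rather than an IANA name
  hour: "numeric",
  timeZoneName: "short",
});
console.log(fmt.format(Date.UTC(2023, 5, 1))); // e.g. "7 AM GMT+7"
```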

RGN: This can only store a named offset from the IANA database. At format time it looks for that offset value. This doesn't get changed by the pull request.

...

JGT: To respond to Shane, is it acceptable to use the colons and dots in the localized string? Can we say GMT+7:30.001?

EAO: I’m with Shane in that I’m having a hard time figuring out why this has popular appeal – who is really going to use this, and for what? Maybe there is a use, but I haven’t heard it yet.

SFC: The display is up to the locale data, and I believe there is locale data on what separators to use, and there is a well-defined algorithm. In Temporal, we discussed offsetNanoseconds and decided to go to a lower level of precision than that. Are we still using nanoseconds, or could we stop at just regular seconds?

JGT: I agree that I am not sure that nanoseconds are the necessary precision here. We could raise that in Temporal to see if it can be relaxed. In IANA the most granular offset is in the hundredths of a second. If we could limit it to this, it might not save us storage to go below that, but it’s a reasonable thing to go into at the next Temporal champions meeting, particularly if it’s an implementation issue that could cause pain for implementors. We had discussed the possibility of use cases where it would be necessary, but we didn’t determine what those use cases were.

FYT: I don't believe this has anything to do with IANA – people who can use IANA will just use IANA names. The question is: for whoever can't use IANA, what do they need? I can understand people wanting it down to the hour, or even the minute, but anything finer than minutes for the non-IANA case doesn't seem useful. Show me a use case for anything below minutes.

RGN: The use case is interoperability. It’s supported by IETF format, it’s supported by ECMA-262.

FYT: 262, where?

RGN: DefaultTimeZone

FYT: From the system?

RGN: Yes

FYT: And if the system doesn't return it, we don't have a conflict.

RGN: Yes, but the system is currently allowed to return it.

FYT: We could change it there instead of here.

RGN: Yes, but we cannot change the IETF format.

FYT: So IETF can specify the time zone offset down to the nanosecond?

RGN: I believe yes

SFC: Looks like we currently support second precision. We could extend that to hundredth of a second. Nanosecond, though, would mean making our storage bigger, which isn’t likely something we want to do for such a fundamental type.

JGT: One thing I recommend is splitting the feedback. There are concerns around resolution, and then concerns around whether we should provide this capability at all. It would be great if we could give a thumbs-up/thumbs-down on the feature itself, with a caveat on the needs of implementors. Note: a lot of IANA data sources don't include data for past dates. For example, there has been a lot of merging happening in the database; I believe ICU merges the names, but not the data. ICU does not pull in the pre-1970 offset data for time zones, which is a case where this might be needed. There are odd use cases for this, but they do exist. It is interoperable; it's supported by Java, which in turn drove the IETF spec for backwards compatibility. Those things exist, and it would be a good thing to be interoperable here.

FYT: Yes, it’s used, but is it used below minutes? Is there a case where it needs to be more precise than minutes?

RGN: The Netherlands offset was defined out to hundredths of seconds before 1909.

JGT: But is the overall idea a good one?

SFC: I support the pull request. It’s a useful feature, there are real use cases for GMT style timezones and renderings of them, and this seems to be a big gap in what Intl supports. For both of those reasons I think it should land once we answer the precision question.

RGN: I’m willing to tweak precision as necessary. The biggest thing for me is getting this in so that we don’t have the inconsistency between 262 and 402.

SFC: Any questions about the spirit of the proposal, putting aside questions of precision?

LAF: +1

EAO: I continue to agree with FYT – minutes makes sense, but I have a hard time understanding why we need more than that. I don’t believe that anyone uses the historical ones with higher-precision offsets.

RGN: Comment on interoperability point: IETF draft has essentially arbitrary precision. It’s up to TC39 to pick a cutoff. Currently nanoseconds in 262, but that is subject to change. If we decided we wanted seconds, we could do that.

SFC: Sounds like the conclusion here is that we’re not convinced as a group on the precision we should be using, but assuming that we have an appropriately agreeable precision – maybe minutes, maybe lower, not as low as nanoseconds – we can agree on the pull request?

Editorial: Added note about sets of locales for web browser implementations needing to not change as a result of user behaviour #780

#780

DLM: I’m not familiar with this issue, but I can get feedback from people on our team.

YSZ: I have a question… we're not doing this right now, but we may choose to behave differently based on options. For example, Apple has a lockdown mode for enhanced security. Some features are explicitly disabled, which is a fingerprinting vector. My question is: for this statement, does an explicit option of the browser or engine count as an allowance under this paragraph? Right now it says the same version of the same host on the same platform, but my understanding is that this does not include something like lockdown mode.

EAO: I think it’s arguable that in that case the platform has changed.

YSZ: Right now lockdown mode is system-wide – it changes basically everything. So as you said, we can say that this is a different platform.
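
(For context, a small sketch of the observable surface #780 constrains: which locales an implementation reports as supported. The locales passed in are arbitrary examples.)

```js
// The note says these answers should not vary with user behaviour for the same
// version of the same host on the same platform.
console.log(Intl.DateTimeFormat.supportedLocalesOf(["en-US", "fr", "kln"]));
console.log(new Intl.DateTimeFormat("fr").resolvedOptions().locale);
// If these depended on, say, which language packs a user installed, they would
// become a fingerprinting vector.
```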

SFC: Let’s get another round of feedback from the Apple and Mozilla side on this pull request, and any changes we can make to it in response to Addison’s feedback. It would be good to follow up on this. BAN has made this PR based on feedback from Apple and Mozilla.

Conclusions:

SFC: UTC offset time zones are good, but we need to talk more about precision. Approve minimum/maximum fractionalDigits. YSZ needs feedback on sentence break suppressions, and another round of feedback from Apple and Mozilla is needed for #780.

Revisiting Normative: Allow UTC offset time zones #788

#788

FYT: In the IETF spec, the time zone precision is to the minute, not even to the second. We figured out that there are hours and minutes, but nothing finer than minutes. That is not the case in 262, though – only in IETF.

RGN: I am going to open a normative PR against 262 on this today.

SFC: Does that require changes to Temporal?

RGN: It won’t, because Temporal is going to leave that part of 262 unaltered. We’ll discuss in Temporal, though.

SFC: Seems like we’ll be able to get this into 16 bits, which is nice.
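
(A back-of-the-envelope check of the "16 bits" remark, assuming the IETF ±hh:mm form; the variable names are illustrative.)

```js
const maxOffsetMinutes = 23 * 60 + 59;              // 1439 minutes either side of UTC
console.log(maxOffsetMinutes <= 0x7fff);            // true – minute precision fits in an int16
const maxOffsetSeconds = 23 * 3600 + 59 * 60 + 59;  // 86399
console.log(maxOffsetSeconds <= 0x7fff);            // false – second precision would not
```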

RGN: The question was whether it supports offset timezones beyond the limitations of IETF, but that decision will be made inside the Temporal group.

SBE: I'm kind of wondering – this is related – iCalendar allows for UTC offsets down to the second, but iCalendar also provides display names for time zones. Basically, I'm trying to figure out the extent to which we can define a time zone that isn't necessarily part of the default IANA time zone set.

RGN: Not sure I understand the question, but: the offset time zones are disjoint from the IANA time zones.

FYT: Could you open a new issue and we could put it on the agenda? We’ll need to look into the details.

RGN: If iCalendar has a use case for offset precision below minutes, we wouldn’t want the needs of iCalendar to be unsatisfied by the new format.

EAO: It might be nice to include any available evidence of such offsets actually being used with seconds other than 00. Although the standard supports it, I doubt whether it's actually being used.

FYT: I think SBE was previously talking about "4.3.14 UTC Offset" in https://www.ietf.org/rfc/rfc2445.html#section-4.3.14

SBE: 2445 is superseded, but that part isn't meaningfully different in the replacement: https://www.rfc-editor.org/rfc/rfc5545#section-3.3.14

Proposals and Discussion Topics

How to prevent misuse of localized strings?

https://docs.google.com/presentation/d/1KuIOSDQRliqCT3x3WX9Bg9H3hndfhAcH2G0aNoBKq18/edit

DLM: Part of this is how to prevent misuse in the future, but part of this is going to CLDR and asking for more transparency in the decision-making process. I think CLDR needs to be more careful, especially in the en-US locale. Although homedepot.com is fixed, we're aware of at least one more site, and there are probably more. TC39 takes web compat very seriously, and we should make sure we have alignment from CLDR on that as well.

EAO: This brings to mind Addison's proposal for a localized string that doesn't allow arbitrary introspection into the string. The second thing is that we could introduce a new locale that users could rely on for more stability.

SBE: Hyrum's Law is at fault here. If we try to ensure that nobody ever breaks because of a CLDR update, it's virtually impossible. Long-term, trying to maintain differences from the standard in order to prevent breakages is not a tenable position. But making that less observable goes some distance.

DLM: For the specific issue of NNBSP, I'd be in favor of the specific workaround of an English-only fix.

JGT: I'd not overreact to this, because it's rare to change the datetime format for English. It would be useful to make a loud announcement when changing data in the top 10 locales so consumers can be aware.

SFC: My conclusion here is twofold. First, as stewards of i18n on the Web platform, we have a duty to educate developers to do the right thing. Second, we should engage more with CLDR to be more cognizant of compatibility concerns. Thank you.

Preventing misuse of localized strings (alternate version of notes)

SFC: homedepot.com had some issues last year with CLDR updates. They had code that, for example, did conversions by searching with regexes and manually changing the strings to 24-hour format. JS lets developers do this, but several things go wrong here:

  • Parsing localized strings is wrong.
  • They should have been using a library like moment.js.
  • We have formatToParts(); if they really wanted to do this themselves, they could have avoided parsing (see the sketch below).
  • If they must write this code, choose “und” as the locale.
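
(A hedged sketch of the alternatives above, for code that wants a 24-hour clock instead of regex-rewriting the localized string. The date and outputs are illustrative; how "und" resolves is implementation-dependent.)

```js
const when = new Date(Date.UTC(2023, 5, 1, 13, 5));

// Ask the formatter for what you actually want:
const h23 = new Intl.DateTimeFormat("en-US", {
  hour: "numeric", minute: "numeric", hourCycle: "h23", timeZone: "UTC",
});
console.log(h23.format(when)); // "13:05"

// If you must post-process, operate on parts, not on the display string:
const parts = new Intl.DateTimeFormat("en-US", {
  hour: "numeric", minute: "numeric", timeZone: "UTC",
}).formatToParts(when);
console.log(parts.find(p => p.type === "hour").value); // "1" – immune to CLDR spacing changes

// Or pass "und", so the output does not depend on en-US pattern details:
console.log(new Intl.DateTimeFormat("und", {
  hour: "numeric", minute: "numeric", timeZone: "UTC",
}).format(when));
```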

What made this case different? It's important to be very clear here that we update CLDR data every six months. The behavior of the web platform changes every six months; it has been understood since the original version of the Intl spec that this happens and should be expected. So why did this change break things? The crucial difference is that this code is based on English data, and a lot of websites use English data as the canonical thing and make assumptions that are much less likely to be true in other languages; major changes happen more often in other locales. “Misuse” means using localized strings and then doing additional processing on them. If you use the format() function to get these strings, you should display them to the user and do nothing else with them. This misuse causes problems for us, it causes problems for downstream users who aren't getting the right result, and it causes problems for libraries that aren't using the latest CLDR.

Given this evidence, the main thing I want to present to you is “how do we prevent misuse?”

  1. This is a motivation for Temporal – if we had a better standard library, this case – taking localized strings and manipulating them to do things that are otherwise unsupported – wouldn't arise. A robust library would prevent this.

  2. The other is better documentation. Very bad advice comes out of (for example) Stack Overflow – advice motivated by not understanding that locale data can change on a regular basis.

  3. Public relations push? Get articles on Hacker News?

  4. Fuzz testing – this is similar to how the Home Depot issue was discovered.

Encourage broader testing with canary builds. Can we get more developers to use canary, and roll out CLDR updates only after they have incubated in canary for N months? This gives developers opportunities to find problems that result from misuse of localized strings.

How should the ecosystem fix this issue?

What is the long-term solution?

  • Keep this hack in perpetuity? This is what happens if we don't do anything.
  • Remove the hack in an upcoming release and update to the new CLDR data? I think Home Depot fixed this problem, but there are likely others we haven't seen. Is this risk sufficiently mitigated?
  • Make the change only in en/en-US and not elsewhere? CLDR is considering this. As the web platform, this is a direction we could go – it could solve the Home Depot problem, and it could solve other problems. Even en-CA isn't as misused as en and en-US. If we make the change just there, it can limit the scope of the problem.

DLM: First, I 100% agree that people shouldn't do this, so I'm sympathetic to trying to educate people away from doing this sort of thing. I'm a little skeptical, though, since people are always going to make these types of mistakes or misuses; we see this coming up even within Mozilla, from developers not familiar with i18n. This is partially a question and partially a comment: how does the CLDR committee make these decisions? Sure, we want to prevent this mistake in the future, but I was wondering if someone could give a quick update on how the decision-making process works on the CLDR side. I've reviewed the ICU update that brought this into Firefox, and there was frustration from our web compatibility folks about why it changed. For the en-US locale we can't be 100% certain that people aren't going to rely on it, and there have been other sites with problems like the Home Depot issue. In TC39 we stress web compatibility and are willing to tolerate suboptimal things to maintain compatibility, and I'm wondering if CLDR might make similar decisions.

EAO: Two thoughts. First, I'm reminded of a proposal that Addison Phillips brought a while back about localizable strings. Is it possible to start migrating to something that looks a little bit like a string, with information about directionality, script, and so forth, but is really just a blob? From the outside, make it more difficult to just parse the string you get out of it. The second part is that I wonder if there's space for us to implicitly find a different locale than en-US to act as a default, and whether we could get CLDR to be more strict about making changes like this – a locale, something like International English, that could be very specifically defined and could be reliable for the case where people still would do [garbled]

SBE: Part of what EAO was saying provides a solution to what I was going to say, which is: there's Hyrum's Law, which says that any observable part of your API is going to be depended upon by users. I think that this takes us in two directions. One is that if we try to ensure that nobody ever breaks because of a CLDR update, that's going to be nearly impossible as long as users have insight into those particular details, or unless CLDR stops changing. Long-term, trying to maintain differences from the standard in order to prevent breakages is not a tenable position. I do think, though, that what EAO was saying about making that less observable goes some distance toward improving the situation.

DLM: I would be willing to remove it, for the middle ground you propose, and ideally for everything.

JGT: I think in some ways this is a little bit of a worst case, because it affected English and it was about characters you couldn't see – no one would catch it in development because a space and an NBSP look the same. I think, though, that we shouldn't overreact to that, because this happens very rarely. Also, one of the problems was a communication problem from CLDR to ICU to Node. Node did a minor release that in theory shouldn't contain any breaking changes at all, and it would be nice if ICU had a big loud note about changes so that consumers can be aware of them.

SFC: I like the meta-approach that we should coordinate with CLDR on that. I know Matias from the V8 team has started attending CLDR calls from time to time, largely in response to this issue. Hopefully that will make some steps toward preventing this in the future. I do think that as stewards of the web platform for internationalization, we have a responsibility to make this point, because we have a role in pushing the ecosystem in this direction. Whenever we talk about this, we should stress fuzz testing and not introspecting strings that come out of Intl APIs. Down the road we could use Addison Phillips's string API to actually prevent this, but until then encouraging testing should be a thing we do in this group. We'll see if the change for en-US lands, and then hopefully that will get us closer to re-converging on CLDR.

Use Etc/Unknown (like ICU does) to handle the case when the OS's time zone is unrecognized by ECMAScript #25

tc39/proposal-canonical-tz#25

JGT: When ECMAScript doesn't know what the time zone is, what should we do? Good examples of what could cause this: the Kiev/Kyiv change, and the Ciudad Juárez change that happened recently. ICU defines an IANA-style string, Etc/Unknown, for this case. If you call Intl.DateTimeFormat.resolvedOptions() when there's no known time zone, you get Etc/Unknown back, but you can't actually use it. Non-Chromium implementations simply return UTC in this case – and Etc/Unknown works exactly the same. Etc/Unknown is essentially a guarantee that all time zone calculations are going to be wrong. In the PR I laid out a few choices of what we could do about it. One choice is to do what non-Chromium implementations do: return UTC. Another is to return Etc/Unknown, like Chromium; a third choice is to return Etc/Unknown and also allow it as input. Another is to throw when there's an unknown time zone. The goal of the resolution should be to clarify to both programmers and end users what's going on, so that they can fix the problem (generally by updating the ECMAScript implementation). I want to stop and see what folks think about each of these four, or whether something that's not listed here could be a good solution. Thoughts?
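
(For reference, a sketch of the status quo the four options above would change; the "Europe/Kyiv" probe is an illustrative assumption about engines shipping older time zone data.)

```js
// What resolvedOptions() reports today when the OS zone is unrecognized varies:
// non-Chromium engines fall back to "UTC"; Chromium-based engines may report
// "Etc/Unknown", which cannot be passed back in as a timeZone option.
const tz = new Intl.DateTimeFormat().resolvedOptions().timeZone;

// Probing whether an engine recognizes a given IANA ID (unknown IDs throw a RangeError):
function supportsTimeZone(id) {
  try {
    new Intl.DateTimeFormat("en", { timeZone: id });
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false;
    throw e;
  }
}
console.log(tz, supportsTimeZone("Europe/Kyiv")); // may be false with older tzdata
```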

SBE: #1 is not something that could be used in a calendar context, where you do have to know. In iCalendar we fall back to time zones as defined in iCalendar if we can't find the IANA time zone, which means we need to know if the time zone we get back is not a real time zone. We want to behave differently if we get back "unknown" instead of just defaulting to getting something wrong.

JGT: I share the same concern.

YSZ: I have some concerns about using Etc/Unknown. From the issue we already know that this time zone cannot be used by other tools like Python, Java, and the C++ chrono library. It is a time zone ID that is not in the IANA database, so basically it's an invalid ID. If we keep using UTC, from the user's perspective keeping things working is at least better, because at any time a new ID can be added; it is not great if the system is carrying this new ID and then starts doing something very wrong. Still working, with UTC, would be preferable for users. We do not have a good way to know whether we are using an unknown time zone or not, and I wonder if we should have a different interface, or a different way, to know whether this is an unknown thing or not.

JGT: The way I've been thinking about this is: if you don't know what the time zone is and you call an API that needs the time zone, the output is guaranteed to be wrong (unless you happen to be in a time zone with a zero offset from UTC), so for most of the world any result will be wrong. The question is whether we handle this wrongness by showing an invalid local time or by throwing an exception. If you're providing anything with Etc/Unknown, you're already providing bad data. I agree with SBE's concerns, though I'm worried about the complexity of providing another API. This feels like a case where failing loudly and obviously is the best thing we can do to help programmers figure out what to do.

EAO: Are there documented steps for replicating the current bad behavior?

JGT: I don't know if you can do it in userland. The CLI for setting the time zone in macOS won't allow you to do it.

EAO: Doing it in a docker container might be possible?

JGT: I think [Anba] has done that, and he might be the person to ask. I think it may be as easy as passing the TZ environment value.

SFC: My question is: I’m wondering, as another approach, if there’s a way for ECMAScript to require the platform to give us not only an IANA timezone but also the current offset. If we get a timezone that we don’t understand, we can fall back to the offset timezone.

JGT: Interesting idea! My only concern is that it might be much harder for people to understand what's going wrong. I've looked at bug reports that people filed in 2022 when Europe/Kyiv showed up. That's an easy string to search for on the web. If you show offset strings, it prevents programs from failing but also prevents people from finding out what the problem is and fixing it. Not sure which is better.

SFC: Clarifying question: if people google the string, what is the action they can take? Should they install the latest updates?

JGT: That's the only sensible thing to recommend to users. Because browsers are evergreen, it's a reminder that the user should pay attention to the red dot in the corner of Chrome. The other case is organizational users, who can ping IT. Another is apps that bundle tz info, as in Electron apps.

SFC: My main concern is that this is not something that developers can fix, unless they’re developers of an Electron app. Most users, if they see Etc/Unknown, they’re not going to know what it is and they’re going to ignore it. It’s not helpful information to users. Users will eventually update their browser, and then the problem is fixed. If it were a developer problem, I would be much more receptive to the idea that we should give something that’s easy to find and fix, but if this is a user problem I lean toward the side of making the best effort – getting the offset timezone and displaying that.

JGT: Because offsets change because of DST, if you ask the OS what the offset is, the only way to know is to know what date you're asking for the offset for.

SFC: Maybe “current offset”

JGT: at the current moment, yes. But that’s different than asking for the current timezone. We may have cascading inaccuracies that are hard to troubleshoot.

YSZ: I would like to add another thing. The case I'm thinking of is that sharing this string is a problem. A browser can be given time zone data that is not in its ICU – if the system has a very new ICU and the browser doesn't have it yet, what we see is the unknown ID. In the UTC case the webpage keeps working but the time zone is different. That is not great, but it's better than making a webpage completely break, as in Home Depot's case with DurationFormat. What I would like to see is a deterrent – it's nice that a programmer can see the problem, but this type of webpage breakage affects users. Making things break, or throwing errors and other things, is not good for users. It's good for developers, but there are other ways we can benefit the user.

JGT: The thing I would be a little concerned about is: okay, in the Home Depot case it broke, but if it had merely rendered inaccurately there would have been nothing the user could see. Because this changes the data, I'd like to have some way to alert users of the problem (consider a calendar app that silently gives the wrong time). I would support some way to let developers know. The action here is to pop up something for the end user and let them decide what to do.

EAO: The talk about whether it’s a developer problem or a user problem – isn’t this really an implementation problem? The browser or other environment that doesn’t understand what the system is saying – isn’t it on that instead of the JavaScript spec?

JGT: To YSZ's point, on Apple's platforms this is not an issue because the browser and the OS share one ICU. The Chromium implementation embeds its own ICU database, so with each version you get the same results.

SFC: It doesn't seem like we have a firm conclusion, but we have at least identified the people we need to follow up with: YSZ, myself, EAO, and SBE.

JGT: This is useful, though it’s sad that we can’t figure out what to do yet.