Intl.DateTimeFormat needs a parser #342

anilanar · 2019-04-25T13:11:43Z

An instance of DateTimeFormat can do Date -> string but cannot do the reverse string -> Date for formatted strings it created.

It's very hard to implement that in user-land because different browsers might handle different languages in different ways.

My proposal is a parse method that is the reverse of the format method:

const x = new Intl.DateTimeFormat(/* format options */);

// true, assuming format options are lossless
x.parse(x.format(aDate)).getTime() === aDate.getTime()

littledan · 2019-04-25T14:07:09Z

I believe this has been raised on other threads. Although ICU and many other libraries support localized date parsing, it's a bit brittle and not quite recommended. If someone has a free-form text input field, what they write might not be parseable by just trying to match what Intl.DateTimeFormat would output. For that reason, I'd encourage developers to make a structured input field, and to develop solutions to this harder problem in a higher-level library.

zbraniecki · 2019-04-25T14:22:05Z

If all you want is to get a field out of a formatted date, you could use formatToParts for that. I agree with Daniel that parsing localized i18n input is very brittle.
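A minimal sketch of the formatToParts approach mentioned here, assuming a fixed en-US formatter pinned to UTC. The helper name getField is hypothetical and only for illustration; formatToParts itself is the standard Intl.DateTimeFormat method.

```javascript
// Formatter pinned to UTC so the result does not depend on the host time zone.
const partsFormatter = new Intl.DateTimeFormat("en-US", {
  year: "numeric",
  month: "long",
  day: "numeric",
  timeZone: "UTC",
});

// Hypothetical helper: return the formatted text of one field
// ("day", "month", "year", ...) or undefined if the field is absent.
function getField(formatter, date, type) {
  const part = formatter.formatToParts(date).find((p) => p.type === type);
  return part ? part.value : undefined;
}

const sample = new Date(Date.UTC(2019, 3, 25));
console.log(getField(partsFormatter, sample, "month")); // "April"
console.log(getField(partsFormatter, sample, "day"));   // "25"
```

This reads fields out of a string the formatter is producing right now; it does not parse an arbitrary string back into a Date.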

leobalter · 2019-04-25T14:27:32Z

Work on a parser should be coordinated with the Temporal proposal. Otherwise, I'm not in favor of creating a new parser, nor of jumping ahead with Date-like parsing in such a modern API.

anilanar · 2019-04-25T17:09:57Z

If someone has a free-form text input field, what they write might not be parseable by just trying to match what Intl.DateTimeFormat would output

A parser may not be able to parse a given string. That's the case for all parsers. If it fails to parse, it can throw an error with failure reason or return null or do whatever is favored by ES committee nowadays.

I'm not sure what is brittle here. I think DateTimeFormat has all the information necessary to parse what format produces. I'm not proposing a parser that tries to handle all possible formatting options. I'm asking for a parser that works with very specific formatting options plus a locale.

I'd encourage developers to make a structured input field

I find it very hard to get to know all possible ways of formatting dates across the globe (e.g. 12-31-2018 in US, 31.12.2018 in EU etc.) and implement a structured input element that can handle all. In addition, locale is usually not enough to decide date formatting options. For example, my locale is en-US but my date format (at OS level) is configured as dd.MM instead of US style. Perhaps Intl.DateTimeFormat can tell us more about the format itself for user-land to create a parser based on it.

For example:

C# has DateTime.Parse that takes a CultureInfo object, which is equivalent to locale + formatting options.
Joda library for Java has DateTimeFormatter that is a printer and a parser at the same time, similar to what I propose.
Apple's Foundation API (their std lib) has NSDateFormatter that is configured with either a locale identifier or a pattern and is able to parse strings matching its configuration.

littledan · 2019-04-25T17:31:40Z

Note, Temporal does have a parser for ISO 8601. What I'm skeptical of is parsing human-input date-times, which seems intractable for a library as low-level/deterministic/data-driven as Intl.

I'm aware that many other date libraries have such a parser, and I think it's a mistake. Actually, V8 had an extension to ECMA-402 which included parsing in Intl.DateTimeFormat, and I removed it.

leobalter · 2019-04-25T18:15:07Z

Note, Temporal does have a parser for ISO 8601. What I'm skeptical of is parsing human-input date-times, which seems intractable for a library as low-level/deterministic/data-driven as Intl.

This is exactly why I said this work should be coordinated. I'm against us creating something with the same goal in two different places using different implementations. Date.parse is already the epitome of confusion in JS.

rxaviers · 2019-04-25T20:13:47Z

I echo @littledan's comment above and, as @zbraniecki said, formatToParts can be used to generate a good datepicker.

aphillips · 2019-04-25T20:56:31Z

While I agree with all the arguments against parsing human entered date strings, there may be a tiny amount of value in parsing patterned date values (e.g. Using picture strings or [less likely] skeletons). 8601 is a good example handled elsewhere, but there are non standard but machine generated formats, sometimes localized, for which having calendar-aware machinery for parsing is occasionally useful. For me this has mainly been reading text based flat file formats.

I could probably count on one hand the number of times this has been useful in my career: I've exerted way more effort avoiding this sort of parsing. But to me that would be the use case.

leobalter · 2019-04-25T22:05:13Z

I'm not against a new Date parser. I'm only asking that it follow up on other similar work that has been done, to avoid more inconsistency in a field where we have already had enough (from Date.parse). We might end up with more than one method here and there, but consistency is ultimately required.

littledan · 2019-04-25T22:11:56Z

@aphillips, how have you parsed these in the past? Can you give more detail on "sometimes localized"?

sffc · 2019-04-25T22:15:01Z

Having implemented code that does things like this in ICU, I can attest that parsing localized strings is indeed very brittle.

There are two main use cases for the parsing of localized input:

Strict: for example, to validate that text conforms to a given date format.
Lenient: for example, to make a best attempt at turning a user-supplied string into a date.

A strict-only parser is not too hard to implement, because you have a limited space of strings that could be considered valid. However, when users think of parsing, they are usually thinking in terms of the second use case. That is a much harder problem to solve correctly. For example, if someone writes "10-12-2019", is that October 12 or December 10? If you know the user's locale, you can make a guess, and that's what ICU does. However, I wouldn't trust that result without having the user verify the output. This is why if the goal is user input, in general it is still safer to just use a good off-the-shelf date picker.
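The "10-12-2019" ambiguity can be made visible with formatToParts. The sketch below asks a formatter which order it writes the day, month, and year fields in; dateFieldOrder is a hypothetical helper name, and the exact output for any locale depends on the locale data shipped with the engine.

```javascript
// Hypothetical helper: list the order of date fields a locale's
// numeric date format uses, by formatting a reference date.
function dateFieldOrder(locale) {
  const formatter = new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "numeric",
    day: "numeric",
    timeZone: "UTC",
  });
  return formatter
    .formatToParts(new Date(Date.UTC(2019, 9, 12)))
    .filter((p) => p.type === "day" || p.type === "month" || p.type === "year")
    .map((p) => p.type);
}

console.log(dateFieldOrder("en-US")); // [ 'month', 'day', 'year' ]
```

With current locale data, a day-first locale such as en-GB is expected to report the day before the month, so the same digits "10-12-2019" map to two different calendar days depending on which locale is assumed.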

anilanar · 2019-04-26T08:41:51Z

I'm not sure if I'm on the same page as some other participants in this discussion.

I propose a strict-only parser anyway. DateTimeFormat defines a strict format, and I propose that it have a parse function that is also strict. To reiterate: parse is the mathematical inverse of format when the format options are lossless. It's trivial to define an isomorphic relation between Date and string when the format options are lossy. So every adjective you use to characterize parse is also valid for format.

Why are we talking about Date.parse or other non-strict parsers anyways? They have nothing to do with what I’m proposing.
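For concreteness, a strict round-trip parser of the kind described above could be sketched in user land for one narrow case: numeric year, two-digit month and day, Gregorian calendar, Latin digits, UTC. This is an illustration under those assumptions, not a proposed API; makeStrictDateParser and escapeRegExp are hypothetical names.

```javascript
// Escape characters that are special in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical factory: build a parser that accepts only strings
// the matching formatter would itself produce.
function makeStrictDateParser(locale) {
  const formatter = new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    timeZone: "UTC",
    calendar: "gregory",
    numberingSystem: "latn",
  });
  // Format a reference date to learn the field order and literal separators.
  const order = [];
  const pattern = formatter
    .formatToParts(new Date(Date.UTC(2006, 0, 2)))
    .map((part) => {
      if (part.type === "literal") return escapeRegExp(part.value);
      if (!["day", "month", "year"].includes(part.type)) {
        throw new Error(`Unsupported part type: ${part.type}`);
      }
      order.push(part.type);
      return part.type === "year" ? "(\\d{4})" : "(\\d{2})";
    })
    .join("");
  const matcher = new RegExp(`^${pattern}$`);

  return function parse(text) {
    const match = matcher.exec(text);
    if (!match) return null;
    const fields = {};
    order.forEach((type, i) => {
      fields[type] = Number(match[i + 1]);
    });
    const date = new Date(Date.UTC(fields.year, fields.month - 1, fields.day));
    // Round-trip check: rejects impossible dates such as February 30.
    return formatter.format(date) === text ? date : null;
  };
}

const parseUS = makeStrictDateParser("en-US");
console.log(parseUS("03/01/2021").toISOString()); // "2021-03-01T00:00:00.000Z"
console.log(parseUS("3/1/2021"));                 // null (not what format() emits)
```

Because the pattern is derived from the engine's locale data at run time, a string accepted today may be rejected after a data update, which is the brittleness raised elsewhere in this thread.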

littledan · 2019-04-26T08:45:41Z

Could you say more about your use case where a strict parser is useful in applications?

aphillips · 2019-05-06T15:19:39Z

I agree with what @sffc mentions above: strict is easy to code, but it is hugely intolerant of any variations and it makes implementations sensitive to changes in CLDR data (what used to work, stops suddenly...) A strict parser is useful when you are both the generator and consumer of the resulting localized strings (in which case, ISO 8601 is right there and you ought to use it).

@littledan when I have done lenient parsing, it was to parse custom date patterns, generally using Java-based ICU DateFormat. For example, a long time ago, when I worked at webMethods (so at least 15-16 years ago), we needed code to parse various flat file formats which were in some obscure industry standard. I've also seen people attempting to parse date strings that were machine generated (yet with localized tokens--mostly month name/abbreviation). This code was always super-fiddley because the parser yakked on even trivial things--I recall writing custom error handling over the lenient parser.

@anilanar I realize that you are searching for round trip capability, but the fact is that you're always better off passing date values as values and only using display strings for display purposes. It's a poorly internationalized application that relies on being able to interpolate a display string back into a date. It would be nice to have a "mathematical reverse", but stuff like time zone IDs gets in the way. Ultimately, the question is: what application do you have for this, vs. mere "completeness" of the API?

rxaviers · 2019-05-08T11:14:03Z

On Globalize, we have a parser whose job is to perform an inverse operation of the formatter (it's strict). Its application is to parse user entered input in controlled UI (generated by formatToParts), e.g., https://github.com/rxaviers/react-date-input

rxaviers · 2019-05-08T11:40:42Z

If Ecma-402 doesn't provide a parser (at least a number parser), how would a user parse non "latin" numerals (e.g., eastern Arabic ٠١٢٣٤٥٦٧٨٩, full width digits ０１２３４５６７８９)?

rxaviers · 2019-05-08T20:46:28Z

I am reopening for feedback about the above

zbraniecki · 2019-05-08T20:59:29Z

how would a user parse non "latin" numerals

Why would we need to? If you need eastern arabic to western arabic numeral parser, then you should use a library for that, but I struggle to see it as a common use case. And if you need it, then likely you need different numeral systems as well.

zbraniecki · 2019-05-08T21:01:24Z

My other concern is that formatter can be lenient in the output and fallback on other numeral systems. But parser can't. You can't rely on any parser that may not have data for any numeral system.

sffc · 2019-05-08T22:35:36Z

how would a user parse non "latin" numerals

I could see us providing an API to expose character properties, exposing a subset of uchar.h. There are methods like u_isdigit and u_digit. That would be a different issue, though.

rxaviers · 2019-05-09T11:30:27Z

At PayPal, there are cases where Japanese users enter numerals using fullwidth characters (which caused bugs in some products). Product developers weren't even aware of the numeric regional differences. The goal (in such cases) was simply to parse user-entered numerals.

Let me repeat what you're suggesting to make sure I understood it right. Product developers should handle the numerical mapping themselves (using a specific library for that). I can picture if-elses in that code to handle user-entered numerals. That should be preferred instead of simply relying on Intl, an internationalization library, whose purpose is to drive away regional differences in the implementation.

Is that right?

rxaviers · 2019-05-09T11:41:27Z

My impression was that a parser method would just expose whatever is already present in the engine to handle localization aspects of https://www.w3.org/TR/html/sec-forms.html#number-state-typenumber

sffc · 2019-05-09T15:17:10Z

I can picture if-elses in that code do handle user entered numerals.

Number parsing (and date parsing) requires heuristics. UTS 35 does not have a well-defined algorithm for parsing numbers. Given that situation, it seems safer to put number parsing heuristics in user land. The alternative would be to essentially rely on "if-elses" in the ICU library, which is undesirable because (1) it is not well-specified and (2) the heuristics can change from release to release.

sffc · 2019-09-29T09:10:25Z

Unicode properties (e.g., whether a character is a digit) are being discussed in #90. This should expose data about Arabic numerals so that a parser can be written in user land.

Closing the issue again because it was re-opened a few posts earlier with a question specifically about Arabic numerals.
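As an illustration of the user-land route, a digit normalizer can be written without any Intl support by mapping known decimal-digit ranges onto ASCII. The sketch below hard-codes four ranges (Arabic-Indic, Extended Arabic-Indic, Devanagari, fullwidth) as an assumption; a real library would cover every Unicode decimal-digit block. normalizeDigits and ZERO_CODE_POINTS are hypothetical names.

```javascript
// Code points of the digit zero in a few numbering systems.
// Each system's digits 0-9 occupy ten consecutive code points.
const ZERO_CODE_POINTS = [
  0x0660, // Arabic-Indic
  0x06f0, // Extended Arabic-Indic
  0x0966, // Devanagari
  0xff10, // Fullwidth
];

// Hypothetical helper: replace supported non-ASCII decimal digits
// with their ASCII equivalents and leave everything else untouched.
function normalizeDigits(text) {
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    for (const zero of ZERO_CODE_POINTS) {
      if (codePoint >= zero && codePoint <= zero + 9) {
        return String(codePoint - zero);
      }
    }
    return ch;
  }).join("");
}

console.log(normalizeDigits("٠١٢٣٤٥٦٧٨٩")); // "0123456789"
console.log(normalizeDigits("２０２１"));       // "2021"
```

This handles only the digits themselves; separators, grouping, and sign conventions are separate problems.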

pixelbandito · 2021-03-01T19:12:05Z

I have some counterpoints and questions.
Please take these as a good-faith attempt to solve problems, I'd be more than happy to have guidance on better approaches.

Our use case:
Users select a language and region in the webapp - we don't use the browser setting. It's not ideal, but something we can't get away from easily.

We use native JS Intl functionality to display dates with the user's configured language/region. That works well.
Our text inputs for dates always appear alongside a calendar picker - they're a nice user-experience addition that allows for quick copy/paste and date entry.

We'd be completely willing to require strict-ish string formats on text inputs for dates, but we'd still need to handle cases where 1 March 2021 is represented differently, e.g. France-French "01/03/2021" vs. US English "3/1/2021".

Counterpoints (reasons we think a parser would be invaluable):

We'd rather not include a third-party date parsing library.
We don't want to re-invent the wheel by maintaining our own list of which region/language combos put their months, days, and years in different orders.
The ideal implementation for us would be what the user sees (formatted) aligns with what the user can enter to a text field (to be parsed), which creates a coupling between the formatter and any parser we implement.
HTML's input type="date" would be a reasonable choice, but it's still not supported across some major browsers.
As far as I understand, we can't tell an HTML date input to use a locale and language other than the browser setting. (Please tell me I'm wrong!)

Handling of messy user input:
We could handle some common formatting issues, like trimming / collapsing whitespace before running date strings through the parser. We could even split on non-numeric characters and change the separators in case a user entered a separator other than "/".
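The cleanup described here might look like the following sketch, which assumes purely numeric date input; normalizeDateInput is a hypothetical helper name. It only canonicalizes separators and whitespace and makes no attempt to decide which field is the day or the month.

```javascript
// Hypothetical helper: trim the input, split on any run of
// non-digit characters, and rejoin with a single separator.
function normalizeDateInput(raw, separator = "/") {
  return raw
    .trim()
    .split(/\D+/)
    .filter(Boolean) // drop empty pieces from stray separators
    .join(separator);
}

console.log(normalizeDateInput("  3 - 1 -2021 ")); // "3/1/2021"
console.log(normalizeDateInput("01.03.2021"));     // "01/03/2021"
```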

Questions:
Given that, I don't understand why it's such a bad idea to provide a parser that's the inverse of the current formatter.
Is it about a strict parser being a bad idea for UX reasons?
Is it about a non-strict parser being impossible to implement effectively?
Is it about tricky cases like numeric character variations that make any strategy difficult to implement and maintain?

Forgive me if I'm ranting, I'd really like to hear others' thoughts on this, and any workarounds other folks have come up with.

zbraniecki · 2021-03-01T19:42:31Z

We'd rather not include a third-party date parsing library.

That seems like an a-priori preference that is not justified by anything in your comment. "We'd prefer not to include a userland library, so extend the standard" is not a strong position to take.

HTML's input type="date" would be a reasonable choice, but it's still not supported across some major browsers.

That's solvable with time and issues filed against browsers. Extending a spec in such a massive way would take years and I don't think you'd find browser versions that support the new functionality but not input type="date".

Given that, I don't understand why it's such a bad idea to provide a parser that's the inverse of the current formatter.

Because parsers are very, very hard and very flaky, and they add disproportionately high maintenance, compatibility and security overhead for maintainers. In most cases that results in a small number of people happy about the solution, a long tail of people unhappy about their case not being supported, and accruing bugs and problems that are perceived as a lowered-quality standard library.

Forgive me if I'm ranting, I'd really like to hear others' thoughts on this, and any workarounds other folks have come up with.

Your comment is well phrased and expresses a genuine intent for your needs. I don't think there's anything wrong with that, but I appreciate your explicit description of intent and your care not to come across as righteous :)
It's a subtle and complicated space, and date/time parser is a great example of something that intuitively feels like it should be relatively easy, but once you start digging into it you realize that it's an iceberg of problems for the standard library maintainers with many "tips" of that iceberg and for everyone "the right thing" is a different thing.

Finally, since data changes, if we support any internationalized date parsing, we will, by necessity, create a situation where your input to your website will work one day, but in the future the same input to the same website will break on the future update of the same browser because data has been updated and the parsing patterns changed.
This is super hairy and very very risky. We'd need to work a lot on maximizing the odds that web authors know how to work with that future incompatibility risk, and that's on top of all other issues I listed.

I hope my message conveys that "this is orders of magnitude more complicated than it looks on the surface" and would likely sink more resources and braintime from our group than everything else we work on combined, while still producing something that wouldn't satisfy the majority of people who would like to see such an API.

I believe a user-land library is a great solution. And if one becomes dominant and gets years of in-field experience, we could revisit this topic. But I think there's a reason that hasn't happened yet.

pixelbandito · 2021-03-01T20:33:55Z

@zbraniecki Thank you for your response!

That's solvable with time and issues filed against browsers. Extending a spec in such a massive way would take years and I don't think you'd find browser versions that support the new functionality but not input type="date".

That's a really good point, and partially changes my view.

sffc · 2021-03-02T03:50:23Z

+1 on everything @zbraniecki said. Also, see my blog post on the subject:

https://blog.sffc.xyz/post/190943794505/why-you-should-not-parse-localized-strings

tounsoo · 2023-01-11T22:00:55Z

I'd really like this feature. We have a date input that is used internationally and we currently rely on a third-party library. I would love to see it in Intl.

ryzokuken · 2023-01-12T14:08:42Z

@tounsoo please read the backlog. Parsing is out-of-scope for ECMA 402 and won't be included. A 3rd party library is indeed the right way to go.

littledan added the enhancement label Apr 25, 2019

sffc added the c: datetime (Component: dates, times, timezones) and s: comment (Status: more info is needed to move forward) labels Apr 25, 2019

anilanar closed this as completed May 8, 2019

rxaviers reopened this May 8, 2019

service-paradis mentioned this issue Jul 18, 2019

Buefy Datepicker typeable swaps month and day buefy/buefy#1562

Closed

sffc closed this as completed Sep 29, 2019

This was referenced Nov 14, 2019

Remove pattern string constructor from rust_icu_udat google/rust_icu#2

Open

Remove parsing from rust_icu_udat google/rust_icu#7

Open

candu mentioned this issue Feb 18, 2020

Dev / design updates: typography, colours, button styles, etc. CityofToronto/bdit_flashcrow#322

Merged

sffc mentioned this issue Apr 2, 2020

Explicitly state in the spec why we don't have parse methods #424

Closed

sffc mentioned this issue Oct 16, 2020

Parsing APIs #1

Closed

arndbeissner mentioned this issue Mar 26, 2021

DateInput widget not working with browsers set to date formats other than 'dd/mm/yyyy' or 'mm/dd/yyyy' dojo/widgets#1563

Closed

kamre mentioned this issue Jul 17, 2021

Bug: Date display is inconsistent mattermost-community/focalboard#715

Closed

ptomato mentioned this issue Jan 18, 2022

Cookbook: Parsing localized timestamps tc39/proposal-temporal#2004

Open

ptomato mentioned this issue Feb 14, 2022

Expand string parsing/formatting documentation, including a new FAQ tc39/proposal-temporal#2059

Merged

jiripudil mentioned this issue Jun 20, 2022

Add formatting API brick/date-time#61

Open

ptomato mentioned this issue Jan 11, 2023

Any way to get a date from a locale date string? tc39/proposals#454

Closed

FyiurAmron mentioned this issue Nov 6, 2023

table is not sorted by date FyiurAmron/sortablejs#47

Open

sffc mentioned this issue Jan 9, 2024

Intl API for parsing locale aware strings #850

Closed

gibson042 added a commit to gibson042/ecma402 that referenced this issue Mar 6, 2024

Editorial: Retarget the link explaining lack of parsers

2389c4c

tc39gh-424 is more comprehensive than tc39gh-342, which is specific to DateTimeFormat

gibson042 mentioned this issue Mar 6, 2024

Editorial: Retarget the link explaining lack of parsers #873

Merged

ryzokuken pushed a commit that referenced this issue Mar 7, 2024

Editorial: Retarget the link explaining lack of parsers

a1db456

gh-424 is more comprehensive than gh-342, which is specific to DateTimeFormat
