Intl.DateTimeFormat needs a parser #342
Comments
I believe this has been raised on other threads. Although ICU and many other libraries support localized date parsing, it's a bit brittle and not quite recommended. If someone has a free-form text input field, what they write might not be parseable by just trying to match what
If all you want is to get a field out of a formatted date, you could use
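One existing way to pull a single field out of formatted output is `Intl.DateTimeFormat.prototype.formatToParts`, which returns the formatted string as typed pieces. The following is a minimal sketch, assuming that is the kind of API meant here; `getField` is a hypothetical helper name, not part of Intl.

```javascript
// Hypothetical helper: return the text of one field (e.g. "month")
// from the parts that Intl.DateTimeFormat produces for a date.
function getField(date, locale, options, type) {
  const parts = new Intl.DateTimeFormat(locale, options).formatToParts(date);
  const part = parts.find((p) => p.type === type);
  return part ? part.value : undefined;
}

const sampleDate = new Date(Date.UTC(2018, 11, 31));
const sampleOptions = {
  year: "numeric",
  month: "long",
  day: "numeric",
  timeZone: "UTC", // fixed zone so the result does not depend on the host
};
console.log(getField(sampleDate, "en-US", sampleOptions, "month")); // "December"
```

No string parsing is involved: the field comes straight from the formatter, so it stays correct even if the surrounding literals change between CLDR releases.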
Work on a parser should be coordinated with the Temporal proposal. Otherwise, I'm not in favor of creating a new parser, nor of jumping ahead with Date-like parsing on such a modern API.
A parser may not be able to parse a given string. That's the case for all parsers. If it fails to parse, it can throw an error with failure reason or return null or do whatever is favored by ES committee nowadays. I'm not sure what is brittle here. I think
I find it very hard to get to know all possible ways of formatting dates across the globe (e.g. 12-31-2018 in US, 31.12.2018 in EU etc.) and implement a structured input element that can handle all. In addition, locale is usually not enough to decide date formatting options. For example, my locale is For example:
Note, Temporal does have a parser for ISO 8601. What I'm skeptical of is parsing human-input date-times, which seems intractable for a library as low-level/deterministic/data-driven as Intl. I'm aware that many other date libraries have such a parser, and I think it's a mistake. Actually, V8 had an extension to ECMA-402 which included parsing in Intl.DateTimeFormat, and I removed it.
This is exactly why I said this work should be coordinated. I'm against us creating something with the same goal in two different places using different implementations. Date.parse is already the epitome of confusion in JS.
I echo @littledan #342 (comment) and like @zbraniecki said,
While I agree with all the arguments against parsing human-entered date strings, there may be a tiny amount of value in parsing patterned date values (e.g. using picture strings or [less likely] skeletons). 8601 is a good example handled elsewhere, but there are non-standard but machine-generated formats, sometimes localized, for which having calendar-aware machinery for parsing is occasionally useful. For me this has mainly been reading text-based flat file formats. I could probably count on one hand the number of times this has been useful in my career: I've exerted way more effort avoiding this sort of parsing. But to me that would be the use case.
I'm not against a new Date parser. I'm only asking that it follow up on other similar work that has been done, to avoid more inconsistency in a field where we have already had enough (from Date.parse). Perhaps we might end up with more than one method here and there, but consistency is ultimately required.
@aphillips, how have you parsed these in the past? Can you give more detail on "sometimes localized"?
Having implemented code that does things like this in ICU, I can attest that parsing localized strings is indeed very brittle. There are two main use cases for the parsing of localized input:
A strict-only parser is not too hard to implement, because you have a limited space of strings that could be considered valid. However, when users think of parsing, they are usually thinking in terms of the second use case. That is a much harder problem to solve correctly. For example, if someone writes "10-12-2019", is that October 12 or December 10? If you know the user's locale, you can make a guess, and that's what ICU does. However, I wouldn't trust that result without having the user verify the output. This is why if the goal is user input, in general it is still safer to just use a good off-the-shelf date picker.
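The "10-12-2019" ambiguity can be made concrete with a small sketch. The function name `parseNumericDate` and the `"MDY"`/`"DMY"` order flags are hypothetical, chosen only for illustration; the point is that the same input yields two different valid dates depending on an assumption the string itself does not carry.

```javascript
// Interpret a numeric "a-b-yyyy" string under an explicitly chosen field order.
// Returns an ISO calendar date (YYYY-MM-DD) or null if the string is not valid.
function parseNumericDate(text, order) {
  const m = /^(\d{1,2})-(\d{1,2})-(\d{4})$/.exec(text);
  if (!m) return null;
  const a = Number(m[1]);
  const b = Number(m[2]);
  const year = Number(m[3]);
  const month = order === "MDY" ? a : b;
  const day = order === "MDY" ? b : a;
  const date = new Date(Date.UTC(year, month - 1, day));
  // Date silently rolls over out-of-range values (e.g. month 13), so reject those.
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) return null;
  return date.toISOString().slice(0, 10);
}

console.log(parseNumericDate("10-12-2019", "MDY")); // "2019-10-12"
console.log(parseNumericDate("10-12-2019", "DMY")); // "2019-12-10"
```

Both results are well-formed dates, which is why a guess based on locale cannot be confirmed without asking the user.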
I'm not sure I'm on the same page as some other participants in this discussion. I am proposing a strict-only parser anyway. DateTimeFormat defines a strict format, and I propose that it have a parse function that would also be strict. To reiterate: parse is the mathematical inverse of format when the format options are lossless. It's trivial to define an isomorphic relation between Date and string when the format options are lossy. So every adjective you use to characterize parse is also valid for format. Why are we talking about Date.parse or other non-strict parsers anyway? They have nothing to do with what I'm proposing.
Could you say more about your use case where a strict parser is useful in applications?
I agree with what @sffc mentions above: strict is easy to code, but it is hugely intolerant of any variations and it makes implementations sensitive to changes in CLDR data (what used to work suddenly stops). A strict parser is useful when you are both the generator and consumer of the resulting localized strings (in which case, ISO 8601 is right there and you ought to use it).

@littledan when I have done lenient parsing, it was to parse custom date patterns, generally using Java-based ICU DateFormat. For example, a long time ago, when I worked at webMethods (so at least 15-16 years ago), we needed code to parse various flat file formats which were in some obscure industry standard. I've also seen people attempting to parse date strings that were machine generated (yet with localized tokens, mostly month name/abbreviation). This code was always super-fiddly because the parser yakked on even trivial things. I recall writing custom error handling over the lenient parser.

@anilanar I realize that you are searching for round-trip capability, but the fact is that you're always better off passing date values as values and only using display strings for display purposes. It's a poorly internationalized application that relies on being able to interpolate a display string back into a date. It would be nice to have a "mathematical reverse", but stuff like time zone IDs gets in the way. Ultimately, the question is: what application do you have for this, vs. mere "completeness" of the API?
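As a rough illustration of "passing date values as values", the sketch below keeps an ISO 8601 string as the exchanged value and produces a localized string only at display time. The helper names `serializeDate`, `deserializeDate`, and `displayDate` are hypothetical.

```javascript
// Value used for storage and transport: an unambiguous ISO 8601 string.
function serializeDate(date) {
  return date.toISOString();
}

// Reading the value back needs no locale data or heuristics.
function deserializeDate(isoString) {
  return new Date(isoString);
}

// Display string: derived from the value, never parsed back.
function displayDate(date, locale) {
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC", // fixed zone so the example does not depend on the host
  }).format(date);
}

const original = new Date(Date.UTC(2018, 11, 31, 12, 0, 0));
const wire = serializeDate(original);
console.log(wire); // "2018-12-31T12:00:00.000Z"
console.log(deserializeDate(wire).getTime() === original.getTime()); // true
console.log(displayDate(deserializeDate(wire), "en-US")); // "December 31, 2018"
```

The round trip goes through the ISO string only, so a change in locale data can alter what the user sees but not what the application stores.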
On Globalize, we have a parser whose job is to perform the inverse operation of the formatter (it's strict). Its application is to parse user-entered input in a controlled UI (generated by formatToParts), e.g., https://github.com/rxaviers/react-date-input
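To make the idea of a strict inverse concrete, here is a minimal sketch that derives a parser from the field order and literals reported by `formatToParts`. It is not Globalize's implementation. It assumes numeric-only fields, the Gregorian calendar, Latin digits, and UTC, and `makeStrictDateParser` is a hypothetical name.

```javascript
// Build a strict parser for one locale's numeric date format by reusing
// the part order and literal separators that the formatter itself emits.
function makeStrictDateParser(locale) {
  const formatter = new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    timeZone: "UTC",
    numberingSystem: "latn", // assumption: Latin digits only
    calendar: "gregory", // assumption: Gregorian calendar only
  });
  // Format a reference date only to learn the layout of the output.
  const parts = formatter.formatToParts(new Date(Date.UTC(2001, 1, 3)));
  const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const fields = [];
  const pattern = parts
    .map((part) => {
      if (part.type === "literal") return escapeRegExp(part.value);
      fields.push(part.type);
      return part.type === "year" ? "(\\d{4})" : "(\\d{2})";
    })
    .join("");
  const matcher = new RegExp(`^${pattern}$`);

  return function parse(text) {
    const m = matcher.exec(text);
    if (!m) return null; // strict: any deviation from the format fails
    const got = {};
    fields.forEach((name, i) => {
      got[name] = Number(m[i + 1]);
    });
    const date = new Date(Date.UTC(got.year, got.month - 1, got.day));
    // Reject values that Date would silently roll over (e.g. 31 February).
    if (date.getUTCMonth() !== got.month - 1 || date.getUTCDate() !== got.day) {
      return null;
    }
    return date;
  };
}

const parseUS = makeStrictDateParser("en-US");
console.log(parseUS("12/31/2018").toISOString()); // "2018-12-31T00:00:00.000Z"
console.log(parseUS("31.12.2018")); // null
```

Because the pattern is derived from the engine's current locale data, a parser built this way accepts only what the same engine formats today, which is exactly the sensitivity to CLDR changes raised earlier in the thread.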
If Ecma-402 doesn't provide a parser (at least a number parser), how would a user parse non "latin" numerals (e.g., eastern Arabic ٠١٢٣٤٥٦٧٨٩, full width digits ０１２３４５６７８９)?
I am reopening for feedback about the above
Why would we need to? If you need an Eastern Arabic to Western Arabic numeral parser, then you should use a library for that, but I struggle to see it as a common use case. And if you need it, then you likely need other numeral systems as well.
My other concern is that a formatter can be lenient in its output and fall back on other numeral systems, but a parser can't. You can't rely on a parser that may not have data for a given numeral system.
I could see us providing an API to expose character properties, exposing a subset of uchar.h. There are methods like
At PayPal, there are cases where Japanese users enter numerals using fullwidth characters (which caused bugs in some products). Product developers weren't even aware of the numeric regional differences. The goal (in such cases) was simply to parse user-entered numerals. Let me repeat what you're suggesting to make sure I understood it right. Product developers should handle the numerical mapping themselves (using a specific library for that). I can picture if-elses in that code to handle user-entered numerals. That should be preferred instead of simply relying on Is that right?
My impression was that a parser method would just expose whatever is already present in the engine to handle localization aspects of https://www.w3.org/TR/html/sec-forms.html#number-state-typenumber
Number parsing (and date parsing) requires heuristics. UTS 35 does not have a well-defined algorithm for parsing numbers. Given that situation, it seems safer to put number parsing heuristics in user land. The alternative would be to essentially rely on "if-elses" in the ICU library, which is undesirable because (1) it is not well-specified and (2) the heuristics can change from release to release.
Unicode properties (e.g., whether a character is a digit) are being discussed in #90. This should expose data about Arabic numerals so that a parser can be written in user land. Closing the issue again because it was re-opened a few posts earlier with a question specifically about Arabic numerals.
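Until such properties are exposed, a user land approach can hard-code the zero code point of each numeral system it wants to accept. The sketch below is one such approach, covering only four systems as an illustration; `ZERO_CODE_POINTS` and `normalizeDigits` are hypothetical names, and a production library would need a far more complete table.

```javascript
// Code points of the digit zero in a few numeral systems. Each system's
// digits 0-9 occupy ten consecutive code points starting at its zero.
const ZERO_CODE_POINTS = [
  0x0660, // Arabic-Indic digits
  0x06f0, // Extended Arabic-Indic digits
  0x0966, // Devanagari digits
  0xff10, // Fullwidth digits
];

// Replace every recognized non-ASCII decimal digit with its ASCII equivalent
// and leave all other characters untouched.
function normalizeDigits(text) {
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    for (const zero of ZERO_CODE_POINTS) {
      if (codePoint >= zero && codePoint <= zero + 9) {
        return String(codePoint - zero);
      }
    }
    return ch;
  }).join("");
}

console.log(normalizeDigits("\u0660\u0661\u0662\u0663")); // "0123"
console.log(Number(normalizeDigits("\uFF11\uFF12\uFF13"))); // 123
```

This only maps digits. Grouping separators, decimal marks, and signs differ by locale as well, and handling those is where the heuristics described above come in.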
I have some counterpoints and questions. Our use case: We use native JS Intl functionality to display dates with the user's configured language/region. That works well. We'd be completely willing to require strict-ish string formats on text inputs for dates, but we'd still need to handle cases where 1 March 2021 is represented differently, e.g. France-French "01/03/2021" vs. US English "3/1/2021". Counterpoints (reasons we think a parser would be invaluable):
Handling of messy user input: Questions: Forgive me if I'm ranting; I'd really like to hear others' thoughts on this, and any workarounds other folks have come up with.
That seems like an a-priori preference that is not justified by anything in your comment. "We'd prefer not to include a userland library, so extend the standard" is not a strong position to take.
That's solvable with time and issues filed against browsers. Extending a spec in such a massive way would take years and I don't think you'd find browser versions that support the new functionality but not input type="date".
Because parsers are very, very hard and very flaky, and they add disproportionately high maintenance, compatibility and security overhead for maintainers. In most cases that results in a small number of people happy about the solution, a long tail of people unhappy about their case not being supported, and accruing bugs and problems that are perceived as a lowered quality of the standard library.
Your comment is well phrased and expresses a genuine intent for your needs. I don't think there's anything wrong with that, and I appreciate your explicit description of intent and your care not to come across as righteous :)

Finally, since data changes, if we support any internationalized date parsing, we will, by necessity, create a situation where input to your website works one day, but the same input to the same website breaks after a future update of the same browser, because the data has been updated and the parsing patterns have changed.

I hope my message conveys that this is orders of magnitude more complicated than it looks on the surface, and that it would likely sink more resources and braintime from our group than everything else we work on combined, while still producing something that wouldn't satisfy the majority of people who would like to see such an API. I believe a user land library is a great solution. And if one becomes dominant and gets years of in-field experience, we could revisit this topic. But I think there's a reason that hasn't happened yet.
@zbraniecki Thank you for your response!
That's a really good point, and partially changes my view.
+1 on everything @zbraniecki said. Also, see my blog post on the subject: https://blog.sffc.xyz/post/190943794505/why-you-should-not-parse-localized-strings
I'd really like this feature. We have a date input that is used internationally, and we currently rely on a 3rd-party library. I would love to see this provided by Intl.
@tounsoo please read the backlog. Parsing is out-of-scope for ECMA 402 and won't be included. A 3rd party library is indeed the right way to go.
tc39gh-424 is more comprehensive than tc39gh-342, which is specific to DateTimeFormat
An instance of DateTimeFormat can do Date -> string but cannot do the reverse string -> Date for formatted strings it created. It's very hard to implement that in user-land because different browsers might handle different languages in different ways?

My proposal is a parse method that is reverse of the format method: