Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support custom format strings when parsing date strings #796

Closed
c-eliasson opened this issue Jul 28, 2020 · 11 comments
Closed

Support custom format strings when parsing date strings #796

c-eliasson opened this issue Jul 28, 2020 · 11 comments
Labels
feedback v2? Candidate for being moved to a Temporal v2 proposal

Comments

@c-eliasson
Copy link

Apologies if this has already been up for discussion, but I couldn't find any issue on it, and nothing in the documentation that indicate that this is already supported.

When working with legacy or third-party systems, you often find yourself receiving date strings in the most unwieldy formats. To remedy this, date libraries often support custom format strings to make it possible to parse such strings.

Moment.js support passing a format string as a second argument, as defined in the docs.
Something like: moment("07-28-2020", "MM-DD-YYYY")

Luxon also support this through its .fromFormat() method.
Like this: DateTime.fromFormat("07-28-2020", "MM-DD-YYYY")

This is a feature I've used frequently over the years, and one of the reasons I've used third-party libraries instead of the Date object.

To add support for this in Temporal, one approach could be to extend the options object of the .from() method to also take an optional format string.

const dateTime = Temporal.DateTime.from('07-28-2020', { format: 'MM-DD-YYYY' })

Another approach could be to add an additional static method that takes the format as a second argument (like Luxon):

const dateTime = Temporal.DateTime.fromFormat('07-28-2020', 'MM-DD-YYYY')

I believe this would be convenient for at least Temporal.Absolute and Temporal.DateTime, but possibly also for all the other human-readable classes like Temporal.Date, Temporal.Time etc.

@ptomato
Copy link
Collaborator

ptomato commented Jul 30, 2020

Thanks for the feedback!

So far, we have mostly considered parsing (other than deserialization of ISO 8601-like strings) to be in the domain of business logic and outside the scope of Temporal (since everybody knows their own use case for parsing other formats better than Temporal possibly can, and to avoid repeating the mistakes of Date.parse().) I'll add this to our feedback list for further consideration though.

That said, if we did add it I would recommend a separate method than from(), to avoid confusion (consider Temporal.Date.from('2020-08-01') vs. Temporal.Date.from('2020-08-01', { format: 'YYYY-DD-MM' }).

@Ayfri
Copy link

Ayfri commented Jul 30, 2020

I think that this :

Temporal.Date.fromFormat(dateString: string, format: string);

Would be a good solution.

@c-eliasson
Copy link
Author

Thank you for taking this into consideration!

I totally agree with you @ptomato, Temporal should not be aware of all possible date string formats, as you say - that is specific to each individual use case. However, a standardised set of tokens that can be used to construct a format string would be general, and could therefor be part of the standard. The combination of tokens would be use case specific, but Temporal would still be format agnostic.

I also agree that the mistakes of Date.parse() should be avoided. Using an explicit format string would solve many of those issues though. With an explicit format string, you can avoid the disambiguation and uncertainty of how Date.parse() would parse your string, on any given host system.

You could have a strict parser that throws if the format string doesn't contain the necessary tokens required to create an instance of a given class.

// Throws because the format string doesn't contain any token to define 
// the 'day' part necessary to construct a Temporal.Date
const date = Temporal.Date.fromFormat('2020-08-01', 'YYYY-MM')

The same goes for when the date string matches the format string, but the parsed values are invalid.

// Throws because 45 is an invalid month
const date = Temporal.Date.fromFormat('2020-45-01', 'YYYY-MM-DD')

Possibly, you would want to support an optional third options argument to configure disambiguation, to be consistent with how Temporal.Date.from() works.

// Creates a Temporal.Date instance with 2020-12-01
const date = Temporal.Date.fromFormat('2020-45-01', 'YYYY-MM-DD', { disambiguation: 'constrain' })

As a side-note, in moment.js the parser is loose by default. According to the docs:

The parser ignores non-alphanumeric characters by default, so both of the following will return the same thing.

moment("12-25-1995", "MM-DD-YYYY");
moment("12/25/1995", "MM-DD-YYYY");

You would manually have to set it to strict mode, with a third argument, in order for that to throw.

In my opinion, that is a mistake though. In Temporal the parser should always be strict, to avoid disambiguation. If the date string isn't an exact match with the format string, an error should be thrown.

I also agree with you that it would probably be a better idea to implement this as a separate .fromFormat() method, to keep it separate from the behaviour of .from().

@ptomato
Copy link
Collaborator

ptomato commented Jul 30, 2020

Note to myself that this would also need first-class support for non-Gregorian calendars - e.g. '5780-11-09', 'YYYY-MM-DD' is valid in the Hebrew calendar.

@henryw374
Copy link

FYI java's DateTimeFormatter https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
is an entity you construct with your pattern ('YYYY-MM-DD' etc), and use for both parsing and printing/formatting.

The constructor-from-string fns on the individual entities, Date, Time etc use this under the hood.

DateTimeFormatter can be customised with locales, calendars etc.

@ianjmacintosh
Copy link

I also am looking for a way to get an arbitrarily-formatted string based on common patterns, like D MMM[, ]YYYY to yield "3 Sep, 2020" and wish I didn't have to import Moment (or any other libraries) to do it. Or have to make an array spelling out all the three-letter months of the year.

@justingrant
Copy link
Collaborator

re: formatting, here's an excerpt from date-fns 2.x documentation (https://date-fns.org/v2.16.1/docs/format) :

Return the formatted date string in the given format. The result may vary by locale.
⚠️ Please note that the format tokens differ from Moment.js and other libraries. See: https://git.io/fxCyr

The characters wrapped between two single quotes characters (') are escaped. Two single quotes in a row, whether inside or outside a quoted sequence, represent a 'real' single quote. (see the last example)
Format of the string is based on Unicode Technical Standard #35: https://www.unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table with a few additions (see note 7 below the table).

If we do use a microformat formatter, following the standard above seems like a reasonable starting point

@ianjmacintosh
Copy link

If we do use a microformat formatter, following the standard above seems like a reasonable starting point

More than just a reasonable starting point, this is the closest thing I've seen to an authoritative canonical spec. Cool find.

@ss485e
Copy link

ss485e commented Oct 2, 2020

I too would vote for parsing and formatting with a format string. I think it will have two advantages.

  1. Interoperate well with other programming language's date/time strings.
  2. I believe parsing will be faster rather than checking plethora of verbose string formats (which ISO formats can be) a format string can be straightforward to validate and parse. It might be possible to cache the parser generated from the format string.

I am writing this on a day when we had major outage because of date formatting issue. To create a local date format of yyyy+(m+1)-dd. The month being 0 indexed we had a small bug of matching with '<10' for padding and it was interpreted as 9, and got padded '2020-010-01', bam! timebomb went off and multiple services in multiple environments started going down all at the same time. Only if toLocalJSON() or something like that would have been present I am sure developer would have used a substring to get the formatted date portion.
In other times I miss this local date formatter with custom string is while writing log line timestamps, or chronologically naming files. Local times are much easier on eyes, also bonus we can create these kind of strings 20201001.

@henryw374
Copy link

I think that this :

Temporal.Date.fromFormat(dateString: string, format: string);

Would be a good solution.

Just to say, parsing/formatting needs to be locale-aware - and locale-aware bundles for Moment, js-joda etc are big, ie about 45KB for a bundle just containing en-US alone (e.g. https://www.npmjs.com/package/@js-joda/locale). So not having to send down these huge payloads to client browsers if a parser+formatter gets included in Temporal would be a huge win IMO

@ptomato ptomato added the v2? Candidate for being moved to a Temporal v2 proposal label Jan 14, 2021
@ptomato
Copy link
Collaborator

ptomato commented Feb 11, 2021

This is not included in the Temporal proposal that's currently under review, but we can move the discussion to Temporal V2: js-temporal/proposal-temporal-v2#2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feedback v2? Candidate for being moved to a Temporal v2 proposal
Projects
None yet
Development

No branches or pull requests

7 participants