Revamped parsing/formatting #236

jhpratt · 2020-03-17T20:27:42Z

While not ideal, I think it would be best to revamp the parsing and formatting of the various structs in the time crate.

According to tokei, the src/format directory has ~1100 lines of code. There is also a small amount of code in the files for each struct. While some of this code will be able to be reused, most of it will likely be replaced.

Major changes that I think would be sensible to make are:

Eliminate single-letter specifiers

It's just plain confusing. Some specifiers (like %Y) are easily remembered, but most are not. Can you tell me the difference between %w and %W without looking at the reference? I certainly can't.

More modifiers, even if specifier-specific

Let's allow for tons of options! Allow colons to be present (or not) for a UTC offset! There are certainly other things that could be allowed in the future.

~~Ability to lazily format~~

~~This is by far the easiest one, as it's just an API addition. It would probably be best to just return impl Display, so as to avoid any doc-hidden structs or other API guarantees.~~

The combination of the first two leads to an inherent problem: when does the parser (of formatting strings) know when the specifier is over? Luckily, there's a solution that is both simple and keeps the parser simple: use a bracketed/parenthesized grouping delimiter. Due to the necessarily longer names of specifiers, the modifiers can be separated (both visually and logically) by a single space.
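To make the delimiter idea concrete, here is a minimal sketch of splitting one bracketed group into a specifier name and its space-separated modifiers. The `Component` type, the `parse_component` function, and the modifier spelling `padding:zero` are all hypothetical and chosen only for illustration; nothing here is the crate's actual API.

```rust
/// A specifier split into its name and modifiers (hypothetical type).
#[derive(Debug, PartialEq)]
struct Component<'a> {
    name: &'a str,
    modifiers: Vec<&'a str>,
}

/// Parses one bracketed group such as "[year padding:zero]".
/// Returns `None` if the brackets are missing or the name is empty.
fn parse_component(group: &str) -> Option<Component<'_>> {
    // The brackets tell the parser exactly where the specifier ends.
    let inner = group.strip_prefix('[')?.strip_suffix(']')?;
    // The name comes first; everything after it is a modifier.
    let mut parts = inner.split(' ').filter(|part| !part.is_empty());
    let name = parts.next()?;
    Some(Component {
        name,
        modifiers: parts.collect(),
    })
}

fn main() {
    // "padding:zero" is an assumed modifier spelling, not a confirmed one.
    let component = parse_component("[year padding:zero]").unwrap();
    println!("{:?}", component);
}
```

Because the closing bracket ends the group unambiguously, specifier names can be as long as readability demands without making the parser any more complicated.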

Another change that could prove useful for performance is a public API for the various specifier-modifier combinations. A macro could then be provided that would parse the formatting string at compile-time, such that the formatting string parser could be dropped by rustc as dead code.

This is certainly quite a bit to put out all at once. If you have any thoughts (for or against), leave a comment!

Edit: Some notes for myself as to intent.

It would be nice to be able to assume some default value for a component, such that it need not necessarily be present in the string being parsed. It'll probably be necessary to expose the raw values that were parsed, which would also allow a third-party to use those values freely.
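A rough sketch of what exposing the raw parsed values might look like, assuming a hypothetical `Parsed` struct whose fields are `None` when a component was absent from the input. The struct, its fields, and `into_date_or` are illustrative names only.

```rust
/// Raw values gathered while parsing (hypothetical struct).
/// `None` means the component did not appear in the input string.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Parsed {
    year: Option<i32>,
    month: Option<u8>,
    day: Option<u8>,
}

impl Parsed {
    /// Builds a (year, month, day) triple, substituting the caller's
    /// default for every component that was not parsed.
    fn into_date_or(self, default: (i32, u8, u8)) -> (i32, u8, u8) {
        (
            self.year.unwrap_or(default.0),
            self.month.unwrap_or(default.1),
            self.day.unwrap_or(default.2),
        )
    }
}

fn main() {
    // Pretend the input only contained a year and a month.
    let parsed = Parsed {
        year: Some(2020),
        month: Some(5),
        ..Parsed::default()
    };
    // A third party can inspect the raw values directly...
    assert_eq!(parsed.day, None);
    // ...or fall back to defaults for whatever is missing.
    println!("{:?}", parsed.into_date_or((1970, 1, 1)));
}
```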


mehcode · 2020-05-16T01:56:40Z

I'm a bit mixed on this. I'm going to write out what I think you're describing against some data I have. Please correct me if I'm wrong in understanding your idea here.

2020-05-12T23_40_45.010569Z

OffsetDateTime::parse(
    "{year}-{month}-{day}T{hour}_{minute}_{second}.{nanoseconds}")?;

Have you thought of "simply" using the ~~ISO~~ Unicode standard here for this?

https://www.unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table

OffsetDateTime::parse(
    "yyyy-MM-dd'T'HH_mm_ss.SSSSSSSSS")?;

Tracking back to my issue, ~~ISO~~ Unicode specifies that the S.. sequence will match "at most" the number of characters you provide.

Here is some more documentation on the ~~ISO~~ Unicode format: https://date-fns.org/v2.13.0/docs/parse#description

jhpratt · 2020-05-16T02:05:52Z

There's already support for RFC3339, but that's not the format you're seeking. The variable number of decimal places was special-cased for that.

Unicode isn't ISO, but I think it's far clearer to specify things using words rather than using a seemingly arbitrary letter in many cases.

jhpratt · 2020-05-24T20:45:55Z

I've just added this issue to TWIR's call for participation. As such, here's an update on the basics:

Lazily formatting is done, and was trivial to complete. The rewrite won't impact the ability to do this.

Having specifier-specific modifiers is still planned, though a list of what's desired would be great. Right now, all I've got for certain is the various padding options (none, space, zero) for a number of specifiers, as well as whether the UTC offset (currently %z) should contain a colon.

I began rewriting the parser for formatting strings yesterday. It uses a superior design, as it is a thin wrapper around a &str that implements Iterator, yielding an Item that is either a Literal that should be output as-is or a Specifier that should be processed. Eventually, Item will be able to be combined with various components (Date, Time, etc.), resulting in a struct that will implement Display. The design based around an iterator avoids allocating a Vec of formatting items, instead parsing the string as necessary. This should yield a noticeable performance improvement.

With regard to the format itself, I'm moving forward with bracketed specifiers. To output a literal [, it must be doubled (as is done with { in format!).

Ignoring the necessary modifiers for padding, the equivalent of %Y-%m-%dT%H:%M:%S will be [year]-[month]-[day]T[hour]:[minute]:[second], or something very similar.

~~NB: I am considering changing the internal design to use &[u8] instead of &str. This would allow slicing and indexing to be simpler and avoid the compiler inserting possible panics.~~ Indexing and slicing are done at the byte level. The compiler still inserts possible panics, though they'll never be reached (see rust-lang/rust#72558).
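For readers who want a picture of the iterator design described in this comment, here is a minimal sketch. `Item`, `Literal`, and `Specifier` follow the names used above; the wrapper name `FormatItems` and the handling of an unterminated specifier are assumptions made for the example, not the crate's actual implementation.

```rust
/// One piece of a format string.
#[derive(Debug, PartialEq)]
enum Item<'a> {
    /// Text to be output as-is.
    Literal(&'a str),
    /// The contents of a bracketed specifier, without the brackets.
    Specifier(&'a str),
}

/// Thin wrapper around the not-yet-parsed remainder of the format string
/// (hypothetical name).
struct FormatItems<'a> {
    rest: &'a str,
}

impl<'a> Iterator for FormatItems<'a> {
    type Item = Item<'a>;

    fn next(&mut self) -> Option<Item<'a>> {
        let s = self.rest;
        if s.is_empty() {
            return None;
        }
        if s.starts_with("[[") {
            // A doubled bracket is an escaped literal `[`.
            self.rest = &s[2..];
            return Some(Item::Literal(&s[..1]));
        }
        if s.starts_with('[') {
            // Assumption: an unterminated specifier runs to the end of input.
            let end = s.find(']').unwrap_or(s.len());
            self.rest = s.get(end + 1..).unwrap_or("");
            return Some(Item::Specifier(&s[1..end]));
        }
        // Literal text runs up to the next `[` (or the end of the string).
        let end = s.find('[').unwrap_or(s.len());
        self.rest = &s[end..];
        Some(Item::Literal(&s[..end]))
    }
}

fn main() {
    let items = FormatItems { rest: "[year]-[month]-[day]T[hour]" };
    // Items are produced on demand; no Vec is built unless the caller asks.
    for item in items {
        println!("{:?}", item);
    }
}
```

Each `Item` borrows from the original string, so the parser itself never allocates. All slice offsets fall next to ASCII brackets, which keeps the byte-level indexing on valid character boundaries.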

jhpratt · 2020-06-09T20:46:36Z

While the compiler allows it, lazy formatting turns out to be unidiomatic, as it requires fallibility originating from the time crate.

> Additionally, the return value of this function is fmt::Result which is a type alias of Result<(), std::fmt::Error>. Formatting implementations should ensure that they propagate errors from the Formatter (e.g., when calling write!). However, they should never return errors spuriously. That is, a formatting implementation must and may only return an error if the passed-in Formatter returns an error. This is because, contrary to what the function signature might suggest, string formatting is an infallible operation. This function only returns a result because writing to the underlying stream might fail and it must provide a way to propagate the fact that an error has occurred back up the stack.

As such, lazy formatting will be removed in 0.3. I'm investigating ways to handle formatting at all; right now I'm leaning towards an implementation of .format() that just returns a Result.

Note that it may still be possible to have infallible formatting (which could then implement Display). This might be tricky, as it involves what is essentially typestate.

jhpratt · 2020-11-17T01:52:36Z

Revamped formatting has been fully implemented. There is a format method similar to the one that already exists and a format_into method that accepts a &dyn Write. Both can be used in #![no_std] environments; the former requires an allocator.

The format description can be constructed manually or it can be parsed from a textual representation of it; the latter requires an allocator.
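As an illustration of how the two methods relate, here is a sketch built on a stand-in `Date` type. The `FormatError` enum, the month check, the fixed output layout, and the choice of `&mut dyn std::fmt::Write` as the sink are all assumptions for the example; the crate's real signatures may differ.

```rust
use std::fmt::{self, Write};

/// Hypothetical error type: formatting can fail for reasons of its own,
/// which is why the result is a `Result` rather than a `Display` impl.
#[derive(Debug, PartialEq)]
enum FormatError {
    /// A component cannot be formatted (stand-in for crate-side failures).
    InvalidComponent(&'static str),
    /// The underlying writer reported an error.
    Write,
}

impl From<fmt::Error> for FormatError {
    fn from(_: fmt::Error) -> Self {
        FormatError::Write
    }
}

/// Stand-in date type, not the crate's `Date`.
struct Date {
    year: i32,
    month: u8,
    day: u8,
}

impl Date {
    /// Writes into a caller-provided sink, so no allocation happens here.
    fn format_into(&self, out: &mut dyn Write) -> Result<(), FormatError> {
        if !(1..=12).contains(&self.month) {
            return Err(FormatError::InvalidComponent("month"));
        }
        write!(out, "{:04}-{:02}-{:02}", self.year, self.month, self.day)?;
        Ok(())
    }

    /// Convenience wrapper that allocates a `String` and delegates.
    fn format(&self) -> Result<String, FormatError> {
        let mut buffer = String::new();
        self.format_into(&mut buffer)?;
        Ok(buffer)
    }
}

fn main() {
    let date = Date { year: 2020, month: 11, day: 17 };
    match date.format() {
        Ok(text) => println!("{}", text),
        Err(error) => println!("formatting failed: {:?}", error),
    }
}
```

Layering `format` on top of `format_into` confines the allocator requirement to the convenience method, which matches the split described above.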

jhpratt · 2021-04-22T07:30:03Z

The full syntax is documented here. Both formatting and parsing are fully implemented and tested.

jhpratt added the C-feature-request Category: a new feature (not already implemented) label Mar 17, 2020

jhpratt mentioned this issue Apr 5, 2020

Add ISO8601 +-HH:MM to formatting #241

Closed

jhpratt mentioned this issue Apr 20, 2020

Time v0.3 tracking issue #248

Closed

jhpratt self-assigned this Apr 20, 2020

jhpratt mentioned this issue May 16, 2020

[Feature] Parse fractional nanoseconds of various widths #261

Closed

jhpratt added the C-seeking-input 📣 Category: community input is desired label May 25, 2020

jhpratt added this to the v0.3 milestone Jan 19, 2021

jhpratt removed C-cleanup Category: cleanup of existing code C-seeking-input 📣 Category: community input is desired labels Jan 23, 2021

jhpratt closed this as completed Apr 22, 2021
