Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ParseErrorKind::TooMany if format string contains too many directives for given type #1079

Closed
wants to merge 1 commit into from

Conversation

MarcoGorelli
Copy link
Contributor

@MarcoGorelli MarcoGorelli commented May 22, 2023

Thanks for contributing to chrono!

If your feature is semver-compatible, please target the 0.4.x branch;
the main branch will be used for 0.5.0 development going forward.

Please consider adding a test to ensure your bug fix/feature will not break in the future.


Here's my attempt at adding ParseErrorKind::TooMany

closes #1075

@MarcoGorelli MarcoGorelli marked this pull request as ready for review May 22, 2023 12:40
@pitdicker
Copy link
Collaborator

I have not looked at your code yet, apologies. But you are modifying parts of the public API. Is there a way to do this that is backwards compatible? Or do you want to target 0.5?

I am slowly getting more familiar with the ideas behind the current parser myself... At a high level we have:

  • StrftimeItems lazily parses a format string into formatting items.
  • format::parse consumes the iterator and calls the right parsing method for each item.
  • The various date/time fields are collected into Parsed. This is the point where an error can be raised for duplicate fields.
  • Parsed can be used to create multiple types, like NaiveDate or DateTime.

The right place to raise an error for what you want is when converting Parsed to an output type. Just check the fields and see if any unused ones are not None.

@pitdicker
Copy link
Collaborator

Not to discourage, but I don't know if an error is what I would like for your case in #1075.

use chrono;
fn main() {
    println!("{:?}", chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H"));
}

Can it not be a valid use case to be only interested in the date part? While you have to provide a format string that can parse all fields of input, you may not be interested in all of them.

@pitdicker
Copy link
Collaborator

I see your linked issue pola-rs/polars#8849. Would it help if chrono was able to respect the padding modifiers when parsing (they are currently ignored)?

@MarcoGorelli
Copy link
Contributor Author

Can it not be a valid use case to be only interested in the date part?

Sure, but then couldn't you just as easily call .date?

use chrono::{NaiveDateTime};

fn main() {
    let naive_datetime = NaiveDateTime::parse_from_str("2023-05-23 10:30:00", "%Y-%m-%d %H:%M:%S").unwrap();
    let naive_date = naive_datetime.date();
    println!("{:?}", naive_date);
}

@MarcoGorelli
Copy link
Contributor Author

MarcoGorelli commented May 23, 2023

Would it help if chrono was able to respect the padding modifiers when parsing (they are currently ignored)?

Thanks for taking a look at the linked issue - I don't think that respecting the padding modifiers would solve that case, because then NaiveDateTime::parse_from_str("2023-05-23 10", "%Y-%m-%d %H") would still fail but NaiveDate::parse_from_str("2023-05-23 10", "%Y-%m-%d %H") would pass, resulting in the hour component being silently dropped from the user's input

@MarcoGorelli
Copy link
Contributor Author

Sure, but then couldn't you just as easily call .date?

Sorry this wouldn't work if NaiveDateTime doesn't parse with just %H and no %M present

Might just have to work around this in polars then if this is the desired behaviour - saves breaking backwards compatibility in chrono too.

Thanks for your responses, closing then!

@pitdicker
Copy link
Collaborator

You also seem to be at a pc at the moment, so I'll take advantage of that to ask something more 😄.

I have seen another issue that it would be nice if we had some way to validate a format string. And in a personal project I wanted a way to query if a format string has enough fields to describe a date, a time, datetime, etc.

Would such a validating function be helpful here?

(And don't mind my opinions too much)

@MarcoGorelli
Copy link
Contributor Author

if a format string doesn't have enough fields to describe a date/time/datetime, then it already errors - what we'd really need is a way to validate if a format string has more fields than will end up in the result

@pitdicker
Copy link
Collaborator

That seems like useful information that should somehow be possible to return.

@djc
Copy link
Member

djc commented May 23, 2023

I do think we should probably error out if the format string is parsing data that it has no way of returning.

@MarcoGorelli
Copy link
Contributor Author

thanks @djc - do we want to continue the conversation in the issue, and see where to take it from there?

@djc
Copy link
Member

djc commented May 23, 2023

Yeah, can you make a proposal there on how we could accomodate this use case?

@pitdicker
Copy link
Collaborator

The example in the linked issue is interesting. What if the input only has a date and hour? We are currently not able to parse that into a NaiveDateTime:

use chrono::{NaiveDateTime};

fn main() {
    let naive_datetime = NaiveDateTime::parse_from_str("2023-05-23 10", "%Y-%m-%d %H").unwrap();
    let naive_date = naive_datetime.date();
    println!("{:?} {:?}", naive_datetime, naive_date);
}

Fails with:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParseError(NotEnough)', src/main.rs:20:88
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It seems like parsing where the fields do not fully match to output type has three choices:

  1. Allow but ignore useless fields, don't substitute default values for missing fields.
  2. Error on useless fields, and substitute default values for missing fields.
  3. The fields must exactly match the output type.

Chrono currently works like 1. Option 2 seems more useful to me.

@MarcoGorelli
Copy link
Contributor Author

Option 2 seems more useful to me.

same, to me too

@djc
Copy link
Member

djc commented May 23, 2023

Agreed in the sense that I think date + hour should be accepted, but I think we probably shouldn't parse %Y-%m-%d into NaiveDateTime (the caller should use NaiveDate instead) and be really careful about where we substitute default values (although TBH I don't know the exact rules we have now).

@jtmoon79
Copy link
Contributor

jtmoon79 commented May 24, 2023

  1. Error on useless fields, and substitute default values for missing fields.
  2. The fields must exactly match the output type.

For Issue #660, I changed behavior of function chrono::Local.datetime_from_str to be very strict about whitespace. So the behavior in unreleased branch main is to be very strict about all characters in the two input strings (see PR #807). I think this "sense of strictness" should be consistent behavior. So I lean toward 3.

However, 2. is useful. If 2. is accepted then it must be clearly documented everywhere it happens (which I think includes function datetime_from_str), e.g.

/// Function `parse_from_str` will substitute default values for missing date parts.
/// A missing second specifier will substitute a `second` value of `0`.
/// A missing minute specifier will substitute a `minute` value of `0`.
/// A missing hour specifier will substitute a `hour` value of `0`.
/// A missing day specifier will substitute a `day` value of `1`.
/// A missing month specifier value will substitute a `month` value of `1`.
/// A missing year specifier value will substitute the current year.

(the substitute behavior for missing day, month, and year could be debated further)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants