Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specifying YYYY instead of yyyy in date format yields to parsing wrong year #7461

Closed
radeusgd opened this issue Aug 1, 2023 · 11 comments · Fixed by #7826
Closed

Specifying YYYY instead of yyyy in date format yields to parsing wrong year #7461

radeusgd opened this issue Aug 1, 2023 · 11 comments · Fixed by #7826
Assignees
Labels
--data corruption or loss Important: data corruption or loss -libs Libraries: New libraries to be implemented l-datetime Libraries: date/time utilities p-high Should be completed in the next sprint
Milestone

Comments

@radeusgd
Copy link
Member

radeusgd commented Aug 1, 2023

Somehow if I specify YYYY in the date format, when parsing a date like 03/01/2020 the date gets advanced into current year 2023.

image

from Standard.Base import all
from Standard.Table import all

main =
    d = "03/01/2020"
    t = Table.new [["X", [d]]]
    t2 = t.parse "X" Value_Type.Date "dd/MM/YYYY"
    t2.print

    t3 = t.parse "X" Value_Type.Date "dd/MM/yyyy"
    t3.print

    IO.println <| Date.parse d "dd/MM/YYYY"
    IO.println <| Date.parse d "dd/MM/yyyy"

Actual behaviour

   | X
---+------------
 0 | 2023-01-03

   | X
---+------------
 0 | 2020-01-03

2023-01-03
2020-01-03

Expected behaviour

I know that the Y has a slightly different meaning in the Date format - meaning a week-year. But that still does not seem to make sense with the observed behaviour - changing the year as much as 3 years.

Regardless, it is extremely easy to mistakenly use Y instead of y in the format (especially when we don't have dropdowns for Table.parse yet). I think we need to make sure such mistakes are less likely to happen, as they are not that easy to detect and they lead to serious corruption of data that is not always very easy to notice.


After observing that Java actually fails to parse this kind of date with that format string, I'm almost sure that Enso should error as well.

@github-project-automation github-project-automation bot moved this to ❓New in Issues Board Aug 1, 2023
@radeusgd radeusgd added -libs Libraries: New libraries to be implemented l-datetime Libraries: date/time utilities --data corruption or loss Important: data corruption or loss labels Aug 1, 2023
@radeusgd
Copy link
Member Author

radeusgd commented Aug 1, 2023

Maybe we should attach a warning when Y is used in the format, notifying that it is a more advanced feature?

Still, the current behaviour is plain incorrect. I assume we fall into the fallback that allows us to parse month+day combinations lacking the year - that's why the current year is put in as a 'default'.

As we can see, in Java this format+date combination actually fails:

jshell> LocalDate.parse("03/01/2020", DateTimeFormatter.ofPattern("dd/MM/YYYY"))
|  Exception java.time.format.DateTimeParseException: Text '03/01/2020' could not be parsed: Unable to obtain LocalDate from TemporalAccessor: {DayOfMonth=3, WeekBasedYear[WeekFields[MONDAY,4]]=2020, MonthOfYear=1},ISO of type java.time.format.Parsed
|        at DateTimeFormatter.createError (DateTimeFormatter.java:2023)
|        at DateTimeFormatter.parse (DateTimeFormatter.java:1958)
|        at LocalDate.parse (LocalDate.java:430)
|        at (#3:1)
|  Caused by: java.time.DateTimeException: Unable to obtain LocalDate from TemporalAccessor: {DayOfMonth=3, WeekBasedYear[WeekFields[MONDAY,4]]=2020, MonthOfYear=1},ISO of type java.time.format.Parsed
|        at LocalDate.from (LocalDate.java:398)
|        at Parsed.query (Parsed.java:241)
|        at DateTimeFormatter.parse (DateTimeFormatter.java:1954)
|        ...

I think the same should happen for Enso. We need to process the formats in a smarter way and if the format contains more than just Month+Day parts, we cannot enter the fallback and replace the year with current one. I think we should also detect if there are any date parts that were specified in the format but not used in the final result (like YYYY here - the WeekBasedYear part was not used to produce the final result, this should be an error or at least a warning).

As a side note, given that it is so easy to confuse YYYY with yyyy I'm not sure if we should consider remapping YYYY to yyyy automatically. We could then provide some more advanced tools for constructing the format if someone actually needs the WeekBasedYear field to be parsed. Not sure if that's the best case as it's going further from the Java 'baseline' that is rather well known, but I think it may be worth it as otherwise this will confuse beginner users that do not know Java formats.

@jdunkerley
Copy link
Member

The plan is to convert Y to y and a few others as well.
The Java format strings are not user-friendly and do cause problems

@jdunkerley jdunkerley self-assigned this Aug 1, 2023
@jdunkerley
Copy link
Member

Initial notes but needs more thought and work.

Following conversions would be useful I think:

d/D - Day / Weekday name
M - Month or Month name
m - Minute
y/Y - Year
h - 12 hour
H - 24 hour
s/S - second

  • Y ==> y: Single digit year
  • YY ==> yy: Double digit year
  • YYYY ==> yyyy: 4 digit year
  • ddd ==> EEE: weekday short name
  • dddd ==> EEEE: weekday long name
  • MMM ==> LLL: month short name
  • MMMM ==> LLLL: month long name
  • D ==> map to lower case d.
  • S ==> subsecond???

HH:mm:ss.000000 (fractional sub seconds)
HH:mm:ss.###### (optional fractional sub seconds)

Need something for WeekYear and DayOfYear.

@jdunkerley jdunkerley moved this from ❓New to 📤 Backlog in Issues Board Aug 15, 2023
@jdunkerley jdunkerley assigned radeusgd and unassigned jdunkerley Sep 5, 2023
@jdunkerley jdunkerley added the p-high Should be completed in the next sprint label Sep 5, 2023
@jdunkerley jdunkerley added this to the Beta Release milestone Sep 5, 2023
@jdunkerley jdunkerley moved this from 📤 Backlog to ❓New in Issues Board Sep 5, 2023
@jdunkerley jdunkerley moved this from ❓New to 📤 Backlog in Issues Board Sep 5, 2023
@radeusgd radeusgd moved this from 📤 Backlog to 🔧 Implementation in Issues Board Sep 11, 2023
@enso-bot
Copy link

enso-bot bot commented Sep 14, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-09-13):

Progress: Clarifying spec and docs for date patterns. Implemented a tokenizer for date format, started work on parser. It should be finished by 2023-09-18.

Next Day: Next day I will be working on the same task. Imlpement the parser and generate DateTimeFormatter. Work on dropdowns.

@enso-bot
Copy link

enso-bot bot commented Sep 15, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-09-14):

Progress: Implemented new date format parsing. It should be finished by 2023-09-18.

Next Day: Next day I will be working on the same task. Integrate with existing usage sites. Work on dropdowns.

@enso-bot
Copy link

enso-bot bot commented Sep 18, 2023

Radosław Waśko reports a new STANDUP for the provided date (2023-09-15):

Progress: Integrated new format parsing with Date.parse/format. Preparing integration with other parts. Updated dropdowns - but failed to get them to display in the IDE - debugging. It should be finished by 2023-09-18.

Next Day: Next day I will be working on the same task. Fix dropdowns. Integrate remaining parts. Add tests.

@enso-bot
Copy link

enso-bot bot commented Sep 19, 2023

Radosław Waśko reports a new 🔴 DELAY for yesterday (2023-09-18):

Summary: There is 2 days delay in implementation of the Specifying YYYY instead of yyyy in date format yields to parsing wrong year (#7461) task.
It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: I did underestimate the amount of work. Debugging widgets was a bit slow since I did not have that much diagnsotic info so a lot of back and forth and iteration time is pretty big. But that's only part - other part is - simply a lot of stuff with this and I was slower than I originally anticipated.

@enso-bot
Copy link

enso-bot bot commented Sep 19, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-09-18):

Progress: Fixed the dropdowns. Checked how the new format plays in practice with drodpowns. Got +- OK on the design so will continue. It should be finished by 2023-09-20.

Next Day: Next day I will be working on the same task. Integrate with Data_Formatter and all other parts. Add tests.

@radeusgd radeusgd moved this from 🔧 Implementation to 👁️ Code review in Issues Board Sep 20, 2023
@enso-bot
Copy link

enso-bot bot commented Sep 21, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-09-20):

Progress: Integerated Data_Formatter and Column.format. All tests are finally passing. PR is ready for review. It should be finished by 2023-09-20.

Next Day: Next day I will be working on the same task. Implement some further improvements.

@enso-bot
Copy link

enso-bot bot commented Sep 22, 2023

Radosław Waśko reports a new 🔴 DELAY for yesterday (2023-09-21):

Summary: There is 1 days delay in implementation of the Specifying YYYY instead of yyyy in date format yields to parsing wrong year (#7461) task.
It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: I forgot about Column.format in previous estimations

@enso-bot
Copy link

enso-bot bot commented Sep 22, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-09-21):

Progress: Applying review suggestions, getting some missing tests to pass. Testing new stuff (quarter parsing, week-date parsing). It should be finished by 2023-09-21.

Next Day: Next day I will be working on the #7872 task. Implement some further improvements.

@mergify mergify bot closed this as completed in #7826 Sep 22, 2023
mergify bot pushed a commit that referenced this issue Sep 22, 2023
- Closes #7461 by introducing a `Date_Time_Formatter` type and making parsing date time formats more robust and safer.
- The default ('simple') set of patterns is slightly simplified and made case insensitive (except for `M/m` and `H/h`) to avoid the `YYYY` vs `yyyy` issues and make it less error prone.
- The `YYYY` now has the same meaning as `yyyy` in simple mode. The old meaning (week-based year) is moved to a _separate mode_, triggered by `Date_Time_Formatter.from_iso_week_date_pattern`.
- Full Java syntax, as well as custom-built Java `DateTimeFormatter` can also be used by `Date_Time_Formatter.from_java`.
- Text-based constants (e.g. `ISO_ZONED_DATE_TIME`) have now become methods on `Date_Time_Formatter`, e.g. `Date_Time_Formatter.iso_zoned_date_time`).
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
--data corruption or loss Important: data corruption or loss -libs Libraries: New libraries to be implemented l-datetime Libraries: date/time utilities p-high Should be completed in the next sprint
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants