
[BUG] Incorrect behavior when parsing CSV timestamps with out-of-range year with Spark 3.2.0 #4943

Closed
andygrove opened this issue Mar 11, 2022 · 1 comment
Labels: bug (Something isn't working), invalid (This doesn't seem right)

Comments

@andygrove
Contributor

Describe the bug
PR #4938 improves timestamp support when reading CSV sources, but it does not match Spark's behavior since 3.2.0, where the date components are validated. For example, Spark throws errors such as `ValueError: year 32766 is out of range`.

Steps/Code to reproduce bug
Update `csv_test.py` to remove the xfail marker from `simple_int_values` that references this issue, then run the test.

Expected behavior
We should add validation logic to match Spark's behavior.
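As a rough illustration of the kind of validation meant here, the sketch below checks the date components of a CSV timestamp and yields `None` (i.e. null) for out-of-range values instead of raising. The function name `parse_csv_timestamp` and the single accepted format are hypothetical, not the plugin's actual code.

```python
from datetime import datetime
from typing import Optional

def parse_csv_timestamp(value: str) -> Optional[datetime]:
    """Hypothetical sketch: validate date components while parsing a CSV
    timestamp, returning None for invalid input rather than raising."""
    try:
        # strptime rejects out-of-range components (month 13, day 32, ...)
        # and, in CPython, years outside 1..9999.
        return datetime.strptime(value.strip(), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

print(parse_csv_timestamp("2022-03-11 12:00:00"))   # 2022-03-11 12:00:00
print(parse_csv_timestamp("32766-01-01 00:00:00"))  # None
```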

Additional context
None

@andygrove added the labels bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) on Mar 11, 2022
@andygrove
Contributor Author

This error was actually due to Python's `datetime` limitation of a maximum year of 9999, rather than Spark-side validation.
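The limitation referred to can be reproduced directly with the standard library: CPython's `datetime` caps years at `MAXYEAR` (9999), and constructing a larger year raises the `ValueError` quoted in the bug description.

```python
from datetime import datetime, MAXYEAR

# The year cap baked into CPython's datetime module.
print(MAXYEAR)  # 9999

# Any year above MAXYEAR raises the error seen in the bug report.
try:
    datetime(32766, 1, 1)
except ValueError as e:
    print(e)  # year 32766 is out of range
```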

@sameerz removed the ? - Needs Triage (Need team to review and classify) label on Mar 15, 2022
@sameerz added the invalid (This doesn't seem right) label on Apr 1, 2022