
[BUG] Incorrect behavior when parsing CSV timestamps with out-of-range year with Spark 3.2.0 #4943

Closed
andygrove opened this issue Mar 11, 2022 · 1 comment
Labels: bug (Something isn't working), invalid (This doesn't seem right)

Comments

@andygrove
Contributor

Describe the bug
PR #4938 improves timestamp support when reading CSV sources, but it does not match Spark's behavior since 3.2.0, where the date components are validated. For example, Spark throws errors such as `ValueError: year 32766 is out of range`.

Steps/Code to reproduce bug
Update `csv_test.py` to remove the xfail marker from `simple_int_values` that references this issue, then run the test.

Expected behavior
We should add validation logic to match Spark's behavior.
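As a rough illustration of the kind of validation meant here, the sketch below checks the date components of a CSV timestamp and yields `None` (i.e. null) for out-of-range values instead of raising. The function name `parse_csv_timestamp` and the single accepted format are hypothetical, not the plugin's actual code.

```python
from datetime import datetime
from typing import Optional

def parse_csv_timestamp(value: str) -> Optional[datetime]:
    """Hypothetical sketch: validate date components while parsing a CSV
    timestamp, returning None for invalid input rather than raising."""
    try:
        # strptime rejects out-of-range components (month 13, day 32, ...)
        # and, in CPython, years outside 1..9999.
        return datetime.strptime(value.strip(), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

print(parse_csv_timestamp("2022-03-11 12:00:00"))   # 2022-03-11 12:00:00
print(parse_csv_timestamp("32766-01-01 00:00:00"))  # None
```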

Additional context
None

@andygrove added the labels bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) on Mar 11, 2022
@andygrove
Contributor Author

This error was actually due to Python's `datetime` limitation of a maximum year of 9999, rather than Spark-side validation.
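The limitation referred to can be reproduced directly with the standard library: CPython's `datetime` caps years at `MAXYEAR` (9999), and constructing a larger year raises the `ValueError` quoted in the bug description.

```python
from datetime import datetime, MAXYEAR

# The year cap baked into CPython's datetime module.
print(MAXYEAR)  # 9999

# Any year above MAXYEAR raises the error seen in the bug report.
try:
    datetime(32766, 1, 1)
except ValueError as e:
    print(e)  # year 32766 is out of range
```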

@sameerz removed the ? - Needs Triage (Need team to review and classify) label on Mar 15, 2022
@sameerz added the invalid (This doesn't seem right) label on Apr 1, 2022