
[FEA] Support spark.sql.parquet.datetimeRebaseModeInRead=LEGACY #9059

Closed

tgravescs opened this issue Aug 16, 2023 · 1 comment · Fixed by #9649

Labels
feature request New feature or request

Comments

@tgravescs
Collaborator

Is your feature request related to a problem? Please describe.
A user's job failed with the following exception:

 org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: 
reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z from Parquet
files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of
Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic
Gregorian calendar. See more details in SPARK-31404. You can set the SQL config
'spark.sql.parquet.datetimeRebaseModeInRead' or the datasource option 'datetimeRebaseMode' to 'LEGACY' to rebase the datetime values
w.r.t. the calendar difference during reading. To read the datetime values as it is,
set the SQL config 'spark.sql.parquet.datetimeRebaseModeInRead' or the datasource option 'datetimeRebaseMode' to 'CORRECTED'.
   

In this case we actually had spark.sql.parquet.datetimeRebaseModeInRead=CORRECTED set, but my assumption is that since the job still failed, the config was overridden by the setting stored in the actual Parquet file.

It would be nice if the RAPIDS Plugin could support LEGACY mode, since some CSPs expect it and have it as the default for reads, and conversely on the write side as well.
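
For reference, the exception text above names two equivalent ways to request LEGACY rebasing on read. A minimal sketch of both (the Parquet path is hypothetical):

```scala
// Session-wide: rebase Parquet date/timestamp reads from the legacy
// hybrid (Julian + Gregorian) calendar to the Proleptic Gregorian calendar.
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "LEGACY")

// Per-read alternative, via the datasource option quoted in the exception:
val df = spark.read
  .option("datetimeRebaseMode", "LEGACY")
  .parquet("/tmp/legacy_dates") // hypothetical path
```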

@viadea
Collaborator

viadea commented Oct 25, 2023

As mentioned in #9540, I hope the fix for this issue includes the following scenario:
being able to read an old date/timestamp that was written using spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY.
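
A minimal repro sketch of that scenario (the app name and path are assumptions, not from the issue):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rebase-repro").getOrCreate()
import spark.implicits._

// Write a pre-1582 date using the legacy hybrid calendar.
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "LEGACY")
Seq(java.sql.Date.valueOf("1500-01-01")).toDF("d")
  .write.mode("overwrite").parquet("/tmp/legacy_dates")

// Read it back, rebasing the value to the Proleptic Gregorian calendar.
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "LEGACY")
spark.read.parquet("/tmp/legacy_dates").show()
```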
