Fix lack of support for new TimestampNTZType in Spark 3.4 datatypes #1385
Signed-off-by: Filipe Oliveira <[email protected]>
LGTM!
Codecov Report
All modified lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #1385 +/- ##
=======================================
Coverage 93.92% 93.92%
=======================================
Files 91 91
Lines 6781 6787 +6
=======================================
+ Hits 6369 6375 +6
Misses 412 412
…mestampntztype Signed-off-by: Filipe Oliveira <[email protected]>
Thanks, and congrats on your first PR @filipeo2-mck! 🚀
LGTM!!!
Amazing stuff @filipeo2-mck. Thanks for the effort!
…nionai-oss#1385)
* add TimestampNTZType as equivalents and add parameters to test case
Signed-off-by: Filipe Oliveira <[email protected]>
* parse version to improve robustness
Signed-off-by: Filipe Oliveira <[email protected]>
---------
Signed-off-by: Filipe Oliveira <[email protected]>
Signed-off-by: Nok <[email protected]>
When loading a Parquet file with timestamp fields into a DataFrame using Spark 3.4, PySpark can use the new pyspark.sql.types.TimestampNTZType(), but these new fields were not declared as equivalent to Pandera's Timestamp.
Stack Trace
Issue
The pyspark_engine.py module was not prepared to process pyspark.sql.types.TimestampNTZType() as a Timestamp.
Proposed solution
Because this type exists only in Spark 3.4 and newer, a conditional check was needed so that it is added only when pyspark >= 3.4 is in use, both in the main code and in the tests:
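A minimal sketch of this kind of version gate, not the code merged in this PR. The helper names `parse_major_minor` and `timestamp_equivalents` and the string labels are hypothetical; in real code the version string would come from `pyspark.__version__` and the list would hold the actual pyspark type classes.

```python
import re
from typing import List, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) as integers from a version string like '3.4.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def timestamp_equivalents(pyspark_version: str) -> List[str]:
    """Return illustrative labels for the types treated as a Timestamp.

    The labels stand in for the real pyspark type objects (assumption).
    """
    equivalents = ["timestamp", "TimestampType"]
    # TimestampNTZType only exists from Spark 3.4 onwards, so register it
    # conditionally to keep older installations working.
    if parse_major_minor(pyspark_version) >= (3, 4):
        equivalents.append("TimestampNTZType")
    return equivalents
```

Comparing integer tuples rather than raw strings matters here: as strings, "3.10" sorts before "3.4", so a plain string comparison would wrongly exclude a hypothetical 3.10 release.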
Behavior when using PySpark 3.3, with common types only:
Behavior when using PySpark 3.4, with the new type:
Additional details
Version in use: 0.17.2
OS: macOS 13.6 (M1)