fix: schema check of iceberg logical types #856
This is a trivial implementation, probably not enough. Thanks all.
@kevinjqliu wdyt?
I'm +1 on this change in theory. I wonder if there's a more generalized solution for this instead of hardcoding the UUID to FixedType(16) conversion.
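For context on why UUID and FixedType(16) get paired up at all: a UUID is 128 bits, so its raw form is exactly 16 bytes. The snippet below is a minimal stdlib-only sketch of that round trip. The helper names `uuid_to_fixed16` and `fixed16_to_uuid` are hypothetical and are not part of PyIceberg or this PR.

```python
import uuid


def uuid_to_fixed16(value: uuid.UUID) -> bytes:
    """Return the 16-byte big-endian representation of a UUID."""
    return value.bytes


def fixed16_to_uuid(raw: bytes) -> uuid.UUID:
    """Interpret a 16-byte value as a UUID; reject any other length."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return uuid.UUID(bytes=raw)


# Hypothetical sample value, used only to show the round trip.
sample = uuid.UUID("12345678-1234-5678-1234-567812345678")
raw = uuid_to_fixed16(sample)
restored = fixed16_to_uuid(raw)
```

The physical layout is identical in both directions, which is why the mismatch discussed here is about the logical type attached to the field rather than about the bytes themselves.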
Thanks @raphaelauv for jumping on this right away.
Could we make this part of the pyarrow_to_schema method? I think we could look up fields there to double-check if we need to apply any logical types.
This way we can also fix issues like #830.
thanks for the review @Fokko, to make it part of |
Hey @raphaelauv this looks like a great start! 🙌
Could you add two tests as well? One with a UUID as a top level field, and one where it is nested inside of a struct? Thanks!
@@ -153,7 +155,21 @@
ALWAYS_TRUE = AlwaysTrue()
TABLE_ROOT_ID = -1

_JAVA_LONG_MAX = 9223372036854775807

def _apply_logical_conversion(table_schema: Schema, task_schema: Schema) -> Schema:
We can defer this to a later PR, but I just want to put it out here.
In PyIceberg we have a bit of an obsession with the visitor pattern. Using the SchemaWithPartnerVisitor you can traverse two schemas at once. An example is the ArrowProjectionVisitor, which can be found in pyarrow.py. The ArrowAccessor does the lookups by ID, but I think here name makes more sense since we don't have the IDs (yet).
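To make the suggestion concrete, here is a rough sketch of walking two schemas in lockstep with the partner looked up by field name. It deliberately uses small stand-in dataclasses (`Primitive`, `Field`, `Struct`) instead of the real PyIceberg classes, and `apply_logical_types` is a hypothetical name. It also assumes the intent is to treat an incoming fixed[16] field as uuid whenever the table schema declares uuid under the same name. It covers both a top-level field and one nested inside a struct.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Primitive:
    name: str  # stand-in type name, e.g. "uuid", "fixed[16]", "long"


@dataclass(frozen=True)
class Field:
    name: str
    type: "FieldType"


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


FieldType = Union[Primitive, Struct]

FIXED16 = Primitive("fixed[16]")
UUID = Primitive("uuid")


def apply_logical_types(task: FieldType, table: Optional[FieldType]) -> FieldType:
    """Walk `task`, using `table` as the partner, matched by field name."""
    if isinstance(task, Struct):
        # No IDs on the incoming side, so index the partner's fields by name.
        partners = (
            {f.name: f.type for f in table.fields} if isinstance(table, Struct) else {}
        )
        return Struct(
            tuple(
                Field(f.name, apply_logical_types(f.type, partners.get(f.name)))
                for f in task.fields
            )
        )
    # Assumed rule: fixed[16] is promoted to uuid only when the table says uuid.
    if task == FIXED16 and table == UUID:
        return UUID
    return task


# Hypothetical schemas: uuid at the top level and nested inside a struct.
table = Struct((
    Field("id", UUID),
    Field("nested", Struct((Field("inner", UUID),))),
    Field("raw", FIXED16),
))
task = Struct((
    Field("id", FIXED16),
    Field("nested", Struct((Field("inner", FIXED16),))),
    Field("raw", FIXED16),
    Field("extra", Primitive("long")),
))
result = apply_logical_types(task, table)
```

Fields with no partner (such as `extra`) pass through unchanged, so a later compatibility check can still report them. A real implementation would presumably extend SchemaWithPartnerVisitor and also handle list and map types, which this sketch leaves out.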
Fixed by #921.
close: #855