
fix: schema check of iceberg logical types #856

Closed
wants to merge 2 commits into from

Conversation

raphaelauv

close: #855

@raphaelauv
Author

This is a trivial implementation, probably not enough. Thanks all.

@raphaelauv
Author

@kevinjqliu wdyt?

@kevinjqliu
Contributor

I'm +1 on this change in theory. I feel like _check_schema_compatible should be as non-blocking as possible, i.e. if pyarrow can write the dataset, _check_schema_compatible should allow it.

I wonder if there's a more generalized solution for this instead of hardcoding UUID to FixedType(16) conversion.
@syun64 @HonahX @Fokko wdyt
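The "hardcoded" approach under discussion could be sketched roughly as follows. This is an illustrative stand-in only: the type names and the `LOGICAL_TO_PHYSICAL` table are simplified assumptions, not PyIceberg's actual classes or API.

```python
# Illustrative sketch of the hardcoded conversion kevinjqliu refers to:
# a fixed table mapping an Iceberg logical type to the physical type a
# writer actually produces. Only UUID is special-cased, since it is
# stored as a 16-byte fixed-length binary value.
LOGICAL_TO_PHYSICAL = {
    "uuid": "fixed[16]",
}

def physical_type(iceberg_type: str) -> str:
    """Return the physical type a writer would produce for a logical type."""
    return LOGICAL_TO_PHYSICAL.get(iceberg_type, iceberg_type)

def types_compatible(table_type: str, task_type: str) -> bool:
    """A schema check that tolerates logical/physical mismatches:
    the types match either directly or after logical-to-physical mapping."""
    return table_type == task_type or physical_type(table_type) == task_type
```

The generalized alternative kevinjqliu suggests would replace the single `"uuid"` entry with a complete logical-to-physical mapping, so no individual type needs to be special-cased at the call site.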

Contributor

@Fokko Fokko left a comment


Thanks @raphaelauv for jumping on this right away.

Could we make this part of the pyarrow_to_schema method? I think we could look up fields there to double-check whether we need to apply any logical types.

This way we can also fix issues like #830.

pyiceberg/table/__init__.py (outdated, resolved)
@raphaelauv
Author

Thanks for the review @Fokko. To make this part of pyarrow_to_schema we would have to change a lot of things in order to propagate the table_schema (which is an Iceberg schema). That is what I tried first; then I reverted it and made a separate function, _apply_logical_conversion.
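The separate-function approach could look roughly like this sketch. It uses plain dicts as stand-ins for Iceberg Schema objects (the real _apply_logical_conversion in this PR operates on pyiceberg schemas; every name and structure below is a simplified assumption).

```python
# Simplified sketch in the spirit of _apply_logical_conversion: walk the
# table schema and the task (file) schema in parallel, and wherever the
# table declares a logical type such as UUID while the file carries the
# matching physical type, rewrite the table side to the physical type so
# a later compatibility check passes. Schemas are modeled as dicts
# mapping field name -> type, with a nested dict for structs.
LOGICAL_TO_PHYSICAL = {"uuid": "fixed[16]"}

def apply_logical_conversion(table_schema: dict, task_schema: dict) -> dict:
    converted = {}
    for name, table_type in table_schema.items():
        task_type = task_schema.get(name)
        if isinstance(table_type, dict) and isinstance(task_type, dict):
            # Recurse into structs so nested UUID fields are handled too.
            converted[name] = apply_logical_conversion(table_type, task_type)
        elif (
            table_type in LOGICAL_TO_PHYSICAL
            and task_type == LOGICAL_TO_PHYSICAL[table_type]
        ):
            converted[name] = LOGICAL_TO_PHYSICAL[table_type]
        else:
            converted[name] = table_type
    return converted
```

This mirrors the two test cases Fokko asks for below: one top-level UUID field and one nested inside a struct.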

Contributor

@Fokko Fokko left a comment


Hey @raphaelauv this looks like a great start! 🙌

Could you add two tests as well? One with a UUID as a top level field, and one where it is nested inside of a struct? Thanks!

@@ -153,7 +155,21 @@
ALWAYS_TRUE = AlwaysTrue()
TABLE_ROOT_ID = -1

_JAVA_LONG_MAX = 9223372036854775807

def _apply_logical_conversion(table_schema: Schema, task_schema: Schema) -> Schema:
Contributor


We can defer this to a later PR, but just want to put it out here.

In PyIceberg we have a bit of an obsession with the visitor pattern. Using the SchemaWithPartnerVisitor you can traverse two schemas at once. An example is the ArrowProjectionVisitor, which can be found in pyarrow.py. The ArrowAccessor does the lookups by ID, but I think here name makes more sense since we don't have the IDs (yet).
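The partner-visitor idea can be illustrated with a toy example: traverse one schema while an accessor resolves the matching node in a second schema by field name. This is a deliberately minimal sketch; PyIceberg's actual SchemaWithPartnerVisitor and ArrowAccessor have a richer interface and resolve partners by field ID, as the comment notes.

```python
# Toy illustration of the two-schema traversal: the accessor finds the
# "partner" of each field in the other schema (here by name), and the
# visitor callback sees both sides of every leaf field at once.
class NamePartnerAccessor:
    """Looks up the partner field in the other schema by field name."""

    def field_partner(self, partner_struct, name):
        if partner_struct is None:
            return None
        return partner_struct.get(name)

def visit_with_partner(schema, partner, accessor, visitor):
    """Walk `schema` (dict of name -> type) alongside its partner schema."""
    for name, field_type in schema.items():
        field_partner = accessor.field_partner(partner, name)
        if isinstance(field_type, dict):
            # Structs recurse with the matching partner struct.
            visit_with_partner(field_type, field_partner, accessor, visitor)
        else:
            visitor(name, field_type, field_partner)

def collect_mismatches(mismatches):
    """Example visitor: record leaf fields whose partner type differs."""
    def visit(name, field_type, partner_type):
        if partner_type is not None and partner_type != field_type:
            mismatches.append((name, field_type, partner_type))
    return visit
```

With a visitor like this, the logical-type fixup becomes one visitor implementation rather than a hand-rolled recursion, which is presumably the appeal of deferring to the pattern in a later PR.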

@raphaelauv
Copy link
Author

Fixed by #921

@raphaelauv raphaelauv closed this Jul 16, 2024
@raphaelauv raphaelauv deleted the fix/check_logical_types_arrow_iceberg branch July 16, 2024 13:32
Successfully merging this pull request may close these issues.

write UUID fail on _check_schema_compatible
3 participants