OPTIMIZE table_name REWRITE DATA USING BIN_PACK WHERE date = 'date_as_string'
Thanks @mikulskibartosz for reporting this. Kudos for the comprehensive issue. This is a known issue that we're working on, and it will be fixed in the next release: #6647
Apache Iceberg version
1.1.0
Query engine
Athena
Please describe the bug 🐞
It's not possible to read an Iceberg table with PyIceberg if the data was written using PySpark and compacted with AWS Athena.
Steps to reproduce
The result variable contains correct data.
Optionally, VACUUM the table. It doesn't matter and doesn't change the behavior in any way.
Query the table using the same PyIceberg code as in step 3.
to_arrow raises an exception: ValueError: Iceberg schema is not embedded into the Parquet file, see https://github.com/apache/iceberg/issues/6505
The table can still be accessed correctly in AWS Athena.
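For anyone who wants to check whether their own table is affected, the read in steps 3 and 7 can be sketched roughly as below. This is not the exact code from step 3; it is a minimal sketch assuming a catalog already configured for PyIceberg. The function names (read_table, check_table, is_missing_schema_error) and the catalog and table identifiers passed in are hypothetical; only load_catalog, load_table, scan and to_arrow are PyIceberg calls.

```python
# Part of the error message quoted in this report.
ERROR_MARKER = "Iceberg schema is not embedded into the Parquet file"


def is_missing_schema_error(exc: Exception) -> bool:
    """Return True if exc looks like the ValueError described in this issue."""
    return isinstance(exc, ValueError) and ERROR_MARKER in str(exc)


def read_table(catalog_name: str, table_identifier: str):
    """Load a table through PyIceberg and read it into a PyArrow table.

    Assumes the catalog named catalog_name is configured for PyIceberg
    (for example in ~/.pyiceberg.yaml or via environment variables).
    """
    # Imported lazily so the helper above can be used without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(catalog_name)
    table = catalog.load_table(table_identifier)
    return table.scan().to_arrow()


def check_table(catalog_name: str, table_identifier: str) -> str:
    """Return "ok" if the read succeeds, "affected" if it hits this bug."""
    try:
        read_table(catalog_name, table_identifier)
    except ValueError as exc:
        if is_missing_schema_error(exc):
            return "affected"
        raise  # some other ValueError: do not hide it
    return "ok"
```

Running check_table before and after the Athena compaction should, if the table behaves as described here, return "ok" the first time and "affected" the second.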
Expected behavior
In step 7, the code should work correctly and return the same results as the code in step 3.
Dependency versions
Writing data (step 2)
Reading data (steps 3 and 7):