With the introduction of #807 we started writing large_* types into the Parquet files, which cannot be read using an earlier version of PyIceberg: TypeError: Unsupported type: large_string
Although the Parquet types are the same, there must be an encoding detail that instructs PyArrow to read these columns back as large_* types.
Therefore, instead of defaulting to large_* types, we should default to the small (non-large) types on write.
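A minimal PyArrow sketch of the behaviour described above and of the proposed write-side default; the file names and the schema-downcasting snippet are illustrative, not PyIceberg's actual code path. The "encoding detail" is most likely the Arrow schema that pyarrow stores in the Parquet file metadata:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A column written as large_string is read back as large_string even though the
# physical Parquet type (BYTE_ARRAY / UTF8) is the same as for string, because
# pyarrow stores the original Arrow schema in the file metadata and restores it
# on read.
table = pa.table({"name": pa.array(["a", "b"], type=pa.large_string())})
pq.write_table(table, "large.parquet")        # example path
print(pq.read_table("large.parquet").schema)  # name: large_string

# Proposed default: downcast large_* types to their small counterparts before
# writing, so that readers which do not know large_string keep working.
small_schema = pa.schema(
    [
        pa.field(f.name, pa.string() if f.type == pa.large_string() else f.type)
        for f in table.schema
    ]
)
pq.write_table(table.cast(small_schema), "small.parquet")
print(pq.read_table("small.parquet").schema)  # name: string
```

Casting large_string down to string is lossless as long as each chunk stays under the 32-bit offset limit (roughly 2 GiB of string data), so defaulting to the small types on write should only change the in-memory representation on read, not the data itself.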
So the current version of PyIceberg can write Parquet files with the large_string data type, but an older version of PyIceberg cannot read them.
I feel like this is a library versioning problem and it's OK to not be backwards compatible, especially before the 1.0 version.
My opinion is that we should be able to support both the string and large_string data types, and if supporting large_string means the library won't be backwards compatible, that is OK.
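For context, a small sketch in plain PyArrow (not PyIceberg's API) showing that string and large_string carry the same logical values and differ only in offset width, which is what makes supporting both reasonable:

```python
import pyarrow as pa

# string uses 32-bit offsets and large_string uses 64-bit offsets; the values
# are identical, so casting between the two is lossless while the data fits
# in 32-bit offsets.
small = pa.array(["foo", "bar"], type=pa.string())
large = small.cast(pa.large_string())

assert large.type == pa.large_string()
assert large.cast(pa.string()).equals(small)
```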
sungwy changed the title from "Backward incompatible types introduced when writing Iceberg data" to "Forward incompatible types introduced when writing Iceberg data" on Jul 2, 2024