With the introduction of #807 we started writing large_* types into the Parquet files, which cannot be read using an earlier version of PyIceberg: TypeError: Unsupported type: large_string
Although the Parquet types are the same, there must be an encoding detail that instructs PyArrow to read these columns back as large_* types.
Therefore, instead of defaulting to large_* types, we should default to the small (non-large) types on write.
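A minimal PyArrow sketch of the behaviour described above and of the proposed write-side default; the file names and the schema-downcasting snippet are illustrative, not PyIceberg's actual code path. The "encoding detail" is most likely the Arrow schema that pyarrow stores in the Parquet file metadata:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A column written as large_string is read back as large_string even though the
# physical Parquet type (BYTE_ARRAY / UTF8) is the same as for string, because
# pyarrow stores the original Arrow schema in the file metadata and restores it
# on read.
table = pa.table({"name": pa.array(["a", "b"], type=pa.large_string())})
pq.write_table(table, "large.parquet")        # example path
print(pq.read_table("large.parquet").schema)  # name: large_string

# Proposed default: downcast large_* types to their small counterparts before
# writing, so that readers which do not know large_string keep working.
small_schema = pa.schema(
    [
        pa.field(f.name, pa.string() if f.type == pa.large_string() else f.type)
        for f in table.schema
    ]
)
pq.write_table(table.cast(small_schema), "small.parquet")
print(pq.read_table("small.parquet").schema)  # name: string
```

Casting large_string down to string is lossless as long as each chunk stays under the 32-bit offset limit (roughly 2 GiB of string data), so defaulting to the small types on write should only change the in-memory representation on read, not the data itself.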
So the current version of PyIceberg can write Parquet files with the large_string data type, but an older version of PyIceberg cannot read them.
I feel like this is a library versioning problem and it's OK to not be backwards compatible, especially before the 1.0 version.
My opinion is that we should be able to support both the string and large_string data types, and if supporting large_string means the library won't be backwards compatible, that is OK.
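For context, a small sketch in plain PyArrow (not PyIceberg's API) showing that string and large_string carry the same logical values and differ only in offset width, which is what makes supporting both reasonable:

```python
import pyarrow as pa

# string uses 32-bit offsets and large_string uses 64-bit offsets; the values
# are identical, so casting between the two is lossless while the data fits
# in 32-bit offsets.
small = pa.array(["foo", "bar"], type=pa.string())
large = small.cast(pa.large_string())

assert large.type == pa.large_string()
assert large.cast(pa.string()).equals(small)
```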
sungwy changed the title from "Backward incompatible types introduced when writing Iceberg data" to "Forward incompatible types introduced when writing Iceberg data" on Jul 2, 2024