MetadataValue schema doesn't support nested values, used by IcebergCompatV2 protocol #253

Closed
jeppe742 opened this issue Jun 10, 2024 · 4 comments · Fixed by #257

Comments

@jeppe742

When you create a Delta table with UniForm enabled, it writes a Delta commit that looks something like this:

{"commitInfo":{"timestamp":1717753754287,"operation":"CREATE TABLE","operationParameters":{"isManaged":"true","description":null,"partitionBy":"[]","properties":"{\"delta.enableIcebergCompatV2\":\"true\",\"delta.universalFormat.enabledFormats\":\"iceberg\",\"delta.columnMapping.mode\":\"name\",\"delta.columnMapping.maxColumnId\":\"1\"}"},"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{},"engineInfo":"Apache-Spark/3.5.1 Delta-Lake/3.1.0","txnId":"a4d4593f-835c-4d00-81d8-27c1103343d2"}}
{"metaData":{"id":"a8477f73-f004-4a08-8397-3420d4df98a2","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"c1\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"delta.columnMapping.id\":1,\"delta.columnMapping.nested.ids\":{},\"delta.columnMapping.physicalName\":\"col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd\"}}]}","partitionColumns":[],"configuration":{"delta.enableIcebergCompatV2":"true","delta.universalFormat.enabledFormats":"iceberg","delta.columnMapping.mode":"name","delta.columnMapping.maxColumnId":"1"},"createdTime":1717753754108}}
{"protocol":{"minReaderVersion":2,"minWriterVersion":7,"writerFeatures":["columnMapping","icebergCompatV2"]}}

Notice that the column's metadata inside metaData.schemaString contains the following:

{
    "metadata": {
        "delta.columnMapping.id": 1,
        "delta.columnMapping.nested.ids": {},
        "delta.columnMapping.physicalName": "col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd"
    }
}

Currently the schema parser expects each metadata value to be a number, string, or boolean, so it cannot handle a nested object such as "delta.columnMapping.nested.ids": {}

pub enum MetadataValue {
    Number(i32),
    String(String),
    Boolean(bool),
}

This makes every Delta table written with Iceberg compatibility enabled through UniForm unreadable with the kernel (see delta-io/delta-rs#2578).
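
To make the shape of the problem concrete, here is a minimal, std-only sketch of one way the enum could be widened so that a nested object becomes representable. The `Object` variant, the `is_nested` helper, and the `example_column_metadata` function are hypothetical names used for illustration only. This is not necessarily how the kernel models it or how #257 fixes it, and JSON deserialization is deliberately left out.

```rust
use std::collections::BTreeMap;

/// Illustrative variant of the enum above with one extra, hypothetical case.
#[derive(Debug, Clone, PartialEq)]
pub enum MetadataValue {
    Number(i32),
    String(String),
    Boolean(bool),
    /// Hypothetical: a nested JSON object, e.g. the `{}` stored under
    /// "delta.columnMapping.nested.ids".
    Object(BTreeMap<String, MetadataValue>),
}

impl MetadataValue {
    /// Returns true when the value is a nested object rather than a scalar.
    pub fn is_nested(&self) -> bool {
        matches!(self, MetadataValue::Object(_))
    }
}

/// Builds, by hand, the column metadata shown in the commit above.
pub fn example_column_metadata() -> BTreeMap<String, MetadataValue> {
    let mut metadata = BTreeMap::new();
    metadata.insert(
        "delta.columnMapping.id".to_string(),
        MetadataValue::Number(1),
    );
    // With only Number/String/Boolean available, this entry has no valid
    // representation; the hypothetical Object variant gives it one.
    metadata.insert(
        "delta.columnMapping.nested.ids".to_string(),
        MetadataValue::Object(BTreeMap::new()),
    );
    metadata.insert(
        "delta.columnMapping.physicalName".to_string(),
        MetadataValue::String("col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd".to_string()),
    );
    metadata
}
```

With this shape, `example_column_metadata()["delta.columnMapping.nested.ids"].is_nested()` is true, while the scalar entries stay in their existing variants. A real fix would also need the deserializer to accept the nested case.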

@nicklan
Collaborator

nicklan commented Jun 11, 2024

Thanks for the report. #257 should fix this!

@jeppe742
Author

Thanks @nicklan!
Just out of curiosity, do we have an idea when this will be included in a new release? 😃

@nicklan
Collaborator

nicklan commented Jun 25, 2024

I need to verify that we haven't changed any APIs, but assuming we haven't, I'll get a 0.1.2 release out this week with this and a few other fixes.

@nicklan
Collaborator

nicklan commented Jul 17, 2024

@jeppe742 sorry for the long delay! We did change APIs, so I needed to do a 0.2.0 release, but it's now out with this fix included.
