Some INSERT statements are ignored with Iceberg #20092
Comments
Slack discussion: https://trinodb.slack.com/archives/CJ6UC075E/p1702376393926019
Snapshot ID sequence for the table: The snapshots
Relevant Trino code: trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java, lines 1088 to 1113 at commit 3cbf498
Relevant Iceberg code: Relevant piece of information, for the
which means that the were actually However, the manifest file corresponding to the new data file is missing 🙄
@findinpath I suspect this is a regression introduced in Iceberg 1.4. If it's the same issue, here's the relevant PR that should fix it: apache/iceberg#9230
@alexjo2144 posted a temporary fix for Trino: #20159
It's disturbing that
#20159 would fix the problem, but the current plan is to implement a better fix (see the linked comment on #20159).
#20207 should address the reported issue. |
We noticed a weird issue with Trino version 434 (the latest release). Some of our users run a lot of
INSERT INTO ...
statements against an Iceberg table with Trino. However, after we upgraded from version 428 to 434, some inserts seem to be ignored. Trino actually writes the data file to object storage and commits a new snapshot reporting that it added X records (the stats in the Trino UI show the same). But when we query the table, it does not return the inserted records. We also queried the Iceberg table with Spark, and it does not return them either. We analysed the metadata and manifest files and found the following:
Up to sequence number 15934, everything was fine. Then two inserts ran almost at the same time and committed two new snapshots:
We can see a pattern in the metadata file names where the numeric prefix increments with every new snapshot, but in this case both files have the same prefix (not sure whether that is expected). If we open both metadata files, we can see the references to the manifests (Avro files), and the later metadata file includes the earlier one, so I guessed the concurrent commits were handled correctly.
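The duplicate prefix described above can be checked mechanically across a whole metadata directory. The Python sketch below assumes that metadata files are named `<version>-<uuid>.metadata.json`, where the version is a counter that normally grows by one per commit, as in the file names quoted in this report. It flags any version that appears in more than one file name. `duplicate_versions` is a hypothetical helper, and listing the files (for example from the table's metadata directory on GCS) is left out.

```python
import re
from collections import defaultdict

# Assumed naming pattern: "<version>-<uuid>.metadata.json", where <version>
# is a numeric counter that normally increases by one per commit.
METADATA_NAME = re.compile(r"^(\d+)-([0-9a-f-]{36})\.metadata\.json$")


def duplicate_versions(file_names):
    """Hypothetical helper: map each version prefix that appears in more than
    one metadata file name to the sorted list of those names."""
    by_version = defaultdict(list)
    for name in file_names:
        match = METADATA_NAME.match(name)
        if match is None:
            continue  # skip files that don't follow the assumed pattern
        by_version[int(match.group(1))].append(name)
    return {
        version: sorted(names)
        for version, names in by_version.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # The two metadata files quoted in this issue share version 29819.
    names = [
        "29819-982db761-e158-42ee-8b5e-2c2ac92681ac.metadata.json",
        "29819-ccca636b-3279-4fe2-a733-bc3eed379312.metadata.json",
    ]
    print(duplicate_versions(names))
```

A duplicate version alone does not prove data loss. It only marks commits worth inspecting more closely.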
29819-982db761-e158-42ee-8b5e-2c2ac92681ac.metadata.json:
29819-ccca636b-3279-4fe2-a733-bc3eed379312.metadata.json:
When we checked the Avro files, we could see the problem. For the commit before these two inserts, the Avro file contains the following information about sequence number 15934:
However, for the two concurrent commits, each Avro file contains this:
gs://...b2a6894c12fb.avro:
gs://...5cfc3f50c6c9.avro:
For some reason, neither of them includes a
-mX.avro
manifest file for these inserts…
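The symptom above is a snapshot whose summary reports added data, while its manifest list has no manifest for the new data file. It can be phrased as a simple consistency check. The Python sketch below makes several assumptions. The manifest list has already been decoded from Avro into dicts, and decoding is left out. It relies on the `manifest_path` and `added_snapshot_id` manifest-list fields and the `added-data-files` snapshot summary key, as described in the Iceberg table spec. The helper names are hypothetical. It is a diagnostic sketch only, not the fix discussed in the comments.

```python
def manifests_added_by(snapshot_id, manifest_list):
    """Paths of the manifests in a decoded manifest list whose
    added_snapshot_id equals snapshot_id."""
    return [
        entry["manifest_path"]
        for entry in manifest_list
        if entry.get("added_snapshot_id") == snapshot_id
    ]


def is_missing_own_manifest(snapshot_id, summary, manifest_list):
    """True if the snapshot summary reports added data files but the
    snapshot's manifest list contains no manifest added by that snapshot."""
    # Snapshot summary values are stored as strings in the metadata JSON.
    added_files = int(summary.get("added-data-files", "0"))
    return added_files > 0 and not manifests_added_by(snapshot_id, manifest_list)


if __name__ == "__main__":
    # Hypothetical snapshot IDs and paths, for illustration only.
    manifest_list = [
        {"manifest_path": "gs://bucket/table/metadata/a-m0.avro", "added_snapshot_id": 1},
    ]
    print(is_missing_own_manifest(2, {"added-data-files": "1"}, manifest_list))
```

Matching on `added_snapshot_id` rather than on the `-mX.avro` file-name suffix avoids depending on how manifest files happen to be named.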