Duplicate file path found in the Iceberg metadata snapshot #295

Open
jbolesjc opened this issue Sep 19, 2024 · 1 comment

jbolesjc commented Sep 19, 2024

We have our connectors running and sinking data to our Iceberg catalog in Glue/S3. However, when we tried to surface the data in Snowflake, a few of these Iceberg tables hit the following error:

Duplicate file path found in the Iceberg metadata snapshot. Please check that your Iceberg metadata generation is producing valid manifest files and refresh to a newer snapshot once fixed.

We are still trying to sort out where/why this is happening by combing through the manifest and snapshot files.

It looks like the Tabular connector has created invalid duplicate entries within the snapshot files.

jbolesjc commented Oct 1, 2024

Error in full:

Duplicate file path seen in the Iceberg metadata snapshot. Please check that your Iceberg metadata generation is producing valid manifest files and refresh to a newer snapshot once fixed. File path:'/catalog>/<table>/data/<filename>.parquet', SnapshotId: '<snapshot_ID>'.

I can confirm that the Tabular connector is periodically writing duplicate file paths into the snapshots. Starting from the current manifest file, I found the snapshot ID referenced in the error. That snapshot's "manifest-list" key pointed to an Avro file. I opened that file and found 4 objects, each pointing to a different metadata Avro file. I opened the first one, which had 4 objects: 2 pairs of duplicates. One of the pairs pointed to the Parquet file referenced in the error.

The Tabular connector had written duplicate file paths.
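
For anyone who wants to repeat this check, here is a minimal Python sketch of the walk described above. It is not the connector's or Snowflake's logic, only an illustration. It assumes the table metadata JSON and the manifest Avro files have already been decoded into plain dicts by whatever reader you use, and the function names `manifest_list_for_snapshot` and `find_duplicate_paths` are made up for this example. The field names (`snapshots`, `snapshot-id`, `manifest-list`, `status`, `data_file.file_path`) follow the Iceberg table spec.

```python
from collections import defaultdict

# Manifest entry status values in the Iceberg spec: 0 = existing, 1 = added, 2 = deleted.
DELETED = 2


def manifest_list_for_snapshot(table_metadata, snapshot_id):
    """Return the "manifest-list" location recorded for one snapshot, or None."""
    for snapshot in table_metadata.get("snapshots", []):
        if snapshot.get("snapshot-id") == snapshot_id:
            return snapshot.get("manifest-list")
    return None


def find_duplicate_paths(entries_by_manifest):
    """Map each live data file path listed more than once to the manifests listing it.

    entries_by_manifest: {manifest_path: [decoded manifest entry dicts]}
    """
    seen = defaultdict(list)
    for manifest_path, entries in entries_by_manifest.items():
        for entry in entries:
            if entry.get("status") == DELETED:
                continue  # deleted entries do not count as live references
            seen[entry["data_file"]["file_path"]].append(manifest_path)
    return {path: manifests for path, manifests in seen.items() if len(manifests) > 1}


if __name__ == "__main__":
    # Hypothetical decoded entries, shaped like the case in this issue:
    # one manifest with 4 entries forming 2 duplicate pairs.
    entries = {
        "manifest-1.avro": [
            {"status": 1, "data_file": {"file_path": "data/a.parquet"}},
            {"status": 1, "data_file": {"file_path": "data/a.parquet"}},
            {"status": 1, "data_file": {"file_path": "data/b.parquet"}},
            {"status": 1, "data_file": {"file_path": "data/b.parquet"}},
        ]
    }
    for path, manifests in find_duplicate_paths(entries).items():
        print(path, "listed", len(manifests), "times in", sorted(set(manifests)))
```

Keeping the manifest path next to each hit makes it easy to tell whether the duplicates sit inside a single manifest (as in my case) or are spread across several manifests in the same manifest list.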

With snapshot retention set to a minimum of 1 day, my Iceberg table will not be queryable for 24 hours whenever this happens.

This is a problem.
