Fix applyFilter when an Iceberg table does not have any snapshots #13576

Merged

Conversation

alexjo2144
Member

Description

Tables created by Spark may not have a snapshot committed if they are newly created, empty tables.

Is this change a fix, improvement, new feature, refactoring, or other?

Fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

How would you describe this change to a non-technical end user or system administrator?

Fix handling of querying newly created, empty tables.

Related issues, pull requests, and links

Introduced by: #13239

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Iceberg
* Fix querying Iceberg tables which are empty and contain no table history.

@cla-bot cla-bot bot added the cla-signed label Aug 9, 2022
@alexjo2144 alexjo2144 self-assigned this Aug 9, 2022
@alexjo2144 alexjo2144 added the bug Something isn't working label Aug 9, 2022
Member

@ebyhr ebyhr left a comment


> Tables created by Spark may not have a snapshot committed if they are newly created, empty tables.

Does it indicate that it would be better to fix CREATE TABLE logic in the connector?

@findepi
Member

findepi commented Aug 10, 2022

> Does it indicate that it would be better to fix CREATE TABLE logic in the connector?

This would have the benefit of improving test coverage, and support, for this case.

  .map(ManifestFile::partitionSpecId)
- .collect(toImmutableSet());
+ .collect(toImmutableSet()))
+ .orElseGet(() -> ImmutableSet.copyOf(icebergTable.specs().keySet()));
Member


Suggested change
- .orElseGet(() -> ImmutableSet.copyOf(icebergTable.specs().keySet()));
+ // No snapshot, so no data. This case doesn't matter.
+ .orElseGet(() -> ImmutableSet.copyOf(icebergTable.specs().keySet()));
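
For readers unfamiliar with the pattern in this diff, here is a minimal, self-contained sketch of the fallback it appears to implement: take the partition spec IDs from the current snapshot's manifests when a snapshot exists, otherwise fall back to every spec ID the table defines. The records `Manifest`, `Snapshot`, and `Table` and the method `partitionSpecIds` are hypothetical stand-ins written for illustration; they are not Iceberg's or Trino's actual types, and the real connector code may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class SnapshotFallbackSketch
{
    // Hypothetical stand-ins for the Iceberg types; NOT the real API.
    record Manifest(int partitionSpecId) {}

    record Snapshot(List<Manifest> manifests) {}

    // currentSnapshot is null for a newly created, empty table with no committed snapshot.
    record Table(Snapshot currentSnapshot, Map<Integer, String> specs) {}

    static Set<Integer> partitionSpecIds(Table table)
    {
        return Optional.ofNullable(table.currentSnapshot())
                .map(snapshot -> snapshot.manifests().stream()
                        .map(Manifest::partitionSpecId)
                        .collect(Collectors.toUnmodifiableSet()))
                // No snapshot, so no data: fall back to all spec IDs known to the table.
                .orElseGet(() -> Set.copyOf(table.specs().keySet()));
    }

    public static void main(String[] args)
    {
        Table empty = new Table(null, Map.of(0, "unpartitioned"));
        Table populated = new Table(
                new Snapshot(List.of(new Manifest(1), new Manifest(1))),
                Map.of(0, "unpartitioned", 1, "by_day"));

        System.out.println(partitionSpecIds(empty));
        System.out.println(partitionSpecIds(populated));
    }
}
```

Wrapping the possibly-null snapshot in `Optional` keeps the empty-table case out of the stream pipeline, so dereferencing a missing snapshot cannot throw.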

Tables created by Spark may not have a snapshot committed if
they are newly created, empty tables.
@findepi findepi force-pushed the iceberg/handle-missing-snapshot-id branch from 13212c5 to 7190dae Compare August 10, 2022 08:15
@findepi findepi merged commit a94edd7 into trinodb:master Aug 10, 2022
@findepi findepi mentioned this pull request Aug 10, 2022
@github-actions github-actions bot added this to the 393 milestone Aug 10, 2022
@ebyhr
Member

ebyhr commented Aug 10, 2022

> This would have the benefit of improving test coverage, and support, for this case.

Sorry, my comment was unclear. My question was whether, after merging this PR as-is, we will change the Iceberg CREATE TABLE logic to match Spark's behavior.

@alexjo2144 alexjo2144 deleted the iceberg/handle-missing-snapshot-id branch August 10, 2022 17:32
@alexjo2144
Member Author

Both are valid, going strictly by the specification. I don't know that matching the Spark behavior buys us anything besides making it easier to test this edge case where there are no snapshots in the history.

@findepi
Member

findepi commented Aug 12, 2022

> matching the Spark behavior buys us anything besides making it easier to test this edge case

That's the sole reason I'd consider doing this.

However, I actually value the fact that the table history (snapshots) contains an entry for the initial empty state. This is indeed part of the table history.

Labels
bug Something isn't working cla-signed

3 participants