Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Iceberg partition predicate enforcement #13239

Merged
merged 1 commit into from
Aug 1, 2022

Conversation

alexjo2144
Copy link
Member

Description

Only partition specs which are used by the Snapshot being queried are relevant when determining if partitions can be used to enforce a Predicate.

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

How would you describe this change to a non-technical end user or system administrator?

Improve filtering on partition columns in Iceberg

Related issues, pull requests, and links

Relates to: #12795

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Jul 19, 2022
@alexjo2144 alexjo2144 force-pushed the iceberg/predicate-pushdown branch 2 times, most recently from 4b74206 to e27be8b Compare July 19, 2022 22:13
Copy link
Contributor

@erichwang erichwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable to me

@@ -1769,6 +1769,10 @@ public Optional<ConstraintApplicationResult<ConnectorTableHandle>> applyFilter(C
else {
Table icebergTable = catalog.loadTable(session, table.getSchemaTableName());

Set<Integer> partitionSpecIds = icebergTable.snapshot(table.getSnapshotId().orElseThrow()).allManifests().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call me ignorant, but under what conditions would table.getSnapshotId() be empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It's only ever empty during new table creation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a message so that -- if it throws -- we know what assumption was violated

snapshotId = table.getSnapshotId().orElseThrow(() -> new IllegalStateException("no snapshot id"))

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM expect for the call to allManifests().

@@ -1769,6 +1769,10 @@ public Optional<ConstraintApplicationResult<ConnectorTableHandle>> applyFilter(C
else {
Table icebergTable = catalog.loadTable(session, table.getSchemaTableName());

Set<Integer> partitionSpecIds = icebergTable.snapshot(table.getSnapshotId().orElseThrow()).allManifests().stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a message so that -- if it throws -- we know what assumption was violated

snapshotId = table.getSnapshotId().orElseThrow(() -> new IllegalStateException("no snapshot id"))

@@ -1769,6 +1769,10 @@ public Optional<ConstraintApplicationResult<ConnectorTableHandle>> applyFilter(C
else {
Table icebergTable = catalog.loadTable(session, table.getSchemaTableName());

Set<Integer> partitionSpecIds = icebergTable.snapshot(table.getSnapshotId().orElseThrow()).allManifests().stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think calling allManifests can be expensive, right?

can we prune them out using the predicates we already know + the new ones?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting this list doesn't require reading all of the individual Manifests, just the ManifestList, so it's one Avro file to read

@alexjo2144
Copy link
Member Author

@findepi AC, thanks. Had to make one more small change to TestIcebergProjectionPushdownPlans. Since we're now enforcing all constraints on empty tables I had to add data to the table so that we can test enforced and unenforced constraints.

@@ -129,7 +129,7 @@ public void testDereferencePushdown()

getQueryRunner().execute(format(
"CREATE TABLE %s (col0, col1) WITH (partitioning = ARRAY['col1']) AS" +
" SELECT CAST(row(5, 6) AS row(x bigint, y bigint)) AS col0, 5 AS col1 WHERE false",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an implicit change here that we can now enforce all predicates on empty tables. That broke this test because the table is empty but was not expecting a predicate to be enforced.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doesn't it mean that now iceberg will fail on valid sql ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with Konrad offline about this, not a problem

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please squash

Only partition specs which are used by the Snapshot being
queried are relevant when determining if partitions can be
used to enforce a Predicate.
@alexjo2144
Copy link
Member Author

Squashed

@findepi findepi merged commit 49d8e11 into trinodb:master Aug 1, 2022
@findepi findepi added the no-release-notes This pull request does not require release notes entry label Aug 1, 2022
@github-actions github-actions bot added this to the 392 milestone Aug 1, 2022
@alexjo2144 alexjo2144 deleted the iceberg/predicate-pushdown branch August 2, 2022 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed no-release-notes This pull request does not require release notes entry
Development

Successfully merging this pull request may close these issues.

5 participants