Support predicate pushdown and deletion for non-identity partition columns in Iceberg #7905

Closed
phd3 opened this issue May 13, 2021 · 3 comments · Fixed by #12795
Labels
enhancement New feature or request performance
Comments

@phd3
Member

phd3 commented May 13, 2021

Currently, the Iceberg connector's predicate pushdown does not use filters on non-identity partition columns, i.e. columns partitioned through a transform such as month(). Metadata delete queries don't work on such columns either:

CREATE TABLE customer_accounts (order_date DATE, account_number BIGINT) WITH (partitioning = ARRAY['month(order_date)'])

-- doesn't work
DELETE FROM customer_accounts WHERE date_trunc('month', order_date) = date_trunc('month', DATE '2018-06-01')
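
For illustration, here is a minimal sketch (plain Python, not connector code) of why the predicate above lines up with exactly one partition. Iceberg's month transform stores the number of months since 1970-01, and `date_trunc('month', order_date) = date_trunc('month', DATE '2018-06-01')` is equivalent to a half-open range on `order_date`. The function names `month_transform` and `month_bounds` are hypothetical and exist only in this sketch.

```python
from __future__ import annotations

from datetime import date


def month_transform(d: date) -> int:
    """Iceberg month transform: whole months between 1970-01 and d's month."""
    return (d.year - 1970) * 12 + (d.month - 1)


def month_bounds(d: date) -> tuple[date, date]:
    """Half-open range [start, end) covering the calendar month that contains d."""
    start = d.replace(day=1)
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    return start, end


# The DELETE predicate is equivalent to: start <= order_date < end
start, end = month_bounds(date(2018, 6, 1))

# Every row in that range carries the same partition value
partition_value = month_transform(date(2018, 6, 1))  # 581
```

Because the range boundaries coincide with the boundaries of the month partition, every row in the partition satisfies the predicate, which is what would make a metadata-only delete possible in principle.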

We can improve SELECT queries for such cases by optimizing split scheduling: the constraint (with its predicate) needs to be propagated to IcebergSplitManager through IcebergTableHandle and evaluated against the partition spec values of each FileScanTask to check whether the file matches the filter. However, there is currently no way to tell the engine that such a filter is fully consumed, so supporting metadata delete in this case will require a larger engine-side change.
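
A rough sketch of the idea, again in plain Python rather than the actual connector or Iceberg API: given a range filter on `order_date`, files whose month partition value falls outside the range can be skipped during split generation, and a file is "fully matched" only when the range does not cut through its month. `DataFile`, `months_since_epoch` and `classify_files` are hypothetical names used only here, and the sketch assumes a half-open range with `lower < upper`.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataFile:
    path: str
    month_partition: int  # months since 1970-01, as produced by month(order_date)


def months_since_epoch(d: date) -> int:
    return (d.year - 1970) * 12 + (d.month - 1)


def classify_files(files: list[DataFile], lower: date, upper: date):
    """Classify files against the filter lower <= order_date < upper.

    Returns (fully_matched, partially_matched). Files in neither list are pruned.
    """
    first_month = months_since_epoch(lower)
    # Last month touched by the range is the month of the last included day.
    last_month = months_since_epoch(upper - timedelta(days=1))
    lower_aligned = lower.day == 1  # range starts on a month boundary
    upper_aligned = upper.day == 1  # range ends on a month boundary

    fully, partially = [], []
    for f in files:
        m = f.month_partition
        if m < first_month or m > last_month:
            continue  # no row in this file can match: skip the split
        cut_low = m == first_month and not lower_aligned
        cut_high = m == last_month and not upper_aligned
        (partially if cut_low or cut_high else fully).append(f)
    return fully, partially


files = [DataFile("f-2018-05", 580), DataFile("f-2018-06", 581), DataFile("f-2018-07", 582)]
fully, partially = classify_files(files, date(2018, 6, 1), date(2018, 7, 1))
```

In this sketch only the fully matched files could be dropped by a metadata delete. Partially matched files would still need the engine to apply the filter row by row, which is why the connector needs a way to report whether the filter was fully consumed.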

@phd3 phd3 changed the title Support filtering and deletion for non-identity Iceberg partition columns Support predicate pushdown and deletion for non-identity partition columns in Iceberg May 13, 2021
@findepi
Member

findepi commented May 13, 2021

Requires #7608

@findepi
Member

findepi commented May 25, 2022

> We can improve the SELECT queries for such cases by optimizing split scheduling. The constraint (with predicate) needs to be propagated to IcebergSplitManager through IcebergTableHandle and used to evaluate partition spec values inside FileScanTask to see if it matches the filter.

Done in #9309 cc @homar

> However, it's currently not possible to tell the engine that such a filter is fully consumed.

Besides this, expression-based pushdown could also help with subsequent dereference pushdown and with OPTIMIZE (#12362, cc @erichwang @losipiuk @alexjo2144)

@kekwan
Contributor

kekwan commented Sep 20, 2022

Is predicate pushdown for identity partition columns enabled already?
