Currently, the Iceberg connector does not use filters on non-identity partition columns for predicate pushdown. Metadata deletion queries also don't work on such columns:
CREATE TABLE customer_accounts (order_date DATE, account_number BIGINT) WITH (partitioning = ARRAY['month(order_date)'])
-- doesn't work
DELETE FROM customer_accounts WHERE date_trunc('month', order_date) = date_trunc('month', DATE '2018-06-01')
We can improve SELECT queries for such cases by optimizing split scheduling. The constraint (with its predicate) needs to be propagated to IcebergSplitManager through IcebergTableHandle and used to evaluate the partition spec values inside each FileScanTask to see whether they match the filter. However, there is currently no way to tell the engine that such a filter is fully consumed, so supporting metadata delete in this case will require a larger engine-side change.
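To make the distinction concrete, here is a minimal sketch of the partition-level check, written in Python for brevity rather than as connector code. It assumes the predicate has already been reduced to a half-open date range on order_date; the function names (month_ordinal, month_bounds, classify) are hypothetical and are not part of Trino or Iceberg. The only Iceberg fact relied on is that the month transform stores the number of months since 1970-01.

```python
from datetime import date


def month_ordinal(d: date) -> int:
    """Months since 1970-01, the value produced by a month(...) partition transform."""
    return (d.year - 1970) * 12 + (d.month - 1)


def month_bounds(ordinal: int) -> tuple:
    """Half-open [start, end) date range covered by one month partition value."""
    year_off, month0 = divmod(ordinal, 12)
    start = date(1970 + year_off, month0 + 1, 1)
    next_year_off, next_month0 = divmod(ordinal + 1, 12)
    end = date(1970 + next_year_off, next_month0 + 1, 1)
    return start, end


def classify(partition_month: int, lo: date, hi: date) -> str:
    """Compare a file's month partition with the predicate lo <= order_date < hi.

    'skip': no row in the file can match, so the split need not be scheduled.
    'all':  every row matches, so the filter is fully covered by the partition.
    'some': the file must still be read and filtered row by row.
    """
    start, end = month_bounds(partition_month)
    if end <= lo or start >= hi:
        return "skip"
    if lo <= start and end <= hi:
        return "all"
    return "some"


# date_trunc('month', order_date) = DATE '2018-06-01' is equivalent to
# DATE '2018-06-01' <= order_date < DATE '2018-07-01'.
lo, hi = date(2018, 6, 1), date(2018, 7, 1)
files = {name: month_ordinal(d) for name, d in [
    ("may", date(2018, 5, 20)),
    ("june", date(2018, 6, 3)),
    ("july", date(2018, 7, 9)),
]}
results = {name: classify(m, lo, hi) for name, m in files.items()}
# results == {"may": "skip", "june": "all", "july": "skip"}
```

Skipping the "skip" files is the split-scheduling improvement for SELECT. A metadata delete is only safe when every file is either "skip" or "all", which is the information the engine would need in order to treat the filter as fully consumed.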
phd3 changed the title from "Support filtering and deletion for non-identity Iceberg partition columns" to "Support predicate pushdown and deletion for non-identity partition columns in Iceberg" on May 13, 2021
We can improve the SELECT queries for such cases by optimizing split scheduling. The constraint (with predicate) needs to be propagated to IcebergSplitManager through IcebergTableHandle and used to evaluate partition spec values inside FileScanTask to see if it matches the filter.
However, it's currently not possible to tell the engine that such a filter is fully consumed.
Besides this one, expression-based pushdown could also help with subsequent dereference pushdown and with OPTIMIZE (#12362). cc @erichwang @losipiuk @alexjo2144