Data skipping predicates should support column name mapping #433

Open
scovich opened this issue Oct 28, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

Collaborator

scovich commented Oct 28, 2024

Please describe why this is necessary.

Today, data skipping predicates (which reference logical column names) are evaluated as-is against both Delta stats and parquet row group stats, both of which use physical column names. When a table uses column name mapping, the predicates are useless because they reference columns that appear not to exist. We currently don't validate that a referenced column actually exists (which is probably a bug in its own right), so the problem manifests silently as a complete lack of data skipping.

Describe the functionality you are proposing.

We need to rewrite logical column name references in data skipping predicates to use their physical counterparts.
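
As a rough illustration of the rewrite, here is a minimal sketch. It assumes a simplified, hypothetical `Expr` type and a flat logical-to-physical name map; neither is the kernel's actual expression type or schema API, and nested columns are ignored. The physical name `col-a1b2` is made up for the example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified predicate expression (NOT the kernel's real type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Rewrites every logical column reference to its physical name.
///
/// Returns `None` if any referenced column is missing from the mapping, so
/// the caller can report an error instead of silently losing data skipping.
#[allow(dead_code)]
fn to_physical(expr: &Expr, mapping: &HashMap<String, String>) -> Option<Expr> {
    Some(match expr {
        // The only node that changes: look up the physical name.
        Expr::Column(name) => Expr::Column(mapping.get(name)?.clone()),
        Expr::Literal(v) => Expr::Literal(*v),
        // Binary nodes are rebuilt from their rewritten children.
        Expr::Gt(l, r) => Expr::Gt(
            Box::new(to_physical(l, mapping)?),
            Box::new(to_physical(r, mapping)?),
        ),
        Expr::And(l, r) => Expr::And(
            Box::new(to_physical(l, mapping)?),
            Box::new(to_physical(r, mapping)?),
        ),
    })
}
```

With a mapping of `age` to `col-a1b2`, the predicate `age > 30` would be rewritten to `col-a1b2 > 30` before being evaluated against stats. Returning `None` for an unmapped column is one possible way to surface the missing validation mentioned above; the real fix may choose a different error strategy.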

Additional context

No response
