Choose which columns to store min/max values for #2709

Closed
kpapadatos opened this issue Jul 27, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@kpapadatos

Description

It would be useful to be able to choose which columns may be used for file skipping so that min/max values are not stored for unused columns.

Use Case
In our use case, we have some large string columns that are never used for file skipping, and keeping min/max values for those results in unnecessarily large delta log files.

Related Issue(s)
Delta log files get unnecessarily big.

@kpapadatos kpapadatos added the enhancement New feature or request label Jul 27, 2024
@ion-elgreco
Collaborator

ion-elgreco commented Jul 27, 2024

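This is already supported through the table properties `delta.dataSkippingStatsColumns` (an explicit, comma-separated list of columns to collect statistics for) or `delta.dataSkippingNumIndexedCols` (collect statistics for the first N columns). See PR #2428.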
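For readers landing here, the following plain-Python sketch shows the column-selection rule these two properties describe. It is a simplified illustration based on the Delta Lake protocol's description of the properties, not delta-rs code. `stats_columns` is a hypothetical helper: it handles only top-level column names and ignores nested fields and quoted or escaped names. The properties themselves are table configuration. In the Python bindings, for example, they can be supplied through the `configuration` mapping when creating a table with `write_deltalake` or `DeltaTable.create`. For the use case above, listing only the filter columns in `delta.dataSkippingStatsColumns` would leave the large string columns without min/max values.

```python
DEFAULT_NUM_INDEXED_COLS = 32  # protocol default for delta.dataSkippingNumIndexedCols


def stats_columns(schema_columns: list[str], table_config: dict[str, str]) -> list[str]:
    """Return the top-level columns that would get min/max statistics (simplified sketch).

    An explicit delta.dataSkippingStatsColumns list takes precedence; otherwise the
    first N columns are used, where N = delta.dataSkippingNumIndexedCols
    (default 32, and -1 means every column).
    """
    explicit = table_config.get("delta.dataSkippingStatsColumns")
    if explicit is not None:
        wanted = {name.strip() for name in explicit.split(",") if name.strip()}
        # Keep schema order; names that are not in the schema are simply ignored here.
        return [col for col in schema_columns if col in wanted]

    n = int(table_config.get("delta.dataSkippingNumIndexedCols", DEFAULT_NUM_INDEXED_COLS))
    if n == -1:
        return list(schema_columns)  # -1: collect stats for all columns
    if n < -1:
        raise ValueError("delta.dataSkippingNumIndexedCols must be -1 or >= 0")
    return list(schema_columns[:n])


if __name__ == "__main__":
    # Hypothetical schema: two filter columns plus two large string columns.
    schema = ["id", "event_time", "payload", "raw_html"]
    print(stats_columns(schema, {"delta.dataSkippingStatsColumns": "id,event_time"}))
    print(stats_columns(schema, {"delta.dataSkippingNumIndexedCols": "2"}))
    print(stats_columns(schema, {}))
```

Either property can drop statistics for `payload` and `raw_html` here. The explicit list is more robust because it does not depend on column order in the schema.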
