Fix parsing struct stats after schema evolution #901
Merged: wjones127 merged 9 commits into delta-io:main from Tom-Newton:tomnewton/fix_warning_spam_for_struct_stats on Nov 3, 2022.
Tom-Newton changed the title from "Fix warning spam when parsing struct stats after schema evolution" to "Fix parsing struct stats after schema evolution" on Oct 24, 2022.
wjones127 reviewed on Oct 25, 2022:
This is still a draft, but thought I might leave some early feedback.
Co-authored-by: Will Jones <[email protected]>
Co-authored-by: Will Jones <[email protected]>
Co-authored-by: Will Jones <[email protected]>
refactor: simplify logic for parsing struct stats
Tom-Newton requested review from houqp, xianwill, fvaleye, roeap, rtyler and mosyp as code owners on November 3, 2022 08:52.
wjones127 approved these changes on Nov 3, 2022:
Thanks @Tom-Newton
Description
When a Delta table's schema is evolved, the struct stats schemas in its checkpoints are evolved too. Because struct stats are stored column-wise, adding a single file that has the new columns causes nulls to appear in the struct stats of every other file. This is a significant difference from the JSON stats, which are stored separately for each file.
Unfortunately, I overlooked this in #656 for both null counts and min/max values, so the parsed struct stats ended up with extra columns full of nulls. I don't know whether this caused any real problems, but it should be fixed, if only to stop the warning spam.
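The following sketch uses only the standard library to illustrate the mismatch between per-file JSON stats and column-wise struct stats, along with the idea of skipping nulls when reading the stats back. It is not the delta-rs change itself. `NullCounts`, `ColumnarStats`, `to_columnar` and `parse_null_counts` are hypothetical names. Real checkpoints store these stats as Parquet struct columns rather than `BTreeMap`s. Only null counts are modelled here; min/max values would follow the same pattern.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-file null counts, shaped like JSON stats: each file only
/// lists the columns it was written with.
type NullCounts = BTreeMap<String, i64>;

/// Hypothetical columnar view of a checkpoint's struct stats: one child
/// column per field of the evolved schema, one slot per file (row).
type ColumnarStats = BTreeMap<String, Vec<Option<i64>>>;

/// Lay the per-file stats out column-wise. The union of all column names
/// stands in for the evolved schema, so a file that lacks a column gets a
/// null (`None`) in that column's slot.
fn to_columnar(files: &[NullCounts]) -> ColumnarStats {
    let schema: BTreeSet<&String> = files.iter().flat_map(|f| f.keys()).collect();
    schema
        .into_iter()
        .map(|col| {
            let values: Vec<Option<i64>> = files.iter().map(|f| f.get(col).copied()).collect();
            (col.clone(), values)
        })
        .collect()
}

/// Recover one file's null counts from the columnar stats. Null slots are
/// skipped, so columns the file never had do not show up as extra entries.
fn parse_null_counts(columns: &ColumnarStats, row: usize) -> NullCounts {
    columns
        .iter()
        .filter_map(|(name, values)| {
            // `get` guards against a short column; `flatten` drops null slots.
            values.get(row).copied().flatten().map(|v| (name.clone(), v))
        })
        .collect()
}

fn main() {
    // Written before schema evolution added column "b".
    let old_file = NullCounts::from([("a".to_string(), 0)]);
    // Written after schema evolution added column "b".
    let new_file = NullCounts::from([("a".to_string(), 1), ("b".to_string(), 2)]);

    let columns = to_columnar(&[old_file.clone(), new_file.clone()]);
    // Column "b" now holds a null in the old file's slot.
    println!("{:?}", columns);

    // Parsing with nulls skipped gives back exactly what each file recorded.
    assert_eq!(parse_null_counts(&columns, 0), old_file);
    assert_eq!(parse_null_counts(&columns, 1), new_file);
}
```

In this model, treating a null slot as "column absent for this file" rather than as a value keeps the parsed stats aligned with the original per-file stats.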
Related Issue(s)
Changes:
Usual disclaimer on a PR from me: I don't know what I'm doing writing Rust code (thanks to wjones for tidying up my dodgy Rust code 🙂).