-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix disabling parquet statistics (#2185) #2191
Conversation
@@ -456,7 +456,8 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> { | |||
for &level in levels { | |||
if level == self.descr.max_def_level() { | |||
values_to_write += 1; | |||
} else if self.statistics_enabled == EnabledStatistics::Page { | |||
} else { | |||
// We must always compute this as it is used to populate v2 pages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this would cause an incorrect null count in the data page v2 header
.build()?; | ||
.set_dictionary_page_offset(dict_page_offset); | ||
|
||
if self.statistics_enabled != EnabledStatistics::None { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we would still put the column statistics, even if we hadn't actually computed them 😱
statistics, | ||
.. | ||
} => { | ||
assert_eq!(*num_values, 6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusingly in parquet num_values includes nulls, despite null values not actually be encoded 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also verified that the newly added test does indeed fail without the code changes in this PR
---- column::writer::tests::test_disabled_statistics stdout ----
thread 'column::writer::tests::test_disabled_statistics' panicked at 'assertion failed: statistics.is_none()', parquet/src/column/writer/mod.rs:1689:17
stack backtrace:
🚢 🇮🇹
Benchmark runs are scheduled for baseline = 445283c and contender = 82e0512. 82e0512 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2185
Rationale for this change
What changes are included in this PR?
Fixes the handling of this option, and adds some test coverage 😄
Are there any user-facing changes?
This feature should now behave correctly