Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine the bit_util of Parquet. #1905

Merged
merged 6 commits into from
Jun 19, 2022

Conversation

HaoYang670
Copy link
Contributor

Which issue does this PR close?

Closes #1901 .

What changes are included in this PR?

  1. Remove the function log2, because:
  • The ceiling of log2 (n) == num_required_bits(n - 1)
  • log2 is unsafe, which panics when the input number is zero.
  1. Change the return type of num_required_bits to u8
  2. Use the std function div_ceil
  3. optimize trailing_bits

Are there any user-facing changes?

Yes.

Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 18, 2022
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
@codecov-commenter
Copy link

Codecov Report

Merging #1905 (153f4d2) into master (fc4044f) will decrease coverage by 0.01%.
The diff coverage is 94.44%.

@@            Coverage Diff             @@
##           master    #1905      +/-   ##
==========================================
- Coverage   83.43%   83.41%   -0.02%     
==========================================
  Files         203      214      +11     
  Lines       56969    56988      +19     
==========================================
+ Hits        47531    47538       +7     
- Misses       9438     9450      +12     
Impacted Files Coverage Δ
parquet/src/column/reader.rs 68.61% <0.00%> (ø)
parquet/src/column/reader/decoder.rs 76.27% <100.00%> (ø)
parquet/src/encodings/encoding.rs 93.62% <100.00%> (-0.03%) ⬇️
parquet/src/encodings/levels.rs 93.41% <100.00%> (ø)
parquet/src/encodings/rle.rs 92.55% <100.00%> (ø)
parquet/src/util/bit_util.rs 92.68% <100.00%> (-0.33%) ⬇️
parquet_derive/src/parquet_field.rs 65.75% <0.00%> (-0.46%) ⬇️
arrow/src/datatypes/datatype.rs 65.42% <0.00%> (-0.38%) ⬇️
arrow/src/compute/kernels/temporal.rs 94.64% <0.00%> (ø)
arrow/src/array/builder.rs
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc4044f...153f4d2. Read the comment docs.

Copy link
Contributor

@jhorstmann jhorstmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, nice improvement!

if num_bits >= 64 {
return v;
v
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this if block is actually necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1<<num_bits would overflow if num_bits >= 64

Copy link
Contributor

@tustvold tustvold Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah yes, forgot you can't just overflow. I presume LLVM is smart enough to turn the if into a mask, i.e.

1 << (num_bits & 64)

This comment was marked as abuse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple test:

    #[test]
    fn test() {
        fn do_test(shift: i32) {
            let i = 1_u64;
            println!("{}", i << shift);
        }
        do_test(65);
    }

Output:

panicked at 'attempt to shift left with overflow'

parquet/src/encodings/encoding.rs Show resolved Hide resolved
@tustvold tustvold merged commit 1f65791 into apache:master Jun 19, 2022
@HaoYang670 HaoYang670 deleted the refine_parquet_bit_util branch June 19, 2022 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

log2(0) panicked at 'attempt to subtract with overflow', parquet/src/util/bit_util.rs:148:5
4 participants