-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix null bitmap length validation (#1231) #1232
Fix null bitmap length validation (#1231) #1232
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1232 +/- ##
==========================================
- Coverage 82.70% 82.70% -0.01%
==========================================
Files 175 175
Lines 51711 51713 +2
==========================================
+ Hits 42769 42770 +1
- Misses 8942 8943 +1
Continue to review full report at Codecov.
|
fn test_bitmap_too_small() { | ||
let buffer = make_i32_buffer(100); | ||
let buffer = make_i32_buffer(9); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why changes (other than the message) are needed for this test.
The original formulation with 100 still fails even after this PR, but with a different error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused -- if the size calculation really was off, I would expect this code to fail ... But it passes just fine on master without the changes in this PR. So 🤔
#[test]
fn test_bitmap_size_ok_ok() {
let buffer = make_i32_buffer(8);
let null_bit_buffer = Buffer::from(vec![0b11111111]);
ArrayData::try_new(
DataType::Int32,
8,
Some(0),
Some(null_bit_buffer),
0,
vec![buffer],
vec![],
).unwrap();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've written make_i32_buffer(8);
not make_i32_buffer(9);
?
The changes are necessary to make this test more restrictive, previously it passed because 100 rows needs 13 bytes which is still more than the 8 bytes it thinks are in the null buffer (when actually there is only 1 byte and 8 bits)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've written make_i32_buffer(8); not make_i32_buffer(9); ?
Yes, because I am making a 8
element array of Int32
s.
What I was trying to do was find a test that shows incorrect behavior on master but passes with this PR. The test you have modified passed for me on master (with a different error message).
Is there a test that we can write that would have failed (validate() would pass on master but not on this PR)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test as in the PR will not fail on master on account of the null bitmask length, it will in fact fail because it reads beyond the bounds of the null bit buffer and thinks there is a null bit 😱
The below passes validation on master
fn test_bitmap_too_small() {
let buffer = make_i32_buffer(9);
let null_bit_buffer = Buffer::from(vec![0b11111111]);
ArrayData::try_new(
DataType::Int32,
9,
Some(1), // OUT-OF-BOUNDS READ
Some(null_bit_buffer),
0,
vec![buffer],
vec![],
)
.unwrap();
}
@@ -650,7 +650,8 @@ impl ArrayData { | |||
} | |||
|
|||
// check null bit buffer size | |||
if let Some(null_bit_buffer) = self.null_bitmap.as_ref() { | |||
if let Some(null_bit_map) = self.null_bitmap.as_ref() { | |||
let null_bit_buffer = null_bit_map.buffer_ref(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this change -- the length of the BitMap
is in bits
which seems like it is properly handled in bit_util::ceil()
on the next linel
If you change this to use the underlying buffer, then the length is in bytes, so then the division should also be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ceil
computes the number of bytes from len_plus_offset and then compares this against the number of bits in the bitmap. This is the issue. In particular null_bit_buffer.len()
previously returned the length in bits as that is what Bitmap::len
returns. With this PR it instead calls Buffer::len
which returns the length in bytes.
FWIW I filed #1233 as this is really confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha -- I finally got it to fail on master with MIRI
test array::data::tests::test_bitmap_too_small_invalid ... error: Undefined Behavior: using uninitialized data, but this operation requires initialized memory
--> arrow/src/util/bit_chunk_iterator.rs:94:25
|
94 | bits |= (byte as u64) << (i * 8 - bit_offset);
| ^^^^^^^^^^^^^ using uninitialized data, but this operation requires initialized memory
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `util::bit_chunk_iterator::BitChunks::remainder_bits` at arrow/src/util/bit_chunk_iterator.rs:94:25
note: inside `buffer::immutable::Buffer::count_set_bits_offset` at arrow/src/buffer/immutable.rs:210:18
--> arrow/src/buffer/immutable.rs:210:18
|
210 | count += chunks.remainder_bits().count_ones() as usize;
| ^^^^^^^^^^^^^^^^^^^^^^^
note: inside `array::data::count_nulls` at arrow/src/array/data.rs:43:25
--> arrow/src/array/data.rs:43:25
|
43 | len.checked_sub(buf.count_set_bits_offset(offset, len))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `array::data::ArrayData::validate_full` at arrow/src/array/data.rs:931:33
--> arrow/src/array/data.rs:931:33
|
931 | let actual_null_count = count_nulls(null_bitmap_buffer, self.offset, self.len);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `array::data::ArrayData::try_new` at arrow/src/array/data.rs:344:9
--> arrow/src/array/data.rs:344:9
|
344 | new_self.validate_full()?;
| ^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `array::data::tests::test_bitmap_too_small_invalid` at arrow/src/array/data.rs:1720:9
--> arrow/src/array/data.rs:1720:9
|
1720 | / ArrayData::try_new(
1721 | | DataType::Int32,
1722 | | 9,
1723 | | Some(1), // OUT-OF-BOUNDS READ
... |
1727 | | vec![],
1728 | | )
| |_________^
note: inside closure at arrow/src/array/data.rs:1716:5
--> arrow/src/array/data.rs:1716:5
|
1715 | #[test]
| ------- in this procedural macro expansion
1716 | / fn test_bitmap_too_small_invalid() {
1717 | | let buffer = make_i32_buffer(9);
1718 | | let null_bit_buffer = Buffer::from(vec![0b11111111]);
1719 | |
... |
1729 | | .unwrap();
1730 | | }
| |_____^
= note: inside `<[closure@arrow/src/array/data.rs:1716:5: 1730:6] as std::ops::FnOnce<()>>::call_once - shim` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `test::__rust_begin_short_backtrace::<fn()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:574:5
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:565:30
= note: inside `<[closure@test::run_test::{closure#1}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1854:9
= note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:271:9
= note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:485:40
= note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:449:19
= note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:136:14
= note: inside `test::run_test_in_process` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:597:18
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:491:39
= note: inside `test::run_test::run_test_inner` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:529:13
= note: inside `test::run_test` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:561:28
= note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:304:17
= note: inside `test::run_tests_console` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/console.rs:290:5
= note: inside `test::test_main` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:115:15
= note: inside `test::test_main_static` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:134:5
= note: inside `main`
= note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:123:18
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:145:18
= note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
= note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:485:40
= note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:449:19
= note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:136:14
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:128:48
= note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:485:40
= note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:449:19
= note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:136:14
= note: inside `std::rt::lang_start_internal` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:128:20
= note: inside `std::rt::lang_start::<()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:144:17
= note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)
error: aborting due to previous error
error: test failed, to rerun pass '-p arrow --lib'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to make the docs clearer in #1237
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirmed that the test in this PR fails MIRI on master.
Thanks for bearing with me @tustvold
Which issue does this PR close?
Closes #1231.
Rationale for this change
See ticket
What changes are included in this PR?
Fixes the null bitmap length validation
Are there any user-facing changes?
Previously invalid data will now be caught by validation