Remove null_count from ArrayData::try_new() #911
Comments
If we need to implement this, I can have a try! |
I think it is still worth trying 👍 |
Something I find is that we have to move the check of
    pub unsafe fn new_unchecked(
        data_type: DataType,
        len: usize,
        null_bit_buffer: Option<Buffer>,
        offset: usize,
        buffers: Vec<Buffer>,
        child_data: Vec<ArrayData>,
    ) -> Self {
        ...
        if let Some(null_bit_map) = null_bitmap.as_ref() {
            let null_bit_buffer = null_bit_map.buffer_ref();
            let needed_len = bit_util::ceil(len + offset, 8);
            if null_bit_buffer.len() < needed_len {
                return Err(ArrowError::InvalidArgumentError(format!(
                    "null_bit_buffer size too small. got {} needed {}",
                    null_bit_buffer.len(),
                    needed_len
                )));
            }
        }
        let null_count = count_nulls(null_bit_buffer.as_ref(), offset, len);
        ...
Otherwise, we may panic if the |
I find the above bug in the test:
    #[test]
    #[should_panic(expected = "null_bit_buffer size too small. got 1 needed 2")]
    fn test_bitmap_too_small() {
        let buffer = make_i32_buffer(9);
        let null_bit_buffer = Buffer::from(vec![0b11111111]);
        ArrayData::try_new(
            DataType::Int32,
            9,
            Some(0),
            Some(null_bit_buffer),
            0,
            vec![buffer],
            vec![],
        )
        .unwrap();
    }
Here we will not calculate the null count in However, if we set |
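The "got 1 needed 2" in the expected panic message follows from the bitmap arithmetic: 9 elements at offset 0 need 9 validity bits, which round up to 2 bytes, while the test supplies only 1. A minimal stand-alone sketch of that check is below. The names `bytes_for_bits` and `check_null_bit_buffer_len` are hypothetical and are not part of arrow-rs; only the error text is taken from the snippet above.

```rust
/// Number of bytes needed to hold `bits` validity bits (ceiling division by 8).
/// Hypothetical helper, standing in for `bit_util::ceil(bits, 8)`.
fn bytes_for_bits(bits: usize) -> usize {
    (bits + 7) / 8
}

/// Hypothetical stand-alone version of the size check discussed above.
/// `buffer_len` is the length of the null bit buffer in bytes.
fn check_null_bit_buffer_len(
    buffer_len: usize,
    len: usize,
    offset: usize,
) -> Result<(), String> {
    // The bitmap must cover every slot from 0 up to `offset + len`.
    let needed_len = bytes_for_bits(len + offset);
    if buffer_len < needed_len {
        return Err(format!(
            "null_bit_buffer size too small. got {} needed {}",
            buffer_len, needed_len
        ));
    }
    Ok(())
}
```

With the values from the test, `check_null_bit_buffer_len(1, 9, 0)` is rejected, and a 2-byte buffer would pass.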
Personally, my suggestions are:
|
cc @tustvold |
I think I'm missing something, new_unchecked is unsafe because it trusts the caller. Making it fallible / perform validation seems counter to that, and I'm not sure I follow why it is necessary? Isn't the issue in try_new? |
Sorry, my mistake. I think you are right.
    pub fn try_new(
        ...
    ) -> Result<Self> {
        check_size_of_null_bit_buffer()?;
        let new_self = unsafe {
            Self::new_unchecked(
                ...
            )
        };
        // As the data is not trusted, do a full validation of its contents
        new_self.validate_full()?;
        Ok(new_self)
    } |
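The ordering agreed on here (check the bitmap size in the safe constructor, then call the trusting one) can be sketched with a toy type. This is only an illustration under stated assumptions: `ToyArrayData` and its fields are hypothetical, a `Vec<u8>` stands in for `Buffer`, and errors are plain `String`s rather than `ArrowError`. It assumes Arrow's least-significant-bit-first validity layout, where a cleared bit marks a null.

```rust
/// Toy stand-in for `ArrayData` (hypothetical, not the arrow-rs type).
#[derive(Debug)]
struct ToyArrayData {
    len: usize,
    offset: usize,
    null_bit_buffer: Option<Vec<u8>>,
    null_count: usize,
}

impl ToyArrayData {
    /// Trusts the caller and derives the null count from the bitmap.
    ///
    /// # Safety
    /// `null_bit_buffer`, if present, must hold at least `len + offset` bits.
    /// In this toy version a short buffer makes the indexing below panic.
    unsafe fn new_unchecked(len: usize, offset: usize, null_bit_buffer: Option<Vec<u8>>) -> Self {
        let null_count = match &null_bit_buffer {
            // A cleared bit means the slot is null (LSB-first bit order).
            Some(bits) => (offset..offset + len)
                .filter(|&i| bits[i / 8] & (1u8 << (i % 8)) == 0)
                .count(),
            None => 0,
        };
        Self { len, offset, null_bit_buffer, null_count }
    }

    /// Safe constructor: validates the bitmap size before counting nulls.
    fn try_new(len: usize, offset: usize, null_bit_buffer: Option<Vec<u8>>) -> Result<Self, String> {
        if let Some(bits) = &null_bit_buffer {
            let needed_len = (len + offset + 7) / 8;
            if bits.len() < needed_len {
                return Err(format!(
                    "null_bit_buffer size too small. got {} needed {}",
                    bits.len(),
                    needed_len
                ));
            }
        }
        // SAFETY: the bitmap length was checked above.
        Ok(unsafe { Self::new_unchecked(len, offset, null_bit_buffer) })
    }
}
```

Because the size check runs first, an undersized bitmap surfaces as an `Err` from `try_new` instead of a panic while counting nulls, and `new_unchecked` stays infallible.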
I will fix this in #1707 |
🎉 |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Re #817; as @jhorstmann pointed out on #810 at https://github.com/apache/arrow-rs/pull/810/files#r742905920, the ArrayData::try_new() function accepts both an optional null_count argument and a validity buffer. If the declared null_count differs from the number of nulls in the validity buffer, wrong results (and maybe also undefined behavior) can occur in some later operation.
Note that as part of #817 we will be validating that the bitmaps are consistent with the declared null_count.
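To make the inconsistency concrete, here is a small sketch of deriving the null count from a validity bitmap and comparing it with a caller-declared value. It is not the arrow-rs implementation: `count_nulls` here is a simplified re-implementation over a byte slice, `resolve_null_count` is a hypothetical helper, and the error message is invented for illustration. It assumes Arrow's LSB-first validity layout, where a cleared bit marks a null.

```rust
/// Count null slots in `[offset, offset + len)` of an optional validity bitmap.
/// Simplified stand-in for the real helper; no bitmap means no nulls.
fn count_nulls(validity: Option<&[u8]>, offset: usize, len: usize) -> usize {
    match validity {
        None => 0,
        Some(bits) => (offset..offset + len)
            .filter(|&i| bits[i / 8] & (1u8 << (i % 8)) == 0)
            .count(),
    }
}

/// Hypothetical helper: accept a declared count only if it matches the bitmap.
fn resolve_null_count(
    declared: Option<usize>,
    validity: Option<&[u8]>,
    offset: usize,
    len: usize,
) -> Result<usize, String> {
    let actual = count_nulls(validity, offset, len);
    match declared {
        Some(n) if n != actual => Err(format!(
            "declared null_count {} does not match validity bitmap ({} nulls)",
            n, actual
        )),
        _ => Ok(actual),
    }
}
```

For a 4-element array with bitmap `0b0000_0101`, slots 1 and 3 are null, so a caller declaring `Some(0)` would be inconsistent with the data. Dropping the parameter removes the need for a comparison like this, since the count is always derived from the bitmap.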
Describe the solution you'd like
Since most callers pass None anyway, at which point we calculate the number ourselves, simply removing the option to pass in an inconsistent value would avoid a whole class of inconsistencies. I would suggest removing the null_count parameter from try_new to rule them out completely.
Describe alternatives you've considered
N/A