-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Off-by-one buffer size error triggers Panic when constructing RecordBatch from IPC bytes (should return an Error) #3215
Comments
Hi @aarashy -- thank you for the report. The validation routine you point to is basically saying that it expects that the size of the underlying buffer for the BooleanArray is at least 321 bytes, but for some reason the data has only 320. I agree that the IPC reader should return the error rather than Not sure if you are up for proposing a PR to fix the issue (propagate the error) but if you are I would be most appreciative |
@alamb - I appreciate the prompt response. Do you think it would be crazy for However, there's a bigger question of why the data has only 320 instead of 321 in the first place. |
I don't think that would be crazy at all -- sounds like a good idea to me
👍 |
I removed the unwraps here #3232 I have some bytes which reproduce this error, but the data is private. The bytes were the result of the More info about the error stacktrace aboveLet me share a little bit more about the error in
Thus, I really want to understand why the buffer is given only 320 bytes. It seems like instead of correctly zero padding the 2565 bits to reach the next byte-boundary, the bits have truncated at the previous byte-boundary and the byte buffer is 1 byte too small. I'm struggling to really find the origin of the length of the Array data byte buffers here which would be performing this truncation... I suppose it would be Another similar error stacktraceI have another callstack which is throwing a similar panic on a different input bytes set. This one is from indexing into a slice out of bounds, again off-by-one.
In this case:
I wonder if @tustvold, @alamb, or @HaoYang670 would know anything about improperly truncated / padded byte buffers during IPC byte reading. It's possible that my input is missing some padding, but perhaps this crate should generously perform padding which the Javascript apache-arrow crate is neglecting. It's also possible that this is a Javascript arrow package bug, but if we have an opportunity to be more permissive and flexible here, we should take it (and we at least shouldn't have panics), |
If your IPC payload is generated by apache-arrow NPM package function |
@viirya , thanks for your helpful reply. I've created this issue in the arrow repo to track it there. (apache/arrow#14791) Now my last question is, are we okay with the fact that the It seems like, in situations where the input arrow bytes are the correct shape but have some subtle bug like this, |
I think the issue here is that currently
|
I don't think there are further action items for this ticket, so closing. Feel free to reopen if I have missed something |
|
Describe the bug
When reading Arrow Bytes, it seems boolean array inputs can trigger panics under certain conditions.
If my input bytes are bad, I want the arrow API to throw an
Err
, not panic.In this case, I don't think there's something wrong with my input bytes - rather, it seems like there's an off-by-one error internally within
arrow-rs
which is causing invariants to be broken.The error is being thrown in the
validate
routine:https://github.com/apache/arrow-rs/blob/master/arrow-data/src/data.rs#L667-L673
And it's being unwrapped in the caller in
create_primitive_array
, which DOES NOT return aResult
.https://github.com/apache/arrow-rs/blob/master/arrow-ipc/src/reader.rs#L487
Judging by the fact that
unwrap
is used, this function presupposes certain invariants that make it safe to useunwrap
, but for some inputs, these invariants seem to not be met.Since the problem appears to be with the length of a
buffer
, I'm tracing the origin of that buffer (per my stack trace) to https://github.com/apache/arrow-rs/blob/master/arrow-ipc/src/reader.rs#L1140 - I might be wrong.To Reproduce
I am using the following routine to read IPC bytes. I don't currently have an input
bytes
which triggers this, but I can find one if you think it's critical for you to debug this properly.Expected behavior
Either throw an error if there's a problem with my input bytes, or succeed if there is no problem with my input bytes, but do not panic.
Additional context
@alamb - You seem to have written these validation checks, so I wonder if you would understand what might be happening for me. Let me know if it's crucial for me to provide bytes inputs which trigger the panic, but I think the stacktrace and error message here might be enough to go off of.
The text was updated successfully, but these errors were encountered: