Replace ArrayData::new() with ArrayData::try_new() and unsafe ArrayData::new_unchecked #822
/// contents of the buffers (e.g. that string offsets for UTF8 arrays
/// are within the length of the buffer).
pub fn validate(&self) -> Result<()> {
// will be filled in a subsequent PR
#810 has an initial set of checks
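For readers unfamiliar with the kind of check the doc comment describes, here is a minimal sketch of validating string offsets against a values buffer. `Utf8Buffers` and `validate_utf8` are hypothetical names invented for illustration; this is not the arrow-rs implementation, nor the set of checks from #810.

```rust
/// Hypothetical stand-in for the two buffers of a UTF-8 string array.
/// This is NOT the arrow-rs `ArrayData` type.
pub struct Utf8Buffers {
    /// Element `i` spans `values[offsets[i]..offsets[i + 1]]`.
    pub offsets: Vec<i32>,
    /// Concatenated string bytes.
    pub values: Vec<u8>,
}

/// Returns an error if any offset is negative, decreasing, or past the end
/// of `values`, or if any element is not valid UTF-8.
pub fn validate_utf8(buffers: &Utf8Buffers) -> Result<(), String> {
    let mut prev = 0usize;
    for (i, &raw) in buffers.offsets.iter().enumerate() {
        if raw < 0 {
            return Err(format!("offset {} at index {} is negative", raw, i));
        }
        let offset = raw as usize;
        if offset < prev {
            return Err(format!(
                "offset {} at index {} is smaller than the previous offset {}",
                offset, i, prev
            ));
        }
        if offset > buffers.values.len() {
            return Err(format!(
                "offset {} at index {} exceeds the values buffer length {}",
                offset,
                i,
                buffers.values.len()
            ));
        }
        // From the second offset onward, `prev..offset` delimits one element.
        if i > 0 {
            std::str::from_utf8(&buffers.values[prev..offset])
                .map_err(|e| format!("element {} is not valid UTF-8: {}", i - 1, e))?;
        }
        prev = offset;
    }
    Ok(())
}
```

A checked constructor could run a function like this before handing out a value, so that later reads can rely on the offsets being in bounds.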
@@ -264,6 +278,53 @@ impl ArrayData {
}
}

/// Create a new ArrayData, validating that the provided buffers
The API changes in data.rs are the core changes in this PR -- everything else is a mechanical change to use the new APIs.
Codecov Report
@@ Coverage Diff @@
## master #822 +/- ##
==========================================
+ Coverage 82.54% 82.56% +0.02%
==========================================
Files 168 168
Lines 47910 47988 +78
==========================================
+ Hits 39545 39622 +77
- Misses 8365 8366 +1
I think the changes look good, thanks a lot for investing the time in this @alamb
I think we should have some notes around the usages of unsafe: why is it sound in this place?
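As an illustration of what such a note can look like, here is a small sketch using the common `// SAFETY:` comment convention. The function `repeat_a` is a hypothetical example built on the standard library's `String::from_utf8_unchecked`; it is not code from this PR.

```rust
/// Builds a string of `n` ASCII `a` characters without re-validating UTF-8.
/// Hypothetical example; not part of arrow-rs.
pub fn repeat_a(n: usize) -> String {
    let bytes = vec![b'a'; n];
    // SAFETY: `bytes` contains only the ASCII byte b'a', and any sequence of
    // ASCII bytes is valid UTF-8, which is the invariant that
    // `String::from_utf8_unchecked` requires of its caller.
    unsafe { String::from_utf8_unchecked(bytes) }
}
```

The note answers the question "why is it sound in this place?" by naming the invariant the unsafe call needs and pointing at the code that establishes it.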
arrow/benches/array_from_vec.rs
let arr_data = ArrayDataBuilder::new(DataType::Int32)
.add_buffer(Buffer::from(v))
.build();
let arr_data = unsafe {
We should add a safety note here?
It might be helpful to note that this is a (very old) benchmark crate, seemingly last modified by @andygrove and @sunchao: https://github.com/apache/arrow-rs/blame/master/arrow/benches/array_from_vec.rs#L29-L38
To be honest, it does not look safe to me. I will try to rewrite it.
It is a good idea @Dandandan -- I will attempt to do so. To be honest I am not sure why all the references are legitimate uses of I don't think this PR has made the code any more or less safe / unsafe than it was before. However, it is now clearer where assumptions are made (they are all annotated with |
Changes look good to me as well and I agree with @Dandandan that we should add safety notes.
@Dandandan and @houqp -- First I want to emphasize that this PR does not change the safety of the arrow-rs implementation -- the code is as safe/unsafe before this PR as it is after this PR. I agree that all However, I propose not requiring such annotations for this PR because:
Thus, I propose a multi-pronged approach:
I will admit that part of my reason for not wanting to try and annotate all uses of |
I agree with you @alamb 👍
Tried adding some safety notes; I have to agree that the task is quite intense :D I also suggest we focus on filling gaps in arrow2 instead of retrofitting arrow-rs. It's a huge undertaking that @jorgecarleitao has already tried, deciding that the effort is not worth it.
@Dandandan -- are you OK with merging this PR given the rationale listed in #822 (comment)? (You have marked the PR as changes requested.)
@jhorstmann are you OK with this PR?
I would like to merge this in and then create an arrow 6.0.0 release candidate (and hopefully unblock the next downstream release of DataFusion)
I plan to make ArrayData::try_new() safer with additional validation (released as part of 6.1.0)
I agree with your points. The PR already improves on the current state ( |
Hehe 😃 I was just looking at it. Yeah, merging as is would be great.
I also looked at it, and agree with merging as is.
Given the feedback, I plan to rebase this and merge it in.
d54bd7d to f85fff6
Thanks @alamb! I was away from a computer for a few days, this looks good!
Which issue does this PR close?
Part of #817
Rationale for this change
This PR is a step towards making arrow-rs Rust safe and resolving open RUSTSEC issues.
ArrayData::new() is fundamentally unsafe (in the Rust sense) as it relies on the user to pass in valid data or else allows undefined behavior. The API is easy to misuse and should be marked as unsafe to reflect this. See Validate arguments to ArrayData::try_new() #817 for more background.
Builds on @jhorstmann's work in #813
What changes are included in this PR?
Replace ArrayData::new() with unsafe ArrayData::new_unchecked() and ArrayData::try_new()
Make ArrayDataBuilder::build() fallible
Add unsafe ArrayDataBuilder::build_unchecked()
Note:
** I think splitting the changes into several PRs will help with reviews
** I would like to ensure the API changes are included in arrow-rs 6.0 (planning to make a release candidate in the next week or so). We can then add additional validation in 6.1, 6.2, etc., as those will be non-breaking API changes.
Are there any user-facing changes?
Yes -- the APIs for creating ArrayData are different. This should not affect any users who create Arrays directly, only those using the lower level APIs.
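To make the shape of the API change concrete, here is a minimal, self-contained sketch of the checked/unchecked constructor pattern described above. `MiniArrayData` and `MiniArrayDataBuilder` are hypothetical types invented for illustration, with a single invariant (`values.len() >= len`); the real `ArrayData` has different fields, signatures, and validation rules.

```rust
/// Hypothetical, simplified stand-in for `ArrayData` (not the arrow-rs type).
/// Invariant: `values.len() >= len`.
#[derive(Debug, PartialEq)]
pub struct MiniArrayData {
    len: usize,
    values: Vec<i32>,
}

impl MiniArrayData {
    /// Safe constructor: checks the invariant and reports violations.
    pub fn try_new(len: usize, values: Vec<i32>) -> Result<Self, String> {
        if values.len() < len {
            return Err(format!(
                "need {} values but the buffer holds only {}",
                len,
                values.len()
            ));
        }
        Ok(Self { len, values })
    }

    /// Unchecked constructor: skips validation.
    ///
    /// # Safety
    /// The caller must guarantee that `values.len() >= len`.
    pub unsafe fn new_unchecked(len: usize, values: Vec<i32>) -> Self {
        Self { len, values }
    }

    /// Reads element `i`, relying on the constructor invariant.
    pub fn value(&self, i: usize) -> Option<i32> {
        if i < self.len {
            // SAFETY: `i < self.len` and every constructor guarantees
            // `self.len <= self.values.len()`, so `i` is in bounds.
            Some(unsafe { *self.values.get_unchecked(i) })
        } else {
            None
        }
    }
}

/// Hypothetical builder mirroring the fallible `build()` and
/// unsafe `build_unchecked()` split.
pub struct MiniArrayDataBuilder {
    len: usize,
    values: Vec<i32>,
}

impl MiniArrayDataBuilder {
    pub fn new() -> Self {
        Self { len: 0, values: Vec::new() }
    }

    pub fn with_len(mut self, len: usize) -> Self {
        self.len = len;
        self
    }

    pub fn with_values(mut self, values: Vec<i32>) -> Self {
        self.values = values;
        self
    }

    /// Fallible build: validation errors surface as `Err`.
    pub fn build(self) -> Result<MiniArrayData, String> {
        MiniArrayData::try_new(self.len, self.values)
    }

    /// # Safety
    /// Same contract as `MiniArrayData::new_unchecked`.
    pub unsafe fn build_unchecked(self) -> MiniArrayData {
        // SAFETY: the caller upholds `values.len() >= len` per this
        // function's contract.
        unsafe { MiniArrayData::new_unchecked(self.len, self.values) }
    }
}
```

In this sketch, callers who previously used an infallible constructor either handle the `Result` from `build()` or opt in to `unsafe` and take on the invariant themselves, which makes the assumption visible at the call site.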