-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: better ways to use deletion vectors #215
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a first pass. Overall I think this is the right idea, but I'm not sold on the batch thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing first pass. Seems like an interesting start, but not sure I have a "big picture" view of the idea quite yet?
How selection vs. deletion vectors would work, how engine could provide its own bit vector implementation, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of comments but not much substance in the end... rust pointers are just messy no matter how you "slice" it :(
ffi/src/lib.rs
Outdated
@@ -122,6 +126,12 @@ mod private { | |||
len: usize, | |||
} | |||
|
|||
#[repr(C)] | |||
pub struct KernelRowIndexArray { | |||
ptr: *mut u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this accurate?
ptr: *mut u64, | |
ptr: NonNull<u64>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update: this mirrors the existing KernelBoolSlice
, which uses a null pointer to represent the empty slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, according to std::slice::from_raw_parts:
You can obtain a pointer that is usable as data for zero-length slices using NonNull::dangling().
... that way, we wouldn't need any null check in the as_ref()
method below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs for Vec
don't specifically mention NonNull::dangling
, but Vec::as_ptr does return
a dangling raw pointer valid for zero sized reads if the vector didn’t allocate.
The docs for Vec::from_raw_slice are not clear tho:
ptr
must have been allocated using the global allocator, such as via the alloc::alloc function. ... These requirements are always upheld by anyptr
that has been allocated viaVec<T>
[which presumably includes the pointer returned byas_ptr
which we already trust for the non-empty case?]. Other allocation sources are allowed if the invariants are upheld.
Further, both Unique::dangling and NonNull::dangling state that they are
useful for initializing types which lazily allocate, like
Vec::new
does.
That said, it's probably safest of all to just use Vec::new().as_mut_ptr()
as pointer for an empty index slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the null pointer approach is simpler if the engine might ever need to create one of these to pass into kernel? Actually, given how messy all the non-null/unique/etc constraints are, nullable pointers are probably the simplest even if engine doesn't need to create one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it's not clear to me what changes should be made? Should be switch both slice types over to a new way in a subsequent PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry that was a lot of rambling/exploring. Basically, we have two options for representing an empty slice:
- Pointer is always non-null, length 0 and dangling pointer means empty
- Null pointer means empty
Rust and its library classes don't have a really nice way to handle either case, so IMO we should choose the least-bad approach (= least code and least error prone) and use that consistently. Maybe the current code is already the correct choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the current code does both -- the slice -> vec code assumes nullable pointer, while the vec -> slice (impl From
) uses a dangling pointer. We should pick one and stick with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any opinion on which you prefer @scovich ? I'd probably go with the always non-null length zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated both bool and row id slice to use non-zero length as empty.
ffi/src/lib.rs
Outdated
let mut vec = value.row_indexes(); | ||
vec.shrink_to_fit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut vec = value.row_indexes(); | |
vec.shrink_to_fit(); | |
let vec = value.row_indexes(); |
Vec::into_boxed_slice states:
Before doing the conversion, this method discards excess capacity like shrink_to_fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: This feels like a very messy part of rust. There are so many ways to convert to and from pointers, and it's not clear that any of them is the "best" -- especially when bits don't fit together nicely. Like here -- into_raw
returns a pointer that satisfies NonNull
, but there's no "safe" way to actually create a NonNull
from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally fine, but we need to augment the actual ffi apis. So today we have selection_vector_from_dv
(in ffi/src/scan.rs), which always returns a bool slice.
we should probably add a row_indexes_from_dv
to mirror that.
Alternately, we could have a single function that can return whatever the user asks for, but while that could be clean in rust (return an enum
), I think over ffi it's probably easier/cleaner to just have multiple functions each with its own return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly good! I think there's still some stuff ryan pointed out that needs to be fixed, and I had one more thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for iterating on this. just a couple of doc comments and we're good to go (i think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few nits - and can we update the PR description to accurately represent the changes?
ffi/src/lib.rs
Outdated
if let Some(ptr) = NonNull::new(ptr) { | ||
KernelBoolSlice { ptr, len } | ||
} else { | ||
KernelBoolSlice::empty() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the leaked pointer can ever be null, so None case here is impossible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I don't believe so, I am just pleasing the compiler here because internally a vec that hasn't allocated also won't produce a null here and I can't think of any situation in which it would realistically be null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there's a suitable From:
let ptr = Box::leak(boxed).into();
KernelBoolSlice { ptr, len }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping this would work, but we call .as_mut_ptr()
to get the pointer to the inner buffer of the vec, NonNull::from()
wants a &mut T
which I am not aware of anyway to get a mutable reference to the inner buffer (only a pointer) that doesn't involve dereferencing the pointer above, which would add unsafe to the call. I would prefer to keep the check above in lieu of adding unsafe to the method, thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also keep in mind a lot of ways to refer to the vec in rust give pointers to the rust object of the vec, not the inner buffer, making this harder to get
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! I had indeed forgotten the as_mut_ptr
was a vec method (wrongly assumed it was just turning the leaked reference into a pointer). Agree there doesn't seem to be any good way to directly extract a NonNull from a slice, even tho it's guaranteed to exist (grr). All the examples just just expect
to unwrap the option when passing a reference to NonNull::new
, should we do that?
(the current code is confusing because it uses an empty slice as a fallback for an error case that can't actually arise in practice -- which could trick some future reader into thinking empty slices need some kind of special handling there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possibility would be to leverage first_mut, but that's still annoying and ugly because it returns an Option<&mut T>
(because it's UB to create a reference from a zero-length slice). So expect
or the current code are probably the best we can do, barring a rust API change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expect should be okay because if it ever hits that error, we have a legitimate bug we should investigate.
Co-authored-by: Zach Schuermann <[email protected]>
…d of nullable pointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks correct, remaining ugly seems to be a wart in rust, not easily fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This is meant to be some utils to help make DV usage easier. You can
TODO: