Replace MemoryBlock with NonNull<[u8]> #61
Comments
This makes the struct less repr(C) friendly, so that's a major downside. |
Why would |
Much easier to interact with foreign code if our allocation system can actually have its primitives sent directly to said foreign code. For plain data structs, repr(C) is a very sane and good default to follow. We don't need any fancy field reordering here. |
I see the argument, why |
I am not on T-Lang or the UCG-wg, but they're both having meetings on Thursday and they're open to the public. I'll try to ask about it if I get a chance, but I expect the answer is "no plans at this time". |
|
Oh, I missed that there's |
What I really like on
Couldn't you just implement an extension trait for @SimonSapin What you would pass back to the allocator? |
I'd be totally fine with this proposal if there was a defined and stable layout for the type, but until then we should not change from a simple struct that's easy to have interact with C to a more "rusty" type that makes it harder to interact with C. Because it's not just C that we're talking about of course, this affects Rust being used in Python, and Ruby, and Erlang and all those other things. |
I didn't mean to pass it directly but use a wrapper struct (like |
I suppose, still feels kinda clunky, but sure. |
A bit off topic, but is the This is more a question for wrapping jemalloc etc, which have an extra function call associated with that. I don't think those wrappers will be able to optimise out an extern "C" call (although static jemalloc with LTO?). For pure Rust allocators you can always just structure your code not to forget the usable size. (Edit: it's also my understanding that eventually, the list of |
@cormacrelf This issue is about what type to use in order to represent a pointer and size together. It’s independent of whether and when to do that in the first place, as opposed to only returning a pointer. Always returning the usable size was proposed in #17 (comment). I think how optimization interacts with the global allocator indirection is interesting, please consider filing a new issue.
I don’t remember the exact scenario I tried, but quite some time ago I observed that |
I've given this a little more thought and actually feel like A This is indeed a familiar pattern. Rust came up with slices for this case: I recognise that we can use the internal implementation of slices as a fat pointer to benefit from the apparent equality between If we are to follow this path, we should imo hide the implementation (potentially behind a W.r.t. C ABI compatibility, I don't think this is an issue. A big shortcoming of C is the inability to specify the length alongside a pointer. That's why you often pass in a |
A problem with While normally more types to encode semantics are a good thing, in this case, because low-level memory allocation is via opaque, type-erased APIs, I think it could make more sense to just accept that and make it clear that the callee should just trust the allocator to implement the API correctly. Notably, Thus I'd suggest we instead return EDIT: wait, sorry, I somehow totally forgot that |
@Wodann Sorry, I don’t understand what your point is.
How are they semantically not the same? They both describe a region of memory by its starting address and size/length.
That’s mostly just another name for @petertodd The problem with returning a pointer to something that has As to returning a size, yes as your edit says it’s to allow the allocator to return more than requested. If I ask for 100 bytes, jemalloc will usually return 128 bytes. As to whether returning a size should be part of the "default" APIs, that was proposed in #17 (comment) and off-topic for this thread. This thread discusses how to represent a size together with a pointer, assuming we want to include it. |
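The point about returning the usable size can be illustrated with a minimal sketch. It assumes a hypothetical power-of-two size-class policy, so a 100-byte request yields 128 usable bytes (the figure used in the comment above; real allocators use their own size classes), and it uses `std::alloc` as the backing allocator. The names `usable_size`, `alloc_block`, `free_block`, and `demo` are illustrative only and not part of any proposed API.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr::NonNull;

/// Hypothetical size-class policy: round the request up to a power of two.
fn usable_size(requested: usize) -> usize {
    requested.next_power_of_two()
}

/// Allocate at least `layout.size()` bytes and report the size actually
/// usable by returning a pointer and a length together.
fn alloc_block(layout: Layout) -> Option<NonNull<[u8]>> {
    if layout.size() == 0 {
        return None; // zero-sized requests are left out of this sketch
    }
    let size = usable_size(layout.size());
    let real = Layout::from_size_align(size, layout.align()).ok()?;
    // SAFETY: `real` has a non-zero size.
    let ptr = NonNull::new(unsafe { alloc(real) })?;
    Some(NonNull::slice_from_raw_parts(ptr, size))
}

/// Free a block obtained from `alloc_block` with the same alignment.
///
/// # Safety
/// `block` must come from `alloc_block` called with alignment `align`,
/// and must not be used afterwards.
unsafe fn free_block(block: NonNull<[u8]>, align: usize) {
    let real = Layout::from_size_align(block.len(), align)
        .expect("layout was valid at allocation time");
    unsafe { dealloc(block.cast::<u8>().as_ptr(), real) }
}

/// Request 100 bytes and report how many bytes the block says are usable.
fn demo() -> usize {
    let layout = Layout::from_size_align(100, 8).unwrap();
    let block = alloc_block(layout).expect("allocation failed");
    let size = block.len();
    // SAFETY: `block` came from `alloc_block` with this alignment.
    unsafe { free_block(block, layout.align()) };
    size
}
```

The caller reads the usable size from the returned pointer itself, so nothing extra has to be stored to deallocate later.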
@SimonSapin As I mentioned in my post
As I mentioned in my post: A Even though it encodes the same information, this communicates something completely different to the end user. Namely: "You are returned a slice (aka a pointer to memory plus a size)" vs "You are returned a mutable pointer to a slice" Why this is an issue? Intuitively that would make me believe that I still need to follow a pointer to get the actual address of the memory block. |
Ok I think there is some confusion here due to our collective inconsistent use of the word "slice" to mean sometimes
All kinds of references and pointers to a given type
All of these can be combined in almost any way:
Once upon a time (before Rust 1.0 I think?) dynamically-sized types did not exist in the language. Today still we tend to casually refer for example to
Honestly I’m not sure which of these two is If we assume my definitions above, your description "a pointer to memory plus a size" makes it clear that you mean "casual" use of the word slice, with one level of indirection rather than zero. (Zero levels would be a And "a mutable pointer to a slice" fits
That would be two levels of indirection like To some extent I think we can help clear this confusion with more docs and teaching, but it’s a fair point that raw (pointers to) slices |
Perhaps a compromise might be a MemoryBlock type with:
|
Another option is a typedef type MemoryBlock = NonNull<[u8]>; |
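As a small sketch of what the typedef option would mean in practice: a type alias introduces no new type, so every `NonNull<[u8]>` value and method works unchanged under the alias name. The helper names `block_len` and `alias_is_same_type` are made up for this illustration.

```rust
use std::ptr::NonNull;

/// The alias suggested in the comment above.
type MemoryBlock = NonNull<[u8]>;

/// Accepts the alias; any `NonNull<[u8]>` can be passed without conversion.
fn block_len(block: MemoryBlock) -> usize {
    block.len()
}

/// Builds a plain `NonNull<[u8]>` and passes it where `MemoryBlock` is expected.
fn alias_is_same_type() -> usize {
    // A dangling pointer is fine here because it is never dereferenced.
    let plain: NonNull<[u8]> =
        NonNull::slice_from_raw_parts(NonNull::<u8>::dangling(), 64);
    block_len(plain)
}
```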
I would still much favor a dedicated struct that can be repr(C). |
The premise of this proposal is that adding a new struct adds API surface and therefore complexity. In this case this may not be necessary since the language already has an equivalent type. Adding conversion A typedef adds less API surface than a struct, but I’m not sure what is the point of it existing. The
|
Well, the point of all of this is to unify the ecosystem. Nothing here is in any way enabling people to write programs they can't write today. We're just picking conventions for the ecosystem to follow. So starting my logic from there, I think that it would be a better situation for Rust to pick a convention that is more friendly by default to being called from other hosted languages that might want to call into a native language, such as Python, Erlang/Elixir, etc. Rust already has some success stories with being the native lang of choice for hosted languages and we should try to continue the trend. Being able to return an allocator's output directly to foreign code, instead of each allocator or foreign framework having its own idea of what a rust allocation translates to, would help maintain ecosystem unity in this area. But I guess if allocators have to use Result then we're already well past being able to easily have a |
@SimonSapin thanks for the elaborate explanation, and apologies if my point wasn't clear. I'll try to clarify my previous argument. A
    pub struct NonNull<T: ?Sized> {
        pointer: *const T,
    }
    impl<T: ?Sized> NonNull<T> {
        /// Creates a new `NonNull` if `ptr` is non-null.
        #[stable(feature = "nonnull", since = "1.25.0")]
        #[inline]
        pub fn new(ptr: *mut T) -> Option<Self> {
            if !ptr.is_null() { Some(unsafe { Self::new_unchecked(ptr) }) } else { None }
        }
        /// Acquires the underlying `*mut` pointer.
        #[stable(feature = "nonnull", since = "1.25.0")]
        #[rustc_const_stable(feature = "const_nonnull_as_ptr", since = "1.32.0")]
        #[inline]
        pub const fn as_ptr(self) -> *mut T {
            self.pointer as *mut T
        }
        ...
    }
In your case ( As you pointed out
This means that You already proposed
The same fix won't work for obtaining the size of the memory block, as evident from the
    pub const fn len(self) -> usize {
        self.as_ptr().len()
    }
It still requires an additional layer of indirection for accessing the size of a memory block. Please tell me if it's still not clear what I am trying to explain, or if I am overlooking some implementation detail that would not make it two memory accesses. |
|
In particular, |
Also note that |
They don't have to use As @SimonSapin mentioned, you have to handle the return value anyway. Any FFI-API differs in its required parameters and return values, so there is no variant to rule them all. When I'd have to use the allocator-api in an FFI-environment, I'd simply use an extension trait, which implements on every |
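One way the extension-trait idea could look, as a rough sketch rather than anything agreed in this thread: a `repr(C)` mirror struct plus a trait that converts to it at the FFI boundary. `RawBlock`, `BlockExt`, and `round_trip_len` are hypothetical names, and the trait is implemented on `NonNull<[u8]>` directly, which is an assumption made for brevity.

```rust
use std::ptr::NonNull;

/// Hypothetical C-compatible mirror of a pointer-plus-size pair.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawBlock {
    pub ptr: *mut u8,
    pub len: usize,
}

/// Hypothetical extension trait used only at the FFI boundary.
pub trait BlockExt {
    fn into_raw_block(self) -> RawBlock;
}

impl BlockExt for NonNull<[u8]> {
    fn into_raw_block(self) -> RawBlock {
        RawBlock {
            ptr: self.cast::<u8>().as_ptr(),
            len: self.len(),
        }
    }
}

impl RawBlock {
    /// Rebuild the Rust-side value; returns `None` if `ptr` is null.
    pub fn into_block(self) -> Option<NonNull<[u8]>> {
        NonNull::new(self.ptr).map(|p| NonNull::slice_from_raw_parts(p, self.len))
    }
}

/// Round-trip a stack buffer through the C-compatible form.
fn round_trip_len() -> Option<usize> {
    let mut buf = [0u8; 32];
    let ptr = NonNull::new(buf.as_mut_ptr())?;
    let block = NonNull::slice_from_raw_parts(ptr, buf.len());
    let raw = block.into_raw_block();
    raw.into_block().map(|b| b.len())
}
```

With this shape the allocator API keeps returning `NonNull<[u8]>`, and only code that actually crosses into foreign code pays for the conversion.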
I think this holds for every trait, not only for |
No no, the general contract of Rust as a whole is that |
You’re using "levels of indirection" to count the number of method calls in a Rust expression, whereas my earlier comment used it to count the number of pointers that can be followed in a "chain" ( As others have mentioned, some of these methods do type conversion but effectively do nothing at run-time. In the compiled assembly, |
If I understand correctly from @Amanieu (thanks!), the compiler does an optimisation that turns
In that case I see how there would be only one layer of indirection. For clarity, my understanding was as follows:
vs
Just to clear up the last misunderstanding, I was also referring to chasing pointers since If the function explanation contained information about If anything, I feel like the discussion and misunderstanding we've had about implementation details of I think it will be easier to document if we have a dedicated type (alias), as it centralises this critical documentation. If not, I am afraid that docs will blow up + make maintenance harder - if we need to copy the same docs to all functions that return P.s. for the record, my prior concern that
is probably a non-concern as fat pointers and their optimisations are seemingly well established. |
I took time at #61 (comment) to make a detailed explanation of the exact same facts and reasoning, arriving at the same conclusion. It seems I failed to get something across :/ |
For me, the missing part was:
I didn't realise that rustc optimised that. |
It’s not an optimization. There isn’t an extra level of pointer indirection that was magically removed. It’s just what it means to have |
Okay, I see now. Your linked
The data is encoded directly in a Thanks for bearing with me 😅 |
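The conclusion reached here can be checked with a short sketch. It assumes current rustc behaviour, where a pointer to a slice is stored as two machine words (address and length); that layout is what the compiler does in practice, not a documented guarantee. The function names are made up for the illustration.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// A slice pointer carries its length next to the address, so it is two
/// words wide, while a thin pointer is one word wide.
fn fat_pointer_is_two_words() -> bool {
    size_of::<NonNull<[u8]>>() == 2 * size_of::<usize>()
        && size_of::<NonNull<u8>>() == size_of::<usize>()
}

/// Reading the length never touches the pointed-to memory: this works even
/// though the pointer is dangling and must not be dereferenced.
fn len_without_deref() -> usize {
    let block: NonNull<[u8]> =
        NonNull::slice_from_raw_parts(NonNull::<u8>::dangling(), 128);
    block.len()
}
```

Because `len` only reads the second word of the pointer value, there is no second memory access to optimise away.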
(Note that using a |
As we are on nightly and there will be another breaking change to the allocator api anyway (#58), I'd like to try
The only thing we lose is |
( |
reminder for the record: returning |
That would be an option for APIs where no size is part of the return value, but #17 (comment) decided to always return the usable size. |
Also: (Allocation) Failure is not an Option 😉 |
…nieu Replace `Memoryblock` with `NonNull<[u8]>` Closes rust-lang/wg-allocators#61 r? @Amanieu
I wonder if |
@zakarumych Closed issues are generally not a great place to ask a question since they’re considered resolved and not in need of attention anymore.
|
@SimonSapin thanks for clarification. Sorry for disturbance. |
The return type of some methods involves:
A pointer and a size, that sounds familiar.
Perhaps a dedicated struct is not needed and this could be NonNull<[u8]> instead? NonNull::cast can be used to cast to any thin pointer type, similar to accessing the MemoryBlock::ptr field. PR rust-lang/rust#71940 proposes NonNull<[T]>::len for accessing the size, and NonNull<[T]>::slice_from_raw_parts for constructing a new value. (I feel that these would be good to have regardless of what happens to MemoryBlock, and I expect their existence to be uncontroversial given the precedent of <*const [T]>::len and ptr::slice_from_raw_parts.)
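The three operations named in the proposal can be sketched as follows. This assumes a toolchain where `NonNull::slice_from_raw_parts` and `NonNull<[T]>::len` are available on stable; the helper names `block_from_parts`, `block_into_parts`, and `round_trip` are invented for the example.

```rust
use std::ptr::NonNull;

/// Bundle a thin pointer and a byte count into a single value.
fn block_from_parts(ptr: NonNull<u8>, size: usize) -> NonNull<[u8]> {
    NonNull::slice_from_raw_parts(ptr, size)
}

/// Split a block back into its thin data pointer and its size.
fn block_into_parts(block: NonNull<[u8]>) -> (NonNull<u8>, usize) {
    (block.cast::<u8>(), block.len())
}

/// Round-trip over a stack buffer; no allocator is involved here.
fn round_trip() -> bool {
    let mut buf = [0u8; 16];
    let ptr = NonNull::new(buf.as_mut_ptr()).expect("stack address is never null");
    let block = block_from_parts(ptr, buf.len());
    let (data, size) = block_into_parts(block);
    data == ptr && size == 16
}
```

Here `cast` plays the role of reading a `ptr` field and `len` the role of reading a `size` field, which is the equivalence the proposal relies on.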