-
Notifications
You must be signed in to change notification settings - Fork 349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make force_ptr a total operation, and ptr-to-int-casts not require extra machine state #841
Comments
🤣 do it. Long version: There are two kinds of Now if we suddenly get many many more integer constants (non-usize) due to const generics this may become an issue. cc @eddyb @varkor We've gone back and forth on this topic. I don't think it's a problem, it's more like we're moving from a local minimum to another. But if we get something out of it, let's do it. |
Can we keep |
Well, it's specifically |
EDIT: but I'd prefer if we kept the full |
So you are saying that That makes naming annoying though, indeed. One crucial property of |
What would be even better is if we could somehow move from "either When defining a formal memory model with int-ptr-casts, the best choice (in my opinion) is to say that a
So if we had just Miri to consider, I would lobby heavily for doing it this way. It'll make things much simpler. But we also have CTFE to consider, and CTFE does not want to support ptr-int-casts. So if we want to go with this strategy, we need some way to abstract this that is not messy. I'll think about this. Any proposals? |
we could eagerly create integers by checking a flag on the engine. We'd not have the additional provenance of the |
Well also CTFE would then have to pick a base address, which does not seem great. What about (something like) this: we define struct Scalar<Tag> {
/// The size of this scalar, in bytes.
size: u8,
/// The raw data of this scalar. All but the least significant `size` bytes are 0.
raw: u128,
/// If this is a pointer, the allocation it points to, and further machine-dependent provenance.
ptr_provenance: Option<(AllocId, Tag)>,
} (Maybe we can save some space by e.g. exploiting that a Then we can have a machine function that can turn a We would also have a machine function that can turn a The benefit is basically what I laid out above. In memory, the The risk here is that we must make sure that nothing in CTFE thinks the So, taking this into account, maybe /// Look ma, the fields are private!
struct Scalar<PtrData> {
/// The size of this scalar, in bytes.
size: u8,
/// The raw data of this scalar. All but the least significant `size` bytes are 0.
raw: u128,
/// If this is a pointer, the machine adds some provenance.
ptr_data: Option<PtrData>,
}
impl<PtrData: PtrProvenance> Scalar<PtrData> {
// Some methods will require this `PtrProvenance` trait.
}
trait PtrProvenance {
/// Whether with this provenance, it is legal to access the `raw` part of a `Scalar` directly.
const ALLOW_RAW_ACCESS: bool;
}
trait Machine {
type PtrData: PtrProvenance; // replaces `Tag`
fn get_ptr_alloc_offset(&InterpCx, ptr: Scalar<PtrData>) -> InterpResult<'_, (AllocId, Size)>;
} This way, at least there is no way for anything to use the If we wanted to be really sure about that, we would have to put the trait ScalarT: Copy {
fn from_int(...) -> Self;
fn from_uint(...) -> Self;
fn to_bits(self) -> InterpResult<'_, u128>;
}
trait Machine {
type Scalar: ScalarT;
fn get_ptr_alloc_offset(&InterpCx, ptr: Scalar) -> InterpResult<'_, (AllocId, Size)>;
} That seems extreme. But maybe it actually helps? EDIT: Ah, but |
That's what we do today. We have an enum around the two cases. And I think we should keep it (for the memory size benefits). You can just wrap it in a newtype and add the accessors on that. Best of both worlds. What will ctfe do with |
That might be the conclusion here, yes. The
My thinking was it would store the offset in the bits. We don't want anything resembling ptr-to-int casts in CTFE. So yes there would be a difference between CTFE and Miri, that's why I proposed some trait stuff above. |
My hope was we could first talk conceptual and then see how we can get the memory size optimized again, but maybe that doesn't work so well, so here's a strawman proposal: Instead of /// The fact that these are two variants should mostly not be visible to the outside.
/// Storing things into an `Allocation` should be the only time this abstraction is ever
/// "lifted" -- that's both in the `rustc::mir::interpret` module so we can even help a bit with privacy!
enum ScalarPriv<PtrData> {
Raw { /* like now */ },
Ptr { addr: u64, data: PtrData },
}
/// We want to keep the ScalarPriv constructors are private, so we need to wrap this.
pub struct Scalar<PtrData>(ScalarPriv<PtrData>);
trait PtrToInt {
/// Says whether the `addr` field of `ScalarPriv::Ptr` can be interpreted as raw bytes.
const ADDR_IS_RAW: bool;
}
// Some `Scalar` methods need that trait.
impl<Ptr: PtrToInt> Scalar<Ptr> {
// Internal helper methods:
fn get_size(self, cx: &impl HasDataLayout) -> Size {
let this = self.0;
match this {
Raw { size, data } => Size::from(size),
Ptr { .. } => cx.pointer_size(),
}
}
fn try_get_bits(self) -> Option<u128> {
let this = self.0;
match this {
Raw { size, data } => Some(data),
Ptr { addr, data } => if PtrToInt::ADDR_IS_RAW { Some(addr as u128) } else { None },
}
}
}
trait Machine {
/// Extra data carried with each pointer.
/// This type also defines how the `addr` field in `ScalarPriv::Ptr` is being interpreted!
type PtrData: PtrToInt + Copy;
/// "int to ptr" cast.
/// This forms the basis for a "normalization" method that adds `PtrData` to a `Scalar` that doesn't have it yet.
/// Note that this method cannot fail!
fn get_ptr_data(mem: &Memory<'mir, 'tcx, Self>, addr: u64) -> PtrData;
/// Get the alloc-offset pair from a `Scalar::Ptr`.
fn get_alloc_offset(mem: &Memory<'mir, 'tcx, Self>, addr: u64, data: PtrData) -> InterpResult<'tcx, (AllocId, Size)>;
} CTFE's Miri's Compared to today, (a) ptr-to-int-casts are "pure" and don't need machine access, and (b) Does this make sense? |
We could also add a OTOH, requiring that for ptr-to-scalar conversion might also be annoying. Not sure which is worse. |
I don't like having a size field on the pointer. I mean it only exists on Scalar as a sanity assert that I don't think we've triggered in a long time. All the ops are type based, so we can remove that field. |
You mean in about one hour? rust-random/getrandom#73 was found by that assert. ;) I am strongly in favor of keeping the field. I don't feel I could confidently work on Miri without this check. |
We can make it a field that only exists in debug mode, but yea i remember triggering asserts during development. General note: create a special ConstValue specific type that is not the |
Given that the field does not increase the size of the enum, I think making it debug-only would mostly manage to increase the messiness of the code. We could make the actual size checks debug-only if you want, but I'd prefer only doing this if that actually gives a measurable performance increase. I doubt it will.
This sounds reasonable (then we could also finally move It looks like the main representation change for this issue is to replace |
I don't think duplicating would be much of a pain. We don't use any of the methods on scalars if obtained from a ScalarValue except for getting an int out of it |
CTFE core engine allocation & memory API improvemenets This is a first step towards rust-lang/miri#841. - make `Allocation` API offset-based (no more making up `Pointer`s just to access an `Allocation`) - make `Memory` API higher-level (combine checking for access and getting access into one operation) The Miri-side PR is at rust-lang/miri#1804. r? `@oli-obk`
Currently,
force_bits
is a total operation in the sense that it will never go wrong in Miri (it can fail in CTFE though), butforce_ptr
can fail. That is very annoying, it means that we have to callforce_ptr
at just the right spot or we'll either causes errors in well-behaved code by converting too early, or re-do the conversion all the time because we do it too late. (See rust-lang/rust#62441 for some very carefulforce_ptr
placement.)I think it might be a good idea to make
force_ptr
never fail. The way this could be done is by designating a particularAllocId
as representing the "integer allocation" and having base address 0, so that a pointer with offseto
is equal to the integero
. This would mean that every pointer-sized value has two equivalent representations (returned byforce_bits
andforce_ptr
, respectively) -- which is maybe suboptimal, but IMO it is better than the current situation where only some pointer-sized values have two representations.Potentially, that would also let us use
Pointer
instead ofScalar
inMemPlace
, simplifying (I think) lots of code. However, that would mean even CTFE has two different representations for the same value. But given that the engine will have to handle that anyway, I am not sure if it is a problem.@oli-obk will recognize this as basically bringing back the "ZST allocation" or whatever it used to be called. It was a mess back then, but I think removing it only helped because it meant we had a canonical representation for every value -- a property we already lost with intptrcast.
What do you think?
Current proposal: #841 (comment)
The text was updated successfully, but these errors were encountered: