Can we weaken the requirements for `offset`? (Was: Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?) #350

RalfJung · 2022-07-11T17:14:48Z

The "inbounds" semantics of offset are notoriously tricky and confusing. From what I hear from @nikic, the "inbounds" part of them is also not nearly as useful as one might think, and the main payoff is being sure that the pointer is not wrapped around either end of the address space.

So... is there a chance that we could significantly simplify the language at acceptable cost for analyses by changing the rules of offset (and all other "inbounds" offsets that the language does implicitly, like when applying place projections) such that the only case of UB here is overflow wrapping around the address space (both below 0 and above usize::MAX)? I think that would be great, but of course we have to be careful not to give up too much information here. (That said, we do have a ton of information of the form "this pointer is dereferenceable for size N", which conveys bounds information much more directly than getelementptr inbounds.)

However, we'd probably need LLVM support for this, adding some sort of getelementptr nowrap. (There is the possible alternative of using plain getelementptr, and upgrading that to inbounds whenever we can derive from other information that the pointer is indeed dereferenceable for a sufficiently large memory range. I am not sure how tricky that would be to implement though.)

So I wonder, @nikic, do you think that would be a reasonable and realistic option? And everyone, do you think that would be a reasonable semantics to shoot for?

In particular, this would resolve #299.

The text was updated successfully, but these errors were encountered:

Lokathor · 2022-07-11T17:19:52Z

Just saying "you can't wrap the address space" is extremely teachable.

scottmcm · 2022-07-11T19:05:13Z

I always thought this was as much about aliasing information as anything. Is GEPi not actually important for that because we get all the information we need about it from provenance anyway?

Lokathor · 2022-07-11T19:17:15Z

Yeah two pointers that each point to a separate stack object still can't index oob and access each other because they'd have separate provenance.

eddyb · 2022-07-17T23:12:20Z

If GEP was replaced by an untyped "pointer offset with integer dot product" operation (i.e. a sequence of constant strides and runtime indices, to cover the "nested arrays" case, if they do want to keep it as one instruction), it would be great to reuse nuw/nsw flags from arithmetic operations.

(NUW/NSW standing for "No Unsigned/Signed Wrap", i.e. opting into to unsigned/signed overflow being UB)

Alternatively, if LLVM had add nuw/sub nuw between ptr and an integer, that could allow for add/sub methods on Rust pointers that aren't limited by isize like offset is (though I'm not sure we want to even dream about such things, given how much legacy there is around the whole "ptrdiff_t is one bit too small" issue).

RalfJung · 2023-02-10T12:57:38Z

Even on the LLVM side there seems to be some desire for a getelementptr nowrap, though it seems like that proposal has not moved in a while.

RalfJung · 2023-06-14T10:05:53Z

I learned in the mean time that getelementptr inbounds in LLVM doesn't actually require the allocation to be live. In my interpretation of Rust's rules (and in Miri's implementation), we do require liveness. So we could weaken our offset rules a bit by dropping the liveness requirement. But honestly this doesn't solve the main pain points here so I don't think it's worth it.

RalfJung · 2023-06-28T14:38:44Z

We had a t-opsem meeting on this subject; see rust-lang/opsem-team#10 for a summary:

Summary of the discussion:

Connor brings up a potential optimization that Rust's offset (but not LLVM's getelementptr inbounds) permits: deducing that the range between old and new pointer is dereferenceable, and using that to prefetch memory.

The counterpoint is that in particular for place projections, people often don't expect the inbounds requirement, and accidentally cause UB. For offset this is less common due to the detailed docs, but we'd rather not have place projection be a third offset operation distinct from wrapping_offset and offset.

The counter-counterpoint is that this mostly came up when people tried to hand-write offset_of, and hopefully that won't happen any more when offset_of is natively supported.

Overall we have fairly concrete evidence of people not expecting the UB, but only in code that hopefully won't be written any more, vs a very tentative idea that this could be useful for optimizations.

The general conclusion was that for now it doesn't seem clear that dropping the inbounds is the best call. There was consensus that offset(0) should be allowed always, so we will try to see if LLVM can commit to supporting this. Also we should keep our eyes open for real-world UB caused by the inbounds requirement that does not fall in the category of "hand-rolled offset_of implementation".

RalfJung mentioned this issue Jul 15, 2022

Should raw pointer deref/projections have to be in-bounds? #319

Closed

This was referenced Jun 14, 2023

Meeting proposal: inbounds requirements for offset (and potentially place projections) rust-lang/opsem-team#10

Closed

Validity invariant of places #418

Open

RalfJung changed the title ~~Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?~~ Can we weaken the requirements for offset? (Was: Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?) Jun 14, 2023

RalfJung mentioned this issue Jun 14, 2023

Miri allows inbounds offsets outside of provenance but inside of the allocid #349

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can we weaken the requirements for `offset`? (Was: Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?) #350

Can we weaken the requirements for `offset`? (Was: Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?) #350

RalfJung commented Jul 11, 2022 •

edited

Loading

Lokathor commented Jul 11, 2022

scottmcm commented Jul 11, 2022

Lokathor commented Jul 11, 2022

eddyb commented Jul 17, 2022

RalfJung commented Feb 10, 2023

RalfJung commented Jun 14, 2023

RalfJung commented Jun 28, 2023 •

edited

Loading

Can we weaken the requirements for offset? (Was: Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?) #350

Can we weaken the requirements for offset? (Was: Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?) #350

Comments

RalfJung commented Jul 11, 2022 • edited Loading

Lokathor commented Jul 11, 2022

scottmcm commented Jul 11, 2022

Lokathor commented Jul 11, 2022

eddyb commented Jul 17, 2022

RalfJung commented Feb 10, 2023

RalfJung commented Jun 14, 2023

RalfJung commented Jun 28, 2023 • edited Loading

Can we weaken the requirements for `offset`? (Was: Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?) #350

Can we weaken the requirements for `offset`? (Was: Should we / can we make all "getelementptr inbounds" into "getelementptr nowrap"?) #350

RalfJung commented Jul 11, 2022 •

edited

Loading

RalfJung commented Jun 28, 2023 •

edited

Loading