Unsafe Indexing and Slicing #392

arielb1 · 2014-10-13T19:03:48Z

aturon · 2014-10-13T19:07:26Z

aturon · 2014-10-13T19:32:25Z

I really like this proposal: it neatly solves some of the API consistency and ergonomics concerns around some very common uses of unchecked/unsafe code: skipping bounds checks. I see very little downside here.

I am slightly hesitant about not requiring this to be done in an unsafe context, though that's largely a matter of style since the unsafe keyword is used regardless. (The motivation for requiring an unsafe context is to more clearly mark the boundary of trust, e.g. earlier checking of lengths.)

As an aside, this proposal doesn't quite support the ergonomics @gankro wanted for his raw reform RFC, since when working with raw slices you'd still be required to write unsafe while indexing. We should definite consider these proposals as a pair and make sure there's a coherent overall design.

Gankra · 2014-10-13T20:11:57Z

This is a really cool idea!

As a possible way to bridge some of the ergonomics concerns posed by @aturon perhaps we consider the following rules (replace Slice and UnsafeSlice with your operator of choice):

If an object only implements Slice or UnsafeSlice, then the operator is safe or unsafe, and no infix notation needs to be provided.
If an object implements both then it must be disambiguated with the infix unsafe.

I'm ambivalent about making unsafe indexing "safe". The strikes me as arguing slice.unsafe_get should be safe because it has unsafe in the name. I sympathize with the refactoring usecase, though. I think it's more important to force to programmer to stop and contemplate the safety boundaries.

I wonder if we should consider introducing guidelines (if we haven't already?) about where to precisely delimit safety. For instance, given code like the following, should we prefer:

  // Every unsafe operation is totally legit in here because of preconditions or whatever
  unsafe {
    a = foo[1, ..5];
    b = bar.do_unsafe_stuff();
    c = *baz;
...
  }

Or:

  // Every unsafe operation is totally legit in here because of preconditions or whatever
    a = unsafe { foo[1, ..5] };
    b = unsafe { bar.do_unsafe_stuff() };
    c = unsafe { *baz };
...

What are we trying to express here? To me, there's the possible suggestion that all of the former must be refactored/rearranged with care, while the latter suggests that the operations are all totally independent, and can be rearranged freely without concern for the others.

I'm also thinking writing all of unsafe in the operators is a bit unfortunate. It's verbose and noisy. I think I'm going to have to live with that, though. There's no way we're adding an unsafe sigil, and something like uns will be more confusing than helpful, I bet.

arielb1 · 2014-10-13T20:20:50Z

The duplicate-unsafe looks very ugly to me, and unchecked indexing is a such a distinct class of unsafety that wrapping it in an unsafe block would encourage the use of additional unsafe operations (e.g. the original insertion sort working with raw slices).

Also, people would probably like to add unsafe marks to places when bound-checking costs them too much (after being sufficiently sure that the bound-checking is unneeded). Thinking about it, adding debug_assert!-s to unchecked indexing would be nice. I think the unsafe mark has
the right amount of scariness.

@gankro

Not providing the unsafe mark when the type doesn't implement Slice: I like that. Of course you won't get the free unsafe block then. Actually, thinking about it, implementing UnsafeIndex(Mut)? for *T doesn't seem so bad – it still won't let you return a dangling pointer, and a[0] won't accidentally destroy parts of a (which is what I've feared initially).

reem · 2014-10-13T21:48:17Z

active/0000-unsafe-indexing.md

+
+In addition, the syntax would be slightly ambigious with non-block
+unsafe expressions if they are ever introduced
+(```a[unsafe ptr::read(x)]```). Giving this syntax precedence seems


Would this be replaced with unsafe unsafe ptr::read(x), because that seems strange. Considering the relative likelihood of unsafe expressions being added in the future, we should have a plan to deal with this.

@reem

This can be replaced by unsafe a[unsafe ptr::read(x)].

I'm not convinced it can/I don't see how that syntax resolves the ambiguity. The unsafe before a allows additional unsafe operations inside the brackets? That seems extremely weird to me.

unsafe a[unsafe ptr::read(x)] would be equivalent (given unsafe expressions) to this RFC's unsafe { a[unsafe ptr::read(x)] }, with the ptr::read being allowed by the unsafe block/expression

nielsle · 2014-10-14T06:53:11Z

It would be nice to be able to add #[unchecked_slicing] as an attribute to an entire unsafe-block. Then you could switch back and forth with minimal effort. Perhaps the attribute could even be enabled and disabled by a configuration option.

#[unchecked_slicing]  
unsafe {
    let first = s.read(len-1);
    s[1..].copy(s[ ..len-1]);
    s.write(0, first);
}

EDIT: This idea is already mentioned under alternatives

glaebhoerl · 2014-10-14T11:43:40Z

My only concern is whether this might lead to a culture where people add unsafe to their indexing operations "to make it go faster", it being known that this lets you avoid the overhead of bounds checking (people really, really fetishize avoiding overhead), without fully appreciating the implications and the reason why Rust does checking. Saying "only use unsafe indexing if you are sure that the index is in-bounds" doesn't really cut it, because people are always sure that their code is right, right up until it turns out that it isn't.

(A full unsafe { ... } block with explicit unchecked_index calls feels like it entails a fuller ackowledgement that "yes, I want to do unsafe things here", as opposed to just "I want to skip the bounds check and don't care".)

Again, just to be clear: this is only a concern, and not necessarily an objection.

CloudiDust · 2014-10-14T14:42:47Z

@glaebhoerl, I think as long as we are not doing implicit indexing/slicing without bound checks, we'll be fine. s[unsafe n] does have a very clear unsafe inside, and this is enough warning in my eyes.

If a programmer throws unsafe everywhere carelessly, then he/she shouldn't be surprised when segfaults strike. ;)

SiegeLord · 2014-10-15T17:15:32Z

I don't like this because of the drawbacks mentioned. It especially bothers me that this approch is completely un-extensible (what about unsafe versions of other operators?).

A much better and extensible approach would be to create ad-hoc operator overloads, with an unsafe variant of indexing implemented on an adapter type. The syntax would be:

let vec_raw = vec.into_raw(); // or some other name
let val = unsafe{ vec_raw[idx] };

If game developers balk at putting unsafe everywhere, Rust has a wonderful macro functionality that can shorten it to a single letter, sigil and block.

arielb1 · 2014-10-16T09:57:06Z

@SiegeLord

Your proposal ends up with forcing people to use .into_raw(), which loses the aliasing and validity guarantees that &mut [T] provides.

SiegeLord · 2014-10-16T13:38:01Z

Maybe .into_raw isn't the best implementation of that idea (I wasn't aware that it loses the lifetime). Something better behaved would work better:

struct UnsafeMutAccessor<'a, T:'a> {
    slice: &'a mut [T]
}

// impl the unsafe indexing traits on UnsafeMutAccessor

trait UnsafeMutAccess<T> {
    fn as_unsafe_mut<'a>(&'a mut self) -> UnsafeMutAccessor<'a, T>;
}

impl<'b, T> UnsafeMutAccess<T> for &'b mut [T] {
    fn as_unsafe_mut<'a>(&'a mut self) -> UnsafeMutAccessor<'a, T> {
        UnsafeMutAccessor{ slice: *self }
    }
}

impl<'b, T> UnsafeMutAccess<T> for Vec<T> {
    fn as_unsafe_mut<'a>(&'a mut self) -> UnsafeMutAccessor<'a, T> {
        UnsafeMutAccessor{ slice: self.as_mut_slice() }
    }
}

Gankra · 2014-10-17T21:51:32Z

I assumed @SiegeLord was proposing that intermediary representation all along. It's something I've been mulling over since the no-lifetime issue was noted on my RFC. It's definitely worth seriously considering, although it seems unfortunate to have all of these "modes" of slice. Maybe slices just are important enough to justify it, though.

Note that "coerce into a slice to get unsafe indexing" doesn't work on e.g. RingBuf, which can't be represented (in general) as a single slice. Maybe if it had a method that returns two slices?

Diggsey · 2014-10-27T22:58:01Z

This doesn't seem the best way to solve the common case where the programmer wants to avoid the overhead of bounds checking:

It's not configurable: for example, I might still want proper bounds checking when debugging
It's an impossible decision to make well: the programmer can only reliably know when they SHOULD do bounds checking, but without a formal proof, can't be sure when they needn't, and the latter case is far more common.
Changes to safe code can break the invariants assumed by unsafe code even if the two pieces of code are not directly related.
Many small changes are necessary throughout the code for an algorithm, to either add or remove bounds checking, which is both time consuming and error-prone, when 99% of the time the desired behaviour is "enable/disable bounds checking when indexing into variable X" which can be done much more expressively with an attribute on the variable.

A "perfect" solution would be a way to specify intent to the compiler, eg. "unsafe constraint(i >= 0 && i < length)", and then have it automatically omit bounds checking where the constraint implies the bound check would succeed. Eventually the compiler could even do static analysis to try to determine if the constraint holds, which would allow safe constraints, but that is obviously very difficult...

An imperfect but more practical solution would be to have an unsafe attribute which when applied to a variable declaration causes all indexes into that variable to omit bounds checking. This attribute would only have an effect when triggered with a particular compiler configuration, so that boundary checks are only omitted when it's absolutely necessary to get the best possible performance (for example in production builds)

arielb1 · 2014-10-28T07:21:07Z

@Diggsey

Out-of-bounds access is still UB – so we can have checking in debug builds. I think checking accesses, not variables, is the right choice, because it is accesses that take time, not variables.

Eliding bounds checking which we can prove we don't need is unrelated.

Diggsey · 2014-10-28T14:00:05Z

"Out-of-bounds access is still UB – so we can have checking in debug builds."
Under this proposal there are two separate methods for safe/unsafe indexing, and the unsafe keyword forces the compiler to use the unsafe version, UB only comes into play once the unsafe version is used AND the index is out of bounds. There's no way to get it to use the safe versions through a configuration option. Also, tying it to debug builds is not great, it should be separately configurable: it's not uncommon for errors to only show in one configuration, for example I might get access violations in a release build, but no problems in debug. If I can temporarily enable bounds checking in release mode it tells me immediately whether the access violation was caused by indexing, and if so, where the problem is occurring.

"Eliding bounds checking which we can prove we don't need is unrelated."
As purely an optimisation, yes, but:

If the programmer wants to omit bounds checking, they are presumably fairly sure that certain constraints hold (namely that the indexer is always within the bounds)
The logic behind this, if specified somehow in code, could be checked for correctness by the compiler.
This allows the programmer to explicitly omit bounds checking without falling back to unsafe code, and getting an error at compile time if something changes to invalidate their reasoning.
It's not hugely different to how the programmer currently specifies lifetimes, and then the compiler proves that they are valid.

arielb1 · 2014-10-28T16:49:41Z

See #423 for handling debugging and OOB indexing

aturon · 2014-11-03T18:52:09Z

@arielb1

The Rust team has spent some time discussing this issue (and related RFCs) and, while we agree that the ergonomics of unsafe programming can and should improve over time, we're not ready to commit to a design yet. For the 1.0 release, we plan to work on shoring up the existing library APIs for unsafe code, making them coherent and ergonomic, but are not ready to make language-level commitments.

I'm going to close this as Postponed for now, and create a tracking issue in this repo so that we can revisit it later on.

Thanks for the effort in writing this up!

aturon · 2014-11-03T18:56:48Z

Tracking issue here.

Deprecate setupComponentManager String Lookup

arielb1 force-pushed the unsafe-indexing branch 2 times, most recently from db1a35c to 955ead9 Compare October 13, 2014 19:06

Unsafe Indexing and Slicing

955ead9

arielb1 closed this Oct 13, 2014

arielb1 reopened this Oct 13, 2014

reem reviewed Oct 13, 2014
View reviewed changes

SiegeLord mentioned this pull request Oct 15, 2014

Ad-hoc operator overloading #399

Closed

nrc assigned aturon Oct 16, 2014

Diggsey mentioned this pull request Oct 28, 2014

Flags for unspecified behaviour #423

Closed

alexcrichton force-pushed the master branch from b9e2b8c to 5020131 Compare October 29, 2014 15:25

aturon closed this Nov 3, 2014

aturon added the postponed RFCs that have been postponed and may be revisited at a later time. label Nov 3, 2014

aturon mentioned this pull request Nov 3, 2014

Improve unsafe ergonomics #433

Open

wycats pushed a commit to wycats/rust-rfcs that referenced this pull request Mar 5, 2019

Merge pull request rust-lang#392 from chadhietala/deprecate-stringz

2afd501

Deprecate setupComponentManager String Lookup

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unsafe Indexing and Slicing #392

Unsafe Indexing and Slicing #392

arielb1 commented Oct 13, 2014

aturon commented Oct 13, 2014

aturon commented Oct 13, 2014

Gankra commented Oct 13, 2014

arielb1 commented Oct 13, 2014

reem Oct 13, 2014

arielb1 Oct 14, 2014

reem Oct 14, 2014

arielb1 Oct 14, 2014

nielsle commented Oct 14, 2014

glaebhoerl commented Oct 14, 2014

CloudiDust commented Oct 14, 2014

SiegeLord commented Oct 15, 2014

arielb1 commented Oct 16, 2014

SiegeLord commented Oct 16, 2014

Gankra commented Oct 17, 2014

Diggsey commented Oct 27, 2014

arielb1 commented Oct 28, 2014

Diggsey commented Oct 28, 2014

arielb1 commented Oct 28, 2014

aturon commented Nov 3, 2014

aturon commented Nov 3, 2014

Unsafe Indexing and Slicing #392

Unsafe Indexing and Slicing #392

Conversation

arielb1 commented Oct 13, 2014

aturon commented Oct 13, 2014

aturon commented Oct 13, 2014

Gankra commented Oct 13, 2014

arielb1 commented Oct 13, 2014

reem Oct 13, 2014

Choose a reason for hiding this comment

arielb1 Oct 14, 2014

Choose a reason for hiding this comment

reem Oct 14, 2014

Choose a reason for hiding this comment

arielb1 Oct 14, 2014

Choose a reason for hiding this comment

nielsle commented Oct 14, 2014

glaebhoerl commented Oct 14, 2014

CloudiDust commented Oct 14, 2014

SiegeLord commented Oct 15, 2014

arielb1 commented Oct 16, 2014

SiegeLord commented Oct 16, 2014

Gankra commented Oct 17, 2014

Diggsey commented Oct 27, 2014

arielb1 commented Oct 28, 2014

Diggsey commented Oct 28, 2014

arielb1 commented Oct 28, 2014

aturon commented Nov 3, 2014

aturon commented Nov 3, 2014