Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe Indexing and Slicing #392

Closed
wants to merge 1 commit into from

Conversation

arielb1
Copy link
Contributor

@arielb1 arielb1 commented Oct 13, 2014

@arielb1 arielb1 force-pushed the unsafe-indexing branch 2 times, most recently from db1a35c to 955ead9 Compare October 13, 2014 19:06
@aturon
Copy link
Member

aturon commented Oct 13, 2014

cc @gankro

@aturon
Copy link
Member

aturon commented Oct 13, 2014

I really like this proposal: it neatly solves some of the API consistency and ergonomics concerns around some very common uses of unchecked/unsafe code: skipping bounds checks. I see very little downside here.

I am slightly hesitant about not requiring this to be done in an unsafe context, though that's largely a matter of style since the unsafe keyword is used regardless. (The motivation for requiring an unsafe context is to more clearly mark the boundary of trust, e.g. earlier checking of lengths.)

As an aside, this proposal doesn't quite support the ergonomics @gankro wanted for his raw reform RFC, since when working with raw slices you'd still be required to write unsafe while indexing. We should definite consider these proposals as a pair and make sure there's a coherent overall design.

@Gankra
Copy link
Contributor

Gankra commented Oct 13, 2014

This is a really cool idea!

As a possible way to bridge some of the ergonomics concerns posed by @aturon perhaps we consider the following rules (replace Slice and UnsafeSlice with your operator of choice):

  • If an object only implements Slice or UnsafeSlice, then the operator is safe or unsafe, and no infix notation needs to be provided.
  • If an object implements both then it must be disambiguated with the infix unsafe.

I'm ambivalent about making unsafe indexing "safe". The strikes me as arguing slice.unsafe_get should be safe because it has unsafe in the name. I sympathize with the refactoring usecase, though. I think it's more important to force to programmer to stop and contemplate the safety boundaries.

I wonder if we should consider introducing guidelines (if we haven't already?) about where to precisely delimit safety. For instance, given code like the following, should we prefer:

  // Every unsafe operation is totally legit in here because of preconditions or whatever
  unsafe {
    a = foo[1, ..5];
    b = bar.do_unsafe_stuff();
    c = *baz;
...
  }

Or:

  // Every unsafe operation is totally legit in here because of preconditions or whatever
    a = unsafe { foo[1, ..5] };
    b = unsafe { bar.do_unsafe_stuff() };
    c = unsafe { *baz };
...

What are we trying to express here? To me, there's the possible suggestion that all of the former must be refactored/rearranged with care, while the latter suggests that the operations are all totally independent, and can be rearranged freely without concern for the others.

I'm also thinking writing all of unsafe in the operators is a bit unfortunate. It's verbose and noisy. I think I'm going to have to live with that, though. There's no way we're adding an unsafe sigil, and something like uns will be more confusing than helpful, I bet.

@arielb1
Copy link
Contributor Author

arielb1 commented Oct 13, 2014

The duplicate-unsafe looks very ugly to me, and unchecked indexing is a such a distinct class of unsafety that wrapping it in an unsafe block would encourage the use of additional unsafe operations (e.g. the original insertion sort working with raw slices).

Also, people would probably like to add unsafe marks to places when bound-checking costs them too much (after being sufficiently sure that the bound-checking is unneeded). Thinking about it, adding debug_assert!-s to unchecked indexing would be nice. I think the unsafe mark has
the right amount of scariness.

@gankro

Not providing the unsafe mark when the type doesn't implement Slice: I like that. Of course you won't get the free unsafe block then. Actually, thinking about it, implementing UnsafeIndex(Mut)? for *T doesn't seem so bad – it still won't let you return a dangling pointer, and a[0] won't accidentally destroy parts of a (which is what I've feared initially).

@arielb1 arielb1 closed this Oct 13, 2014
@arielb1 arielb1 reopened this Oct 13, 2014

In addition, the syntax would be slightly ambigious with non-block
unsafe expressions if they are ever introduced
(```a[unsafe ptr::read(x)]```). Giving this syntax precedence seems
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be replaced with unsafe unsafe ptr::read(x), because that seems strange. Considering the relative likelihood of unsafe expressions being added in the future, we should have a plan to deal with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reem

This can be replaced by unsafe a[unsafe ptr::read(x)].

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced it can/I don't see how that syntax resolves the ambiguity. The unsafe before a allows additional unsafe operations inside the brackets? That seems extremely weird to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsafe a[unsafe ptr::read(x)] would be equivalent (given unsafe expressions) to this RFC's unsafe { a[unsafe ptr::read(x)] }, with the ptr::read being allowed by the unsafe block/expression

@nielsle
Copy link

nielsle commented Oct 14, 2014

It would be nice to be able to add #[unchecked_slicing] as an attribute to an entire unsafe-block. Then you could switch back and forth with minimal effort. Perhaps the attribute could even be enabled and disabled by a configuration option.

#[unchecked_slicing]  
unsafe {
    let first = s.read(len-1);
    s[1..].copy(s[ ..len-1]);
    s.write(0, first);
}

EDIT: This idea is already mentioned under alternatives

@glaebhoerl
Copy link
Contributor

My only concern is whether this might lead to a culture where people add unsafe to their indexing operations "to make it go faster", it being known that this lets you avoid the overhead of bounds checking (people really, really fetishize avoiding overhead), without fully appreciating the implications and the reason why Rust does checking. Saying "only use unsafe indexing if you are sure that the index is in-bounds" doesn't really cut it, because people are always sure that their code is right, right up until it turns out that it isn't.

(A full unsafe { ... } block with explicit unchecked_index calls feels like it entails a fuller ackowledgement that "yes, I want to do unsafe things here", as opposed to just "I want to skip the bounds check and don't care".)

Again, just to be clear: this is only a concern, and not necessarily an objection.

@CloudiDust
Copy link
Contributor

@glaebhoerl, I think as long as we are not doing implicit indexing/slicing without bound checks, we'll be fine. s[unsafe n] does have a very clear unsafe inside, and this is enough warning in my eyes.

If a programmer throws unsafe everywhere carelessly, then he/she shouldn't be surprised when segfaults strike. ;)

@SiegeLord
Copy link

I don't like this because of the drawbacks mentioned. It especially bothers me that this approch is completely un-extensible (what about unsafe versions of other operators?).

A much better and extensible approach would be to create ad-hoc operator overloads, with an unsafe variant of indexing implemented on an adapter type. The syntax would be:

let vec_raw = vec.into_raw(); // or some other name
let val = unsafe{ vec_raw[idx] };

If game developers balk at putting unsafe everywhere, Rust has a wonderful macro functionality that can shorten it to a single letter, sigil and block.

@arielb1
Copy link
Contributor Author

arielb1 commented Oct 16, 2014

@SiegeLord

Your proposal ends up with forcing people to use .into_raw(), which loses the aliasing and validity guarantees that &mut [T] provides.

@SiegeLord
Copy link

Maybe .into_raw isn't the best implementation of that idea (I wasn't aware that it loses the lifetime). Something better behaved would work better:

struct UnsafeMutAccessor<'a, T:'a> {
    slice: &'a mut [T]
}

// impl the unsafe indexing traits on UnsafeMutAccessor

trait UnsafeMutAccess<T> {
    fn as_unsafe_mut<'a>(&'a mut self) -> UnsafeMutAccessor<'a, T>;
}

impl<'b, T> UnsafeMutAccess<T> for &'b mut [T] {
    fn as_unsafe_mut<'a>(&'a mut self) -> UnsafeMutAccessor<'a, T> {
        UnsafeMutAccessor{ slice: *self }
    }
}

impl<'b, T> UnsafeMutAccess<T> for Vec<T> {
    fn as_unsafe_mut<'a>(&'a mut self) -> UnsafeMutAccessor<'a, T> {
        UnsafeMutAccessor{ slice: self.as_mut_slice() }
    }
}

@Gankra
Copy link
Contributor

Gankra commented Oct 17, 2014

I assumed @SiegeLord was proposing that intermediary representation all along. It's something I've been mulling over since the no-lifetime issue was noted on my RFC. It's definitely worth seriously considering, although it seems unfortunate to have all of these "modes" of slice. Maybe slices just are important enough to justify it, though.

Note that "coerce into a slice to get unsafe indexing" doesn't work on e.g. RingBuf, which can't be represented (in general) as a single slice. Maybe if it had a method that returns two slices?

@Diggsey
Copy link
Contributor

Diggsey commented Oct 27, 2014

This doesn't seem the best way to solve the common case where the programmer wants to avoid the overhead of bounds checking:

  • It's not configurable: for example, I might still want proper bounds checking when debugging
  • It's an impossible decision to make well: the programmer can only reliably know when they SHOULD do bounds checking, but without a formal proof, can't be sure when they needn't, and the latter case is far more common.
  • Changes to safe code can break the invariants assumed by unsafe code even if the two pieces of code are not directly related.
  • Many small changes are necessary throughout the code for an algorithm, to either add or remove bounds checking, which is both time consuming and error-prone, when 99% of the time the desired behaviour is "enable/disable bounds checking when indexing into variable X" which can be done much more expressively with an attribute on the variable.

A "perfect" solution would be a way to specify intent to the compiler, eg. "unsafe constraint(i >= 0 && i < length)", and then have it automatically omit bounds checking where the constraint implies the bound check would succeed. Eventually the compiler could even do static analysis to try to determine if the constraint holds, which would allow safe constraints, but that is obviously very difficult...

An imperfect but more practical solution would be to have an unsafe attribute which when applied to a variable declaration causes all indexes into that variable to omit bounds checking. This attribute would only have an effect when triggered with a particular compiler configuration, so that boundary checks are only omitted when it's absolutely necessary to get the best possible performance (for example in production builds)

@arielb1
Copy link
Contributor Author

arielb1 commented Oct 28, 2014

@Diggsey

Out-of-bounds access is still UB – so we can have checking in debug builds. I think checking accesses, not variables, is the right choice, because it is accesses that take time, not variables.

Eliding bounds checking which we can prove we don't need is unrelated.

@Diggsey
Copy link
Contributor

Diggsey commented Oct 28, 2014

"Out-of-bounds access is still UB – so we can have checking in debug builds."
Under this proposal there are two separate methods for safe/unsafe indexing, and the unsafe keyword forces the compiler to use the unsafe version, UB only comes into play once the unsafe version is used AND the index is out of bounds. There's no way to get it to use the safe versions through a configuration option. Also, tying it to debug builds is not great, it should be separately configurable: it's not uncommon for errors to only show in one configuration, for example I might get access violations in a release build, but no problems in debug. If I can temporarily enable bounds checking in release mode it tells me immediately whether the access violation was caused by indexing, and if so, where the problem is occurring.

"Eliding bounds checking which we can prove we don't need is unrelated."
As purely an optimisation, yes, but:

  • If the programmer wants to omit bounds checking, they are presumably fairly sure that certain constraints hold (namely that the indexer is always within the bounds)
  • The logic behind this, if specified somehow in code, could be checked for correctness by the compiler.
  • This allows the programmer to explicitly omit bounds checking without falling back to unsafe code, and getting an error at compile time if something changes to invalidate their reasoning.
    It's not hugely different to how the programmer currently specifies lifetimes, and then the compiler proves that they are valid.

@arielb1
Copy link
Contributor Author

arielb1 commented Oct 28, 2014

See #423 for handling debugging and OOB indexing

@aturon
Copy link
Member

aturon commented Nov 3, 2014

@arielb1

The Rust team has spent some time discussing this issue (and related RFCs) and, while we agree that the ergonomics of unsafe programming can and should improve over time, we're not ready to commit to a design yet. For the 1.0 release, we plan to work on shoring up the existing library APIs for unsafe code, making them coherent and ergonomic, but are not ready to make language-level commitments.

I'm going to close this as Postponed for now, and create a tracking issue in this repo so that we can revisit it later on.

Thanks for the effort in writing this up!

@aturon aturon closed this Nov 3, 2014
@aturon aturon added the postponed RFCs that have been postponed and may be revisited at a later time. label Nov 3, 2014
@aturon
Copy link
Member

aturon commented Nov 3, 2014

Tracking issue here.

wycats pushed a commit to wycats/rust-rfcs that referenced this pull request Mar 5, 2019
Deprecate setupComponentManager String Lookup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
postponed RFCs that have been postponed and may be revisited at a later time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants