Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexed and IndexedMut traits only accept usize #29010

Closed
peter-bertok opened this issue Oct 13, 2015 · 6 comments
Closed

Indexed and IndexedMut traits only accept usize #29010

peter-bertok opened this issue Oct 13, 2015 · 6 comments

Comments

@peter-bertok
Copy link

Sample code: https://play.rust-lang.org/?gist=b9b0fad33060c74095a0&version=nightly

Short sample code that fails to compile:

let a = [0;5];
let x = a[0u8];

I'm working on a native Rust implementation of the Google Brotli compression algorithm (https://github.com/peter-bertok/brotli), which has a lot of low-level bit/byte twiddling and came across this issue.

  • Array indexing using arbitrary integral types works in literally every other mainstream language I can think of. Many other languages allow non-integral types as well, such as their equivalent of repr(C) enums.
  • Many unnecessary explicit type casts are required in certain types of code (binary format parsing, native interop, etc...)
  • An implicit cast from small unsigned integral types is never harmful. That is: u8, u16, and u32.
  • Casts from other types can be handled with a new warning or the existing error, with an explicit cast required to ensure the developer understands that truncation/wrapping is possible.
  • Alternatively, the bounds checking in Rust can be relied upon to provide protection. E.g.: indexing with a negative value should always cause a panic. In fact, this can be made SAFER than an explicit cast. Think of the scenario where a large negative signed number is explicitly cast by the programmer to usize, like they have to do now. This would wrap to a small positive number, which is a valid index. If the rust compiler provides an implementation for Indexed<> with signed types, it can insert a check for negative numbers, improving safety substantially. Currently, this compiles and runs without a panic:
let offs: i64 = -18446744073709551611;
let a = [0;50];
let x = a[offs as usize];
  • This issue also causes the compiler to infer the usize type unexpectedly, which is confusing to developers. Goes against the rule of least surprise. This fails to compile unexpectedly:
let offs: u32 = 1; // explicit integral type. Could be a function parameter, struct member, etc...
let mut i = 0; // due to slice indexing, this is inferred to be `usize`
let a = [0;5]; // any slice of any type, doesn't matter
let x = a[i]; 
// This results in: "the trait `core::ops::Add<u32>` is not implemented for the type `usize`"
let y = a[i+offs]; 

Proposal

  • Implement the Indexed and IndexedMut traits for u8, u16, and u32 for slice and Vec types with identical behaviour to the usize version.
  • Implement the Indexed and IndexedMut traits for isize, i8, i16, and i32 for slice and Vec types with new behaviour that panics on negative values, not just out-of-bounds positive values.
  • The i64 and u64 implementations are problematic on 32-bit builds. How these are treated is up for debate.
@petrochenkov
Copy link
Contributor

There's a PR implementing conversion traits Into/From for primitive integer types.
When it lands, I think it will be reasonable to implement Index<T>/IndexMut<T> where T: Into<usize> for vectors, arrays, etc., covering the first part of your proposal, it will need some wider consensus though, many people don't like implicit conversions like these.

There's also an RFC proposing checked integer conversions, panicking on truncation. With these conversions indexing with signed types will look like v[index.cast()] making the possible panic more visible (I'd personally prefer it to be visible). This covers the second and the third parts of your proposal.

Some older discussion on this topic:
https://internals.rust-lang.org/t/implicit-widening-polymorphic-indexing-and-similar-ideas

@peter-bertok
Copy link
Author

Making the panic for the signed integer case "visible" would be inconsistent with the current behaviour for out-of-bounds large positive numbers...

That is, both of these cases should "work" in the sense that both should panic, for the same reason and in the same way:

let i: isize = -5;
let j: usize =  5;
let x = [0; 2];
let a = x[i]; // panic (out-of-bounds low)
let a = x[j]; // panic (out-of-bounds high)

Now that I think about it, the i64 and u64 cases aren't "special" either, as long as the bounds checking is relied upon! The PR you mentioned however could result in inconsistent behaviour (depending on implementation details) which goes against the principle of least surprise.

For example, consider the case of indexing into an array using 'large' u64 numbers. The exact type of panic would be inconsistent between 32-bit and 64-bit.

E.g.:

let i: u64 = {larger than array bounds, representable within 32 bits};
let j: u64 = {too large to represent within 32 bits};
let x = [0; ...];
let a = x[i]; // slice out-of-bounds panic on any platform
let b = x[j]; // slice out-of-bounds panic on 64-bit, integer truncation on 32-bit

This could cause issues for programmers relying on catch_panic.

Ideally, any integer type should be usable for indexing, and the behaviour should always be consistent: allow the lookup if within bounds, panic with an out-of-bounds error otherwise (not an integer truncation error).

@petrochenkov
Copy link
Contributor

@peter-bertok
I'm not especially eager to argue here, because I'd prefer indexing with everything except for usize, and especially with signed types, to be a compile time error, like now.
Nonetheless, discerning between the kinds of produced panic or relying on them seems very strange. Internally Index will have to cast the argument to usize anyway, so the panic won't necessary be an "out of bounds panic" in all cases.

@peter-bertok
Copy link
Author

@petrochenkov
It's strange now, but the reality is that the try-catch (panic-catch in Rust terms) paradigm is virtually unavoidable in the long term. How else do you propose developers handle out-of-bounds errors without doubling up on bounds checking? The compiler does it anyway, so why not catch the panic and handle the error, instead of just blowing up the thread and/or program?

This is a common pattern in other safe languages, such as C# or Java:

try {
   ... complex array code for, say, decoding a packet...
} catch( IndexOutOfRangeException )
{
    return ... "invalid data format"...;
}

This is almost possible with (nightly) Rust, with the only minor issue that panic isn't strongly typed:

#![feature(catch_panic)]
fn test( i: usize ) -> i32 {
    let data = [0i32;10];
    data[i]
}

fn main() {
    match std::thread::catch_panic( || test( 50 ) ) {
        Ok(n) => println!( "{:?}", n ),
        Err(e) => println!(  "{:?}", e )
    }
}

Unless I'm missing something, the current design for error handling in Rust has a number of gaps:

  • To handle panics without crashing the whole program, dedicated threads are required (stable Rust).
  • Even with the nightly builds, there is no way to differentiate between different types of panic!(), as it returns untyped values (std::any::Any).
  • The only option available to developers is to double-up on certain checks, such as indexing bounds checks, which is inefficient at best, and makes code much less readable.
  • Panics often return a "string", using panic_fmt!(). This means that error messages generated by the compiler will not be internationalisable. Compare with Java/C# Exception-s. It's trivial to do a "switch" on the exception type, or have catch(...) blocks return distinct i18n formatted message for each possible exception type. Not everything should be represented by an English string, particulary in a general purpose systems programming language!

I'm of the opinion that this boils down to language design philosophy. Java, C#, C++, and Go all made the mistake of pretending that templates are an optional feature, then tacked them on too late (or not at all), resulting in poorly designed standard libraries, often with two versions of the same interface -- templated and legacy. Those language designs contained a huge mistake that can never be corrected in a backwards-compatible fashion.

Thankfully, Rust implemented templates right from the beginning, resulting in a much more elegant standard library. Good!

Unfortunately, the Rust designers made the mistake of pretending that exceptions are optional. They are not. Exception handling is vital to safe, readable, and performant code. Right now, developers have to sacrifice at least some of those features if they want to program in Rust because neither "Error" nor "panic!()" handle the "Exception" case well enough.

Error handling with complex return types and try!() has an overhead and results in messy code, while panic just kills the program, which is a potential denial of service.

How is Servo expected to decode complex binary protocols safely, such as compression formats, without killing the browser when receiving specially crafted malicious input? The compiler-generated panics will kill the thread! So should any code that can potentially cause array-out-of-bounds be run in dedicated threads!? Or should the Servo developers just use catch_panic? Then what? What error can they report apart from a meaningless "it failed" OR a meaningful but english only message?

Ideally, I'd love to see Rust embrace Scoped Continuations, which is a language design that is a superset of Monads and Try-Catch exception handling. It would make it possible to write type-safe error handling that is much more powerful than the current Monadic-style that is better suited to pure functional languages.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 13, 2015

I think you are steering off course of this issue. This issue is about the arguments the Index types take. If you want to discuss exception handling I suggest you join the discussion: rust-lang/rfcs#243

Error handling with complex return types and try!() has an overhead

Please provide measurable evidence ;)

and results in messy code,

Have you seen Java Code handling exceptions?

while panic just kills the program, which is a potential denial of service.

The point of panic is that it should never occur. Any time you index into an array you are "guaranteeing" that the index is in the range. If it's not, it's a bug in your code. If you want to check and change behavior depending on whether an index is in range, add a check or use a method like get.

The compiler does it anyway,

Yes it does, and if you also do it, one of the two checks gets optimized out. Often times both get optimized out if llvm figures out that neither check is necessary.

@Aatch
Copy link
Contributor

Aatch commented Oct 27, 2015

Regardless of opinions, this change would require an RFC. Feel free to open an issue (or better yet, an actual RFC) over on https://github.com/rust-lang/rfcs if you want to continue this discussion.

@Aatch Aatch closed this as completed Oct 27, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants