Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add slice::check_range #75207

Merged
merged 5 commits into from
Sep 4, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 13 additions & 26 deletions library/alloc/src/collections/vec_deque.rs
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,7 @@ use core::fmt;
use core::hash::{Hash, Hasher};
use core::iter::{once, repeat_with, FromIterator, FusedIterator};
use core::mem::{self, replace, ManuallyDrop};
use core::ops::Bound::{Excluded, Included, Unbounded};
use core::ops::{Index, IndexMut, RangeBounds, Try};
use core::ops::{Index, IndexMut, Range, RangeBounds, Try};
use core::ptr::{self, NonNull};
use core::slice;

Expand Down Expand Up @@ -1083,24 +1082,18 @@ impl<T> VecDeque<T> {
self.tail == self.head
}

fn range_start_end<R>(&self, range: R) -> (usize, usize)
fn range_tail_head<R>(&self, range: R) -> (usize, usize)
where
R: RangeBounds<usize>,
{
let len = self.len();
let start = match range.start_bound() {
Included(&n) => n,
Excluded(&n) => n + 1,
Unbounded => 0,
};
let end = match range.end_bound() {
Included(&n) => n + 1,
Excluded(&n) => n,
Unbounded => len,
};
assert!(start <= end, "lower bound was too large");
assert!(end <= len, "upper bound was too large");
(start, end)
// SAFETY: This buffer is only used to check the range. It might be partially
// uninitialized, but `check_range` needs a contiguous slice.
// https://github.com/rust-lang/rust/pull/75207#discussion_r471193682
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a partially uninitialized slice is UB. So I don't think the safety comment here is accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was something the standard library could assume. Wouldn't that make buffer_as_slice UB too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was meant to rely on "the standard library is special and can bend the rules", the comment should explicitly say so. The standard library can't just implicitly do this, the risk is too high that someone will copy this code and use it in their own crate.

However, even then it is better to avoid; usually we only do that for layout assumptions (Cc rust-lang/unsafe-code-guidelines#90). For other kinds of UB, it is typically considered a bug.

Wouldn't that make buffer_as_slice UB too?

Maybe? I am not familiar with the code I am afraid, I just read the safety comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looking at the code, buffer_as_slice should likely return a raw slice instead of a reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was meant to rely on "the standard library is special and can bend the rules", the comment should explicitly say so.

You're right. My mistake.

However, even then it is better to avoid; usually we only do that for layout assumptions

That makes sense. I was mostly using buffer_as_slice as a reference, but I agree that avoiding that assumption is better.

Copy link
Member

@RalfJung RalfJung Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there's lots of old code here that was written before we had clear rules, and so does not follow today's rules. This case here is not a big deal because the discussion is still open whether we want this to be UB or not (that's rust-lang/unsafe-code-guidelines#77), but while we discuss it's better to follow the rules, even in the standard library, also to learn how hard/annoying it is to work with these rules.

Copy link
Member

@RalfJung RalfJung Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh also the standard library currently does a poor job at documenting when it relies on extra assumptions, but I am pushing for new code that we add to do better. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That all sounds great, :)

let buffer = unsafe { slice::from_raw_parts(self.ptr(), self.len()) };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hack, but it's not too bad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this use buffer_as_slice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used that method at first, but I decided to use from_raw_parts instead, since it's a hack either way. The issue is that I need a contiguous slice for check_range, so I use the first self.len() elements even if they're not initialized.

I could change it to unsafe { self.buffer_as_slice()[..self.len()] }, but that adds an unnecessary bounds check and looks more valid than it is.

I also added more of this explanation to the safety comment.

let Range { start, end } = buffer.check_range(range);
let tail = self.wrap_add(self.tail, start);
let head = self.wrap_add(self.tail, end);
(tail, head)
}

/// Creates an iterator that covers the specified range in the `VecDeque`.
Expand Down Expand Up @@ -1131,9 +1124,7 @@ impl<T> VecDeque<T> {
where
R: RangeBounds<usize>,
{
let (start, end) = self.range_start_end(range);
let tail = self.wrap_add(self.tail, start);
let head = self.wrap_add(self.tail, end);
let (tail, head) = self.range_tail_head(range);
Iter {
tail,
head,
Expand Down Expand Up @@ -1174,9 +1165,7 @@ impl<T> VecDeque<T> {
where
R: RangeBounds<usize>,
{
let (start, end) = self.range_start_end(range);
let tail = self.wrap_add(self.tail, start);
let head = self.wrap_add(self.tail, end);
let (tail, head) = self.range_tail_head(range);
IterMut {
tail,
head,
Expand Down Expand Up @@ -1230,7 +1219,7 @@ impl<T> VecDeque<T> {
// When finished, the remaining data will be copied back to cover the hole,
// and the head/tail values will be restored correctly.
//
let (start, end) = self.range_start_end(range);
let (drain_tail, drain_head) = self.range_tail_head(range);

// The deque's elements are parted into three segments:
// * self.tail -> drain_tail
Expand All @@ -1248,8 +1237,6 @@ impl<T> VecDeque<T> {
// T t h H
// [. . . o o x x o o . . .]
//
let drain_tail = self.wrap_add(self.tail, start);
let drain_head = self.wrap_add(self.tail, end);
let head = self.head;

// "forget" about the values after the start of the drain until after
Expand Down
1 change: 1 addition & 0 deletions library/alloc/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,7 @@
#![feature(rustc_attrs)]
#![feature(receiver_trait)]
#![feature(min_specialization)]
#![feature(slice_check_range)]
#![feature(slice_ptr_get)]
#![feature(slice_ptr_len)]
#![feature(staged_api)]
Expand Down
20 changes: 6 additions & 14 deletions library/alloc/src/string.rs
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ use core::fmt;
use core::hash;
use core::iter::{FromIterator, FusedIterator};
use core::ops::Bound::{Excluded, Included, Unbounded};
use core::ops::{self, Add, AddAssign, Index, IndexMut, RangeBounds};
use core::ops::{self, Add, AddAssign, Index, IndexMut, Range, RangeBounds};
use core::ptr;
use core::str::{lossy, pattern::Pattern};

Expand Down Expand Up @@ -1506,23 +1506,15 @@ impl String {
// of the vector version. The data is just plain bytes.
// Because the range removal happens in Drop, if the Drain iterator is leaked,
// the removal will not happen.
let len = self.len();
let start = match range.start_bound() {
Included(&n) => n,
Excluded(&n) => n + 1,
Unbounded => 0,
};
let end = match range.end_bound() {
Included(&n) => n + 1,
Excluded(&n) => n,
Unbounded => len,
};
let Range { start, end } = self.as_bytes().check_range(range);
assert!(self.is_char_boundary(start));
assert!(self.is_char_boundary(end));

// Take out two simultaneous borrows. The &mut String won't be accessed
// until iteration is over, in Drop.
let self_ptr = self as *mut _;
// slicing does the appropriate bounds checks
let chars_iter = self[start..end].chars();
// SAFETY: `check_range` and `is_char_boundary` do the appropriate bounds checks.
let chars_iter = unsafe { self.get_unchecked(start..end) }.chars();

Drain { start, end, iter: chars_iter, string: self_ptr }
}
Expand Down
33 changes: 2 additions & 31 deletions library/alloc/src/vec.rs
Original file line number Diff line number Diff line change
Expand Up @@ -66,8 +66,7 @@ use core::intrinsics::{arith_offset, assume};
use core::iter::{FromIterator, FusedIterator, TrustedLen};
use core::marker::PhantomData;
use core::mem::{self, ManuallyDrop, MaybeUninit};
use core::ops::Bound::{Excluded, Included, Unbounded};
use core::ops::{self, Index, IndexMut, RangeBounds};
use core::ops::{self, Index, IndexMut, Range, RangeBounds};
use core::ptr::{self, NonNull};
use core::slice::{self, SliceIndex};

Expand Down Expand Up @@ -1311,35 +1310,7 @@ impl<T> Vec<T> {
// the hole, and the vector length is restored to the new length.
//
let len = self.len();
let start = match range.start_bound() {
Included(&n) => n,
Excluded(&n) => n + 1,
Unbounded => 0,
};
let end = match range.end_bound() {
Included(&n) => n + 1,
Excluded(&n) => n,
Unbounded => len,
};

#[cold]
#[inline(never)]
fn start_assert_failed(start: usize, end: usize) -> ! {
panic!("start drain index (is {}) should be <= end drain index (is {})", start, end);
}

#[cold]
#[inline(never)]
fn end_assert_failed(end: usize, len: usize) -> ! {
panic!("end drain index (is {}) should be <= len (is {})", end, len);
}

if start > end {
start_assert_failed(start, end);
}
if end > len {
end_assert_failed(end, len);
}
let Range { start, end } = self.check_range(range);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this method was pretty finely tuned, do we have any existing benchmarks we can check to see whether there's any impact?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Vec::drain has any benches. These are the benches for Vec:
https://github.com/rust-lang/rust/blob/master/library/alloc/benches/vec.rs

However, I wouldn't expect a regression. check_range calls functions with the same attributes for this reason. That's part of why I think it's better to package it with the standard library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there aren't any benches then I'd consider it a bit of a speculative implementation anyways, so am happy for this to be made simpler 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call here is the problem: this implicitly creates a slice covering the entire vector, which is a problem because that slice aliases with other pointers that existed before and that should remain valid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed a fix in #76662.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Wouldn't this use the safe Deref implementation for Vec? Even if not, why doesn't the mutable reference to self on this method prevent aliasing?

Copy link
Member

@RalfJung RalfJung Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It prevents aliasing from safe clients, but Vec explicitly wants to support some unsafe clients that keep pointers into the buffer. That is okay as long as all the operations involved are guaranteed not to reallocate the buffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here for the testcase that shows how unsafe code can rely on this property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I had no idea Vec makes those guarantees. Thanks for the explanation!

Copy link
Member

@RalfJung RalfJung Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of these guarantees are documented here. I might have gone overboard in the test and check more than is guaranteed, but then it doesn't seem unlikely that people will take this guarantee as far as they can -- so I added a test basically for every operation that I could imagine preserving the buffer. Someone else double-checking this certainly would not hurt. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I read that passage but not thoroughly enough to notice these subtleties.

but then it doesn't seem unlikely that people will take this guarantee as far as they can

IIUC, there is still room for some of those calls to be removed. From the guarantees:

Bulk insertion methods may reallocate, even when not necessary.

Thus, append, extend, extend_from_slice, and sometimes splice shouldn't need to be restricted. Of course, keeping them in the test doesn't hurt. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... maybe we should add a comment though that this test does not constitute a guarantee. I'll prepare a PR.


unsafe {
// set self.vec length's to start, to be safe in case Drain is leaked
Expand Down
107 changes: 86 additions & 21 deletions library/core/src/slice/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ use crate::intrinsics::{assume, exact_div, is_aligned_and_not_null, unchecked_su
use crate::iter::*;
use crate::marker::{self, Copy, Send, Sized, Sync};
use crate::mem;
use crate::ops::{self, FnMut, Range};
use crate::ops::{self, Bound, FnMut, Range, RangeBounds};
use crate::option::Option;
use crate::option::Option::{None, Some};
use crate::ptr::{self, NonNull};
Expand Down Expand Up @@ -350,6 +350,79 @@ impl<T> [T] {
unsafe { &mut *index.get_unchecked_mut(self) }
}

/// Converts a range over this slice to [`Range`].
///
/// The returned range is safe to pass to [`get_unchecked`] and [`get_unchecked_mut`].
///
/// [`get_unchecked`]: #method.get_unchecked
/// [`get_unchecked_mut`]: #method.get_unchecked_mut
///
/// # Panics
///
/// Panics if the range is out of bounds.
///
/// # Examples
///
/// ```
/// #![feature(slice_check_range)]
///
/// let v = [10, 40, 30];
/// assert_eq!(1..2, v.check_range(1..2));
/// assert_eq!(0..2, v.check_range(..2));
/// assert_eq!(1..3, v.check_range(1..));
/// ```
///
/// Panics when [`Index::index`] would panic:
///
/// ```should_panic
/// #![feature(slice_check_range)]
///
/// [10, 40, 30].check_range(2..1);
/// ```
///
/// ```should_panic
/// #![feature(slice_check_range)]
///
/// [10, 40, 30].check_range(1..4);
/// ```
///
/// ```should_panic
/// #![feature(slice_check_range)]
///
/// [10, 40, 30].check_range(1..=usize::MAX);
/// ```
///
/// [`Index::index`]: ops::Index::index
#[track_caller]
#[unstable(feature = "slice_check_range", issue = "none")]
pub fn check_range<R: RangeBounds<usize>>(&self, range: R) -> Range<usize> {
Copy link

@leonardo-m leonardo-m Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't unfortunately return a CheckedRange<usize> because there's no association with a specific range, so it's useless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain more? I don't see how this would be useless, since it converts RangeBounds to usizes. It wasn't meant to reduce unsafe code.

It could return a CheckedRange that has a reference to the slice and a safe get method. However, that would require check_range_mut and CheckedRangeMut too, which would make this simple method much more complex.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code is useful. I meant to say that returning a CheckedRange is not so useful. As you say it could be done with the reference to the slice, but it could make it over engineered, so all is OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that makes sense. Thanks for the clarification.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we only need the length of the slice, I don't think this method should take a reference to the full slice. References are very strong types, they assert the aliasing rules and they assert that the memory they point to is initialized. As my other messages here show, that is a problem for this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future readers, this conversation was continued here.

let start = match range.start_bound() {
Bound::Included(&start) => start,
Bound::Excluded(start) => {
start.checked_add(1).unwrap_or_else(|| slice_start_index_overflow_fail())
}
Bound::Unbounded => 0,
};

let len = self.len();
let end = match range.end_bound() {
Bound::Included(end) => {
end.checked_add(1).unwrap_or_else(|| slice_end_index_overflow_fail())
}
Bound::Excluded(&end) => end,
Bound::Unbounded => len,
};

if start > end {
slice_index_order_fail(start, end);
}
if end > len {
slice_end_index_len_fail(end, len);
}

Range { start, end }
}

/// Returns a raw pointer to the slice's buffer.
///
/// The caller must ensure that the slice outlives the pointer this
Expand Down Expand Up @@ -2438,26 +2511,11 @@ impl<T> [T] {
/// ```
#[stable(feature = "copy_within", since = "1.37.0")]
#[track_caller]
pub fn copy_within<R: ops::RangeBounds<usize>>(&mut self, src: R, dest: usize)
pub fn copy_within<R: RangeBounds<usize>>(&mut self, src: R, dest: usize)
where
T: Copy,
{
let src_start = match src.start_bound() {
ops::Bound::Included(&n) => n,
ops::Bound::Excluded(&n) => {
n.checked_add(1).unwrap_or_else(|| slice_index_overflow_fail())
}
ops::Bound::Unbounded => 0,
};
let src_end = match src.end_bound() {
ops::Bound::Included(&n) => {
n.checked_add(1).unwrap_or_else(|| slice_index_overflow_fail())
}
ops::Bound::Excluded(&n) => n,
ops::Bound::Unbounded => self.len(),
};
assert!(src_start <= src_end, "src end is before src start");
assert!(src_end <= self.len(), "src is out of bounds");
let Range { start: src_start, end: src_end } = self.check_range(src);
let count = src_end - src_start;
assert!(dest <= self.len() - count, "dest is out of bounds");
unsafe {
Expand Down Expand Up @@ -3034,7 +3092,14 @@ fn slice_index_order_fail(index: usize, end: usize) -> ! {
#[inline(never)]
#[cold]
#[track_caller]
fn slice_index_overflow_fail() -> ! {
fn slice_start_index_overflow_fail() -> ! {
panic!("attempted to index slice from after maximum usize");
}

#[inline(never)]
#[cold]
#[track_caller]
fn slice_end_index_overflow_fail() -> ! {
panic!("attempted to index slice up to maximum usize");
}

Expand Down Expand Up @@ -3370,15 +3435,15 @@ unsafe impl<T> SliceIndex<[T]> for ops::RangeInclusive<usize> {
#[inline]
fn index(self, slice: &[T]) -> &[T] {
if *self.end() == usize::MAX {
slice_index_overflow_fail();
slice_end_index_overflow_fail();
}
(*self.start()..self.end() + 1).index(slice)
}

#[inline]
fn index_mut(self, slice: &mut [T]) -> &mut [T] {
if *self.end() == usize::MAX {
slice_index_overflow_fail();
slice_end_index_overflow_fail();
}
(*self.start()..self.end() + 1).index_mut(slice)
}
Expand Down
4 changes: 2 additions & 2 deletions library/core/tests/slice.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1797,7 +1797,7 @@ fn test_copy_within() {
}

#[test]
#[should_panic(expected = "src is out of bounds")]
#[should_panic(expected = "range end index 14 out of range for slice of length 13")]
fn test_copy_within_panics_src_too_long() {
let mut bytes = *b"Hello, World!";
// The length is only 13, so 14 is out of bounds.
Expand All @@ -1812,7 +1812,7 @@ fn test_copy_within_panics_dest_too_long() {
bytes.copy_within(0..4, 10);
}
#[test]
#[should_panic(expected = "src end is before src start")]
#[should_panic(expected = "slice index starts at 2 but ends at 1")]
fn test_copy_within_panics_src_inverted() {
let mut bytes = *b"Hello, World!";
// 2 is greater than 1, so this range is invalid.
Expand Down
1 change: 1 addition & 0 deletions src/tools/linkchecker/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ const LINKCHECK_EXCEPTIONS: &[(&str, &[&str])] = &[
"#method.sort_by_key",
"#method.make_ascii_uppercase",
"#method.make_ascii_lowercase",
"#method.get_unchecked_mut",
],
),
// These try to link to std::collections, but are defined in alloc
Expand Down